From 33b3ef9846d1c4b6ccbe3f135cd41592543c32cb Mon Sep 17 00:00:00 2001 From: ndokutovich Date: Thu, 30 Apr 2026 11:08:56 +0300 Subject: [PATCH 1/2] =?UTF-8?q?Record:=20V21=20+=20N-gram=20Tilt=20+=20Lea?= =?UTF-8?q?kyReLU=200.3=20=E2=80=94=20val=5Fbpb=201.05851=20(3-seed)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 3-seed mean val_bpb = 1.05851479 (std 0.000762, seeds 42/0/1234) on track_10min_16mb. Stack: - PR #1945 (alertcat) V21 base = PR #1908 + AWQ-Lite + AsymLogit Rescale - PR #1953 (andrewbaggio1) TTT/QK env knobs (TTT_LR=0.75, QK_GAIN=5.25, no_qv mask) - PR #1948 (TimS-ml + lijuncheng16) LeakyReLU squared slope 0.3 - PR #1145 (AnirudhRahul, valerio-endorsed) closed-form n-gram tilt with Σ P=1 Z renormalization Compliance: causal hints, single-pass, Σ P=1 by construction, no SLOT, no n-gram cache, no Pre-Quant TTT. System deps: gcc + lrzip auto-installed by setup.sh; PyTorch 2.9.1 + Triton + Flash Attn 3. One-command reproduction: bash setup.sh SEED={42,0,1234} bash run.sh --- .../README.md | 77 + .../lossless_caps.py | 833 +++ .../online_ngram_state.c | 433 ++ .../online_ngram_tilt.py | 386 ++ .../prepare_caseops_data.py | 229 + .../requirements.txt | 12 + .../run.sh | 72 + .../setup.sh | 100 + .../submission.json | 11 + ...pe_lossless_caps_caseops_v1_reserved.model | Bin 0 -> 366510 bytes .../train_gpt.py | 4293 ++++++++++++ .../train_seed0.log | 5846 ++++++++++++++++ .../train_seed1234.log | 5846 ++++++++++++++++ .../train_seed42.log | 5847 +++++++++++++++++ 14 files changed, 23985 insertions(+) create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/README.md create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/lossless_caps.py create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/online_ngram_state.c create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/online_ngram_tilt.py 
create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/prepare_caseops_data.py create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/requirements.txt create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/run.sh create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/setup.sh create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/submission.json create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_gpt.py create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed0.log create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed1234.log create mode 100644 records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed42.log diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/README.md b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/README.md new file mode 100644 index 0000000000..fa52e7cb3a --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/README.md @@ -0,0 +1,77 @@ +# Record: V21 Stack + N-gram Tilt + LeakyReLU 0.3 — val_bpb 1.05851 (3-seed) + +**val_bpb = 1.05851479** (3-seed mean, std 0.000762, seeds 42 / 0 / 1234) on `track_10min_16mb`. + +## Per-seed + +| seed | val_bpb | eval ops ms | artifact bytes | +|-----:|------------:|------------:|---------------:| +| 42 | 1.05764263 | 575,915 | 15,949,305 | +| 0 | 1.05886205 | 553,279 | ~15,943,000 | +| 1234 | 1.05903968 | 554,723 | ~15,945,000 | +| **mean** | **1.05851479** | — | — | +| **std** | **0.000762** | — | — | + +## Stack + +1. 
**PR #1945** (@alertcat): V21 base = PR #1908 + AWQ-Lite mixed-precision GPTQ + Asymmetric Logit Rescale. +2. **PR #1953** (@andrewbaggio1): TTT/QK env knobs — `TTT_LR=0.75`, `QK_GAIN_INIT=5.25`, `TTT_NO_QV_MASK=1`. (2560 long-context dropped due to OOM during global-SGD allreduce on this 8×H100 80GB SXM provisioning; remaining 7 knobs preserved.) +3. **PR #1948** (@TimS-ml, @lijuncheng16): LeakyReLU squared slope 0.3 patch (4-point sweep min identified by PR #1948). +4. **PR #1145** (@AnirudhRahul, valerio-endorsed): closed-form n-gram tilt with three causal experts (token order 16, within-doc, word order 4) and Σ P=1 closed-form Z renormalization. + +The static n-gram hint table is built in a single L→R causal pass over val tokens during `validate()` setup (env flag `NGRAM_HINT_PRECOMPUTE_OUTSIDE=1`, default). Setting the flag to 0 reproduces the inline build path with identical val_bpb. + +## Compliance + +- Train ≤ 600,000 ms, eval ops ≤ 600,000 ms, artifact ≤ 16,000,000 bytes per seed. +- Standard log-softmax over the SP8192 alphabet at every scored position; tilt is closed-form `p'(a) = exp(β·1[a=h]) · p(a) / Z`, `Z = 1 + q · (e^β − 1)`, Σ p'(a) = 1 over vocab. +- Single-pass: each val token contributes exactly one BPB term in `quantized_ttt_phased`. +- N-gram hints are strictly causal: hint at position t depends only on tokens [0..t−1]. +- No SLOT, no n-gram cache hash table, no logit bias, no ETLB, no Pre-Quant TTT. + +## Δ vs neighbors (3-seed) + +| Submission | val_bpb | Δ vs ours | +|------------|--------:|----------:| +| **This submission** | **1.05851** | — | +| PR #1953 (andrewbaggio1) | 1.05855 | +0.00004 | +| PR #1945 (alertcat) | 1.05943 | +0.00092 | +| PR #1934 (liujshi) | 1.05993 | +0.00142 | +| PR #1956 (AayushBaniya2006) | 1.06044 | +0.00193 | +| PR #1908 (romeerp) | 1.06081 | +0.00230 | + +## System dependencies + +- gcc + lrzip (`apt-get install -y build-essential lrzip` on Debian/Ubuntu). 
+- Python: `torch==2.9.1`, triton (bundled), Flash Attention 3, numpy, sentencepiece, tiktoken, kernels, datasets, huggingface-hub, typing-extensions==4.15.0. See `requirements.txt`. +- 8× H100 80GB SXM. +- CASEOPS-preprocessed FineWeb10B data (run `prepare_caseops_data.py` once). + +## Reproduction + +``` +bash setup.sh # apt + pip + Flash Attn 3 +python prepare_caseops_data.py # one-time, ~10-20 min CPU +SEED=42 bash run.sh +SEED=0 bash run.sh +SEED=1234 bash run.sh +``` + +## Credits + +- **PR #1145** (@AnirudhRahul): closed-form n-gram tilt with Σ P=1 Z_t renormalization, three causal experts. +- **PR #1948** (@TimS-ml, @lijuncheng16): LeakyReLU squared slope 0.3 sweep finding. +- **PR #1953** (@andrewbaggio1): 7-knob TTT/QK tuning on V21 base. +- **PR #1945** (@alertcat): V21 stack composition. +- **PR #1908** (@romeerp): activation-aware GPTQ mixed precision. +- **PR #1923** (@jorge-asenjo): Asymmetric Logit Rescale. +- **PR #1855** (@codemath3000): SP8192 CaseOps + 9-hyperparameter greedy stack base. +- **PR #1493** (@dexhunter et al.): score-first TTT framework foundation. +- **PR #549** (@abaybektursun): original score-first TTT. + +## Files + +- `train_gpt.py`, `online_ngram_tilt.py`, `online_ngram_state.c`, `lossless_caps.py`, `prepare_caseops_data.py` +- `tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model` +- `setup.sh`, `run.sh`, `requirements.txt` +- `train_seed{42,0,1234}.log` diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/lossless_caps.py b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/lossless_caps.py new file mode 100644 index 0000000000..98e472f824 --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/lossless_caps.py @@ -0,0 +1,833 @@ +"""Lossless capitalization pre-encoding helpers. + +This module provides a narrow, reversible transform that only touches +ASCII capital letters `A-Z`. 
Each uppercase ASCII letter is rewritten as +`<sentinel><lowercase>`, where `sentinel` is a private-use Unicode +character that is escaped by doubling if it appears literally in the +input text. + +Example with the default sentinel `\\uE000`: + + "The NASA Launch" -> "\\uE000the \\uE000n\\uE000a\\uE000s\\uE000a \\uE000launch" + +The transform is intentionally simple for v1: + +- lowercase ASCII letters are unchanged +- uppercase ASCII letters become sentinel + lowercase letter +- non-ASCII characters are left untouched +- literal sentinel characters are escaped as sentinel + sentinel + +This makes the transform exactly invertible while allowing a downstream +tokenizer to reuse lowercase subwords across case variants. +""" + +from __future__ import annotations + +import json +from pathlib import Path +from typing import Callable, Iterable + +LOSSLESS_CAPS_V1 = "lossless_caps_v1" +LOSSLESS_CAPS_V2 = "lossless_caps_v2" +LOSSLESS_CAPS_V3 = "lossless_caps_v3" +LOSSLESS_CAPS_V4 = "lossless_caps_v4" +LOSSLESS_CAPS_V5 = "lossless_caps_v5" +LOSSLESS_CAPS_V6 = "lossless_caps_v6" +LOSSLESS_CAPS_V7 = "lossless_caps_v7" +LOSSLESS_CAPS_CASEOPS_V1 = "lossless_caps_caseops_v1" +IDENTITY = "identity" +DEFAULT_SENTINEL = "\uE000" +DEFAULT_V2_TITLE = "\uE001" +DEFAULT_V2_ALLCAPS = "\uE002" +DEFAULT_V2_CAPNEXT = "\uE003" +DEFAULT_V2_ESC = "\uE004" +DEFAULT_V5_TITLE_MIN_LEN = 7 +DEFAULT_V6_ALLCAPS_MIN_LEN = 3 +DEFAULT_V7_ALLCAPS_MIN_LEN = 4 + + +class LosslessCapsError(ValueError): + """Raised when a transformed string is malformed.""" + + +def _is_ascii_upper(ch: str) -> bool: + return "A" <= ch <= "Z" + + +def _is_ascii_lower(ch: str) -> bool: + return "a" <= ch <= "z" + + +def _is_ascii_alpha(ch: str) -> bool: + return _is_ascii_lower(ch) or _is_ascii_upper(ch) + + +def _validate_distinct_single_chars(*chars: str) -> None: + if any(len(ch) != 1 for ch in chars): + raise ValueError("all control characters must be exactly one character") + if len(set(chars)) != len(chars): + raise ValueError("control 
characters must be distinct") + + +def encode_lossless_caps_v1(text: str, *, sentinel: str = DEFAULT_SENTINEL) -> str: + """Encode ASCII capitals reversibly using a one-character sentinel.""" + if len(sentinel) != 1: + raise ValueError("sentinel must be exactly one character") + out: list[str] = [] + for ch in text: + if ch == sentinel: + out.append(sentinel) + out.append(sentinel) + elif _is_ascii_upper(ch): + out.append(sentinel) + out.append(ch.lower()) + else: + out.append(ch) + return "".join(out) + + +def decode_lossless_caps_v1(text: str, *, sentinel: str = DEFAULT_SENTINEL) -> str: + """Decode the `lossless_caps_v1` transform back to the original text.""" + if len(sentinel) != 1: + raise ValueError("sentinel must be exactly one character") + out: list[str] = [] + i = 0 + n = len(text) + while i < n: + ch = text[i] + if ch != sentinel: + out.append(ch) + i += 1 + continue + if i + 1 >= n: + raise LosslessCapsError("dangling capitalization sentinel at end of string") + nxt = text[i + 1] + if nxt == sentinel: + out.append(sentinel) + elif _is_ascii_lower(nxt): + out.append(nxt.upper()) + else: + raise LosslessCapsError( + f"invalid sentinel escape sequence {sentinel + nxt!r}; " + "expected doubled sentinel or sentinel + lowercase ASCII letter" + ) + i += 2 + return "".join(out) + + +def encode_lossless_caps_v2( + text: str, + *, + title: str = DEFAULT_V2_TITLE, + allcaps: str = DEFAULT_V2_ALLCAPS, + capnext: str = DEFAULT_V2_CAPNEXT, + esc: str = DEFAULT_V2_ESC, +) -> str: + """Encode ASCII word capitalization with cheap word-level markers. 
+ + Rules over maximal ASCII alphabetic runs: + - lowercase words stay unchanged + - TitleCase words become `title + lowercase(word)` + - ALLCAPS words become `allcaps + lowercase(word)` + - mixed-case words use: + - optional `title` when the first letter is uppercase + - `capnext + lowercase(letter)` for subsequent uppercase letters + - literal control characters are escaped as `esc + literal` + """ + _validate_distinct_single_chars(title, allcaps, capnext, esc) + controls = {title, allcaps, capnext, esc} + out: list[str] = [] + i = 0 + n = len(text) + while i < n: + ch = text[i] + if ch in controls: + out.append(esc) + out.append(ch) + i += 1 + continue + if not _is_ascii_alpha(ch): + out.append(ch) + i += 1 + continue + + j = i + 1 + while j < n and _is_ascii_alpha(text[j]): + j += 1 + word = text[i:j] + lower_word = word.lower() + + if word.islower(): + out.append(word) + elif len(word) >= 2 and word.isupper(): + out.append(allcaps) + out.append(lower_word) + elif _is_ascii_upper(word[0]) and word[1:].islower(): + out.append(title) + out.append(lower_word) + else: + if _is_ascii_upper(word[0]): + out.append(title) + out.append(lower_word[0]) + for orig_ch, lower_ch in zip(word[1:], lower_word[1:], strict=True): + if _is_ascii_upper(orig_ch): + out.append(capnext) + out.append(lower_ch) + i = j + return "".join(out) + + +def decode_lossless_caps_v2( + text: str, + *, + title: str = DEFAULT_V2_TITLE, + allcaps: str = DEFAULT_V2_ALLCAPS, + capnext: str = DEFAULT_V2_CAPNEXT, + esc: str = DEFAULT_V2_ESC, +) -> str: + """Decode the `lossless_caps_v2` transform back to the original text.""" + _validate_distinct_single_chars(title, allcaps, capnext, esc) + out: list[str] = [] + pending_escape = False + pending_word_mode: str | None = None + active_allcaps = False + pending_capnext = False + in_ascii_word = False + + for ch in text: + if pending_escape: + if pending_word_mode is not None and not _is_ascii_alpha(ch): + raise LosslessCapsError("escaped control char cannot 
satisfy pending word capitalization mode") + out.append(ch) + pending_escape = False + if _is_ascii_alpha(ch): + in_ascii_word = True + else: + in_ascii_word = False + active_allcaps = False + continue + + if ch == esc: + pending_escape = True + continue + if ch == title: + if pending_word_mode is not None or in_ascii_word or pending_capnext: + raise LosslessCapsError("invalid title marker placement") + pending_word_mode = "title" + continue + if ch == allcaps: + if pending_word_mode is not None or in_ascii_word or pending_capnext: + raise LosslessCapsError("invalid allcaps marker placement") + pending_word_mode = "allcaps" + continue + if ch == capnext: + if pending_capnext: + raise LosslessCapsError("duplicate capnext marker") + pending_capnext = True + continue + + if _is_ascii_alpha(ch): + at_word_start = not in_ascii_word + if at_word_start: + if pending_word_mode == "allcaps": + out.append(ch.upper()) + active_allcaps = True + elif pending_word_mode == "title": + out.append(ch.upper()) + elif pending_capnext: + out.append(ch.upper()) + else: + out.append(ch) + pending_word_mode = None + pending_capnext = False + in_ascii_word = True + continue + + if pending_word_mode is not None: + raise LosslessCapsError("word capitalization marker leaked into the middle of a word") + if active_allcaps: + out.append(ch.upper()) + elif pending_capnext: + out.append(ch.upper()) + else: + out.append(ch) + pending_capnext = False + continue + + if pending_word_mode is not None or pending_capnext: + raise LosslessCapsError("capitalization marker not followed by an ASCII letter") + out.append(ch) + in_ascii_word = False + active_allcaps = False + + if pending_escape: + raise LosslessCapsError("dangling escape marker at end of string") + if pending_word_mode is not None or pending_capnext: + raise LosslessCapsError("dangling capitalization marker at end of string") + return "".join(out) + + +def encode_lossless_caps_v3( + text: str, + *, + title: str = DEFAULT_V2_TITLE, + allcaps: 
str = DEFAULT_V2_ALLCAPS, + esc: str = DEFAULT_V2_ESC, +) -> str: + """Encode only common word-level capitalization patterns. + + Rules over maximal ASCII alphabetic runs: + - lowercase words stay unchanged + - TitleCase words become `title + lowercase(word)` + - ALLCAPS words become `allcaps + lowercase(word)` + - all other mixed-case words are left unchanged + - literal control characters are escaped as `esc + literal` + """ + _validate_distinct_single_chars(title, allcaps, esc) + controls = {title, allcaps, esc} + out: list[str] = [] + i = 0 + n = len(text) + while i < n: + ch = text[i] + if ch in controls: + out.append(esc) + out.append(ch) + i += 1 + continue + if not _is_ascii_alpha(ch): + out.append(ch) + i += 1 + continue + + j = i + 1 + while j < n and _is_ascii_alpha(text[j]): + j += 1 + word = text[i:j] + + if word.islower(): + out.append(word) + elif len(word) >= 2 and word.isupper(): + out.append(allcaps) + out.append(word.lower()) + elif _is_ascii_upper(word[0]) and word[1:].islower(): + out.append(title) + out.append(word.lower()) + else: + out.append(word) + i = j + return "".join(out) + + +def decode_lossless_caps_v3( + text: str, + *, + title: str = DEFAULT_V2_TITLE, + allcaps: str = DEFAULT_V2_ALLCAPS, + esc: str = DEFAULT_V2_ESC, +) -> str: + """Decode the `lossless_caps_v3` transform back to the original text.""" + _validate_distinct_single_chars(title, allcaps, esc) + out: list[str] = [] + pending_escape = False + pending_word_mode: str | None = None + active_allcaps = False + in_ascii_word = False + + for ch in text: + if pending_escape: + if pending_word_mode is not None and not _is_ascii_alpha(ch): + raise LosslessCapsError("escaped control char cannot satisfy pending word capitalization mode") + out.append(ch) + pending_escape = False + if _is_ascii_alpha(ch): + in_ascii_word = True + else: + in_ascii_word = False + active_allcaps = False + continue + + if ch == esc: + pending_escape = True + continue + if ch == title: + if 
pending_word_mode is not None or in_ascii_word: + raise LosslessCapsError("invalid title marker placement") + pending_word_mode = "title" + continue + if ch == allcaps: + if pending_word_mode is not None or in_ascii_word: + raise LosslessCapsError("invalid allcaps marker placement") + pending_word_mode = "allcaps" + continue + + if _is_ascii_alpha(ch): + at_word_start = not in_ascii_word + if at_word_start: + if pending_word_mode == "allcaps": + out.append(ch.upper()) + active_allcaps = True + elif pending_word_mode == "title": + out.append(ch.upper()) + else: + out.append(ch) + pending_word_mode = None + in_ascii_word = True + continue + + if pending_word_mode is not None: + raise LosslessCapsError("word capitalization marker leaked into the middle of a word") + out.append(ch.upper() if active_allcaps else ch) + continue + + if pending_word_mode is not None: + raise LosslessCapsError("capitalization marker not followed by an ASCII letter") + out.append(ch) + in_ascii_word = False + active_allcaps = False + + if pending_escape: + raise LosslessCapsError("dangling escape marker at end of string") + if pending_word_mode is not None: + raise LosslessCapsError("dangling capitalization marker at end of string") + return "".join(out) + + +def encode_lossless_caps_v4( + text: str, + *, + allcaps: str = DEFAULT_V2_ALLCAPS, + esc: str = DEFAULT_V2_ESC, +) -> str: + """Encode only ALLCAPS ASCII words, leaving all other case untouched.""" + _validate_distinct_single_chars(allcaps, esc) + controls = {allcaps, esc} + out: list[str] = [] + i = 0 + n = len(text) + while i < n: + ch = text[i] + if ch in controls: + out.append(esc) + out.append(ch) + i += 1 + continue + if not _is_ascii_alpha(ch): + out.append(ch) + i += 1 + continue + j = i + 1 + while j < n and _is_ascii_alpha(text[j]): + j += 1 + word = text[i:j] + if len(word) >= 2 and word.isupper(): + out.append(allcaps) + out.append(word.lower()) + else: + out.append(word) + i = j + return "".join(out) + + +def 
decode_lossless_caps_v4( + text: str, + *, + allcaps: str = DEFAULT_V2_ALLCAPS, + esc: str = DEFAULT_V2_ESC, +) -> str: + """Decode the `lossless_caps_v4` transform back to the original text.""" + _validate_distinct_single_chars(allcaps, esc) + out: list[str] = [] + pending_escape = False + pending_allcaps = False + in_ascii_word = False + active_allcaps = False + + for ch in text: + if pending_escape: + if pending_allcaps and not _is_ascii_alpha(ch): + raise LosslessCapsError("escaped control char cannot satisfy pending allcaps mode") + out.append(ch) + pending_escape = False + if _is_ascii_alpha(ch): + in_ascii_word = True + else: + in_ascii_word = False + active_allcaps = False + continue + + if ch == esc: + pending_escape = True + continue + if ch == allcaps: + if pending_allcaps or in_ascii_word: + raise LosslessCapsError("invalid allcaps marker placement") + pending_allcaps = True + continue + + if _is_ascii_alpha(ch): + if not in_ascii_word: + active_allcaps = pending_allcaps + pending_allcaps = False + in_ascii_word = True + out.append(ch.upper() if active_allcaps else ch) + continue + + if pending_allcaps: + raise LosslessCapsError("allcaps marker not followed by an ASCII letter") + out.append(ch) + in_ascii_word = False + active_allcaps = False + + if pending_escape: + raise LosslessCapsError("dangling escape marker at end of string") + if pending_allcaps: + raise LosslessCapsError("dangling allcaps marker at end of string") + return "".join(out) + + +def encode_lossless_caps_v5( + text: str, + *, + title: str = DEFAULT_V2_TITLE, + allcaps: str = DEFAULT_V2_ALLCAPS, + esc: str = DEFAULT_V2_ESC, + title_min_len: int = DEFAULT_V5_TITLE_MIN_LEN, +) -> str: + """Encode ALLCAPS words and only sufficiently long TitleCase words.""" + _validate_distinct_single_chars(title, allcaps, esc) + controls = {title, allcaps, esc} + out: list[str] = [] + i = 0 + n = len(text) + while i < n: + ch = text[i] + if ch in controls: + out.append(esc) + out.append(ch) + i += 1 + 
continue + if not _is_ascii_alpha(ch): + out.append(ch) + i += 1 + continue + j = i + 1 + while j < n and _is_ascii_alpha(text[j]): + j += 1 + word = text[i:j] + if len(word) >= 2 and word.isupper(): + out.append(allcaps) + out.append(word.lower()) + elif len(word) >= title_min_len and _is_ascii_upper(word[0]) and word[1:].islower(): + out.append(title) + out.append(word.lower()) + else: + out.append(word) + i = j + return "".join(out) + + +def decode_lossless_caps_v5( + text: str, + *, + title: str = DEFAULT_V2_TITLE, + allcaps: str = DEFAULT_V2_ALLCAPS, + esc: str = DEFAULT_V2_ESC, +) -> str: + """Decode the `lossless_caps_v5` transform back to the original text.""" + return decode_lossless_caps_v3(text, title=title, allcaps=allcaps, esc=esc) + + +def encode_lossless_caps_v6( + text: str, + *, + allcaps: str = DEFAULT_V2_ALLCAPS, + esc: str = DEFAULT_V2_ESC, + allcaps_min_len: int = DEFAULT_V6_ALLCAPS_MIN_LEN, +) -> str: + """Encode only ALLCAPS words with length >= allcaps_min_len.""" + _validate_distinct_single_chars(allcaps, esc) + controls = {allcaps, esc} + out: list[str] = [] + i = 0 + n = len(text) + while i < n: + ch = text[i] + if ch in controls: + out.append(esc) + out.append(ch) + i += 1 + continue + if not _is_ascii_alpha(ch): + out.append(ch) + i += 1 + continue + j = i + 1 + while j < n and _is_ascii_alpha(text[j]): + j += 1 + word = text[i:j] + if len(word) >= allcaps_min_len and word.isupper(): + out.append(allcaps) + out.append(word.lower()) + else: + out.append(word) + i = j + return "".join(out) + + +def decode_lossless_caps_v6( + text: str, + *, + allcaps: str = DEFAULT_V2_ALLCAPS, + esc: str = DEFAULT_V2_ESC, +) -> str: + """Decode the `lossless_caps_v6` transform back to the original text.""" + return decode_lossless_caps_v4(text, allcaps=allcaps, esc=esc) + + +def encode_lossless_caps_v7( + text: str, + *, + allcaps: str = DEFAULT_V2_ALLCAPS, + esc: str = DEFAULT_V2_ESC, + allcaps_min_len: int = DEFAULT_V7_ALLCAPS_MIN_LEN, +) -> str: + 
"""Encode only ALLCAPS words with length >= 4.""" + return encode_lossless_caps_v6( + text, + allcaps=allcaps, + esc=esc, + allcaps_min_len=allcaps_min_len, + ) + + +def decode_lossless_caps_v7( + text: str, + *, + allcaps: str = DEFAULT_V2_ALLCAPS, + esc: str = DEFAULT_V2_ESC, +) -> str: + """Decode the `lossless_caps_v7` transform back to the original text.""" + return decode_lossless_caps_v6(text, allcaps=allcaps, esc=esc) + + +def get_text_transform(name: str | None) -> Callable[[str], str]: + """Return the forward text transform for the given config name.""" + normalized = IDENTITY if name in {None, "", IDENTITY} else str(name) + if normalized == IDENTITY: + return lambda text: text + if normalized == LOSSLESS_CAPS_V1: + return encode_lossless_caps_v1 + if normalized == LOSSLESS_CAPS_V2: + return encode_lossless_caps_v2 + if normalized == LOSSLESS_CAPS_V3: + return encode_lossless_caps_v3 + if normalized == LOSSLESS_CAPS_V4: + return encode_lossless_caps_v4 + if normalized == LOSSLESS_CAPS_V5: + return encode_lossless_caps_v5 + if normalized == LOSSLESS_CAPS_V6: + return encode_lossless_caps_v6 + if normalized == LOSSLESS_CAPS_V7: + return encode_lossless_caps_v7 + if normalized == LOSSLESS_CAPS_CASEOPS_V1: + return encode_lossless_caps_v2 + raise ValueError(f"unsupported text_transform={name!r}") + + +def get_text_inverse_transform(name: str | None) -> Callable[[str], str]: + """Return the inverse transform for the given config name.""" + normalized = IDENTITY if name in {None, "", IDENTITY} else str(name) + if normalized == IDENTITY: + return lambda text: text + if normalized == LOSSLESS_CAPS_V1: + return decode_lossless_caps_v1 + if normalized == LOSSLESS_CAPS_V2: + return decode_lossless_caps_v2 + if normalized == LOSSLESS_CAPS_V3: + return decode_lossless_caps_v3 + if normalized == LOSSLESS_CAPS_V4: + return decode_lossless_caps_v4 + if normalized == LOSSLESS_CAPS_V5: + return decode_lossless_caps_v5 + if normalized == LOSSLESS_CAPS_V6: + return 
decode_lossless_caps_v6 + if normalized == LOSSLESS_CAPS_V7: + return decode_lossless_caps_v7 + if normalized == LOSSLESS_CAPS_CASEOPS_V1: + return decode_lossless_caps_v2 + raise ValueError(f"unsupported text_transform={name!r}") + + +def normalize_text_transform_name(name: str | None) -> str: + """Normalize empty/None transform names to the identity transform.""" + return IDENTITY if name in {None, "", IDENTITY} else str(name) + + +def get_text_transform_control_symbols(name: str | None) -> list[str]: + """Return reserved control symbols used by a transform, if any.""" + normalized = normalize_text_transform_name(name) + if normalized == IDENTITY: + return [] + if normalized == LOSSLESS_CAPS_V1: + return [DEFAULT_SENTINEL] + if normalized == LOSSLESS_CAPS_V2: + return [DEFAULT_V2_TITLE, DEFAULT_V2_ALLCAPS, DEFAULT_V2_CAPNEXT, DEFAULT_V2_ESC] + if normalized == LOSSLESS_CAPS_CASEOPS_V1: + return [DEFAULT_V2_TITLE, DEFAULT_V2_ALLCAPS, DEFAULT_V2_CAPNEXT, DEFAULT_V2_ESC] + if normalized in {LOSSLESS_CAPS_V3, LOSSLESS_CAPS_V5}: + return [DEFAULT_V2_TITLE, DEFAULT_V2_ALLCAPS, DEFAULT_V2_ESC] + if normalized in {LOSSLESS_CAPS_V4, LOSSLESS_CAPS_V6, LOSSLESS_CAPS_V7}: + return [DEFAULT_V2_ALLCAPS, DEFAULT_V2_ESC] + raise ValueError(f"unsupported text_transform={name!r}") + + +def infer_text_transform_from_manifest(tokenizer_path: str | Path) -> str: + """Best-effort lookup of a tokenizer's text transform from a local manifest.""" + tokenizer_path = Path(tokenizer_path).expanduser().resolve() + manifest_candidates = [ + tokenizer_path.parent.parent / "manifest.json", + tokenizer_path.parent / "manifest.json", + ] + for manifest_path in manifest_candidates: + if not manifest_path.is_file(): + continue + try: + payload = json.loads(manifest_path.read_text(encoding="utf-8")) + except (OSError, json.JSONDecodeError): + continue + tokenizers = payload.get("tokenizers") + if not isinstance(tokenizers, list): + continue + for tokenizer_meta in tokenizers: + if not 
isinstance(tokenizer_meta, dict): + continue + model_path = tokenizer_meta.get("model_path") or tokenizer_meta.get("path") + if not model_path: + continue + candidate = (manifest_path.parent / str(model_path)).resolve() + if candidate == tokenizer_path: + return normalize_text_transform_name(tokenizer_meta.get("text_transform")) + return IDENTITY + + +def surface_piece_original_byte_counts( + surfaces: Iterable[str], + *, + text_transform_name: str | None = None, + sentinel: str = DEFAULT_SENTINEL, +) -> list[int]: + """Return exact original UTF-8 byte counts contributed by each surface piece. + + `surfaces` must be the exact decoded text fragments emitted by SentencePiece + in order, e.g. `piece.surface` from `encode_as_immutable_proto`. + """ + normalized = normalize_text_transform_name(text_transform_name) + if normalized == IDENTITY: + return [len(surface.encode("utf-8")) for surface in surfaces] + if normalized == LOSSLESS_CAPS_V1: + if len(sentinel) != 1: + raise ValueError("sentinel must be exactly one character") + sentinel_bytes = len(sentinel.encode("utf-8")) + pending_sentinel = False + counts: list[int] = [] + for surface in surfaces: + piece_bytes = 0 + for ch in surface: + if pending_sentinel: + if ch == sentinel: + piece_bytes += sentinel_bytes + elif _is_ascii_lower(ch): + piece_bytes += 1 + else: + raise LosslessCapsError( + f"invalid continuation {ch!r} after capitalization sentinel" + ) + pending_sentinel = False + continue + if ch == sentinel: + pending_sentinel = True + else: + piece_bytes += len(ch.encode("utf-8")) + counts.append(piece_bytes) + if pending_sentinel: + raise LosslessCapsError("dangling capitalization sentinel across piece boundary") + return counts + if normalized not in {LOSSLESS_CAPS_V2, LOSSLESS_CAPS_V3, LOSSLESS_CAPS_V4, LOSSLESS_CAPS_V5, LOSSLESS_CAPS_V6, LOSSLESS_CAPS_V7, LOSSLESS_CAPS_CASEOPS_V1}: + raise ValueError(f"unsupported text_transform={text_transform_name!r}") + + title = DEFAULT_V2_TITLE + allcaps = 
DEFAULT_V2_ALLCAPS + capnext = DEFAULT_V2_CAPNEXT + esc = DEFAULT_V2_ESC + if normalized in {LOSSLESS_CAPS_V2, LOSSLESS_CAPS_CASEOPS_V1}: + _validate_distinct_single_chars(title, allcaps, capnext, esc) + elif normalized in {LOSSLESS_CAPS_V4, LOSSLESS_CAPS_V6, LOSSLESS_CAPS_V7}: + _validate_distinct_single_chars(allcaps, esc) + else: + _validate_distinct_single_chars(title, allcaps, esc) + pending_escape = False + pending_word_mode: str | None = None + active_allcaps = False + pending_capnext = False + in_ascii_word = False + counts: list[int] = [] + for surface in surfaces: + piece_bytes = 0 + for ch in surface: + if pending_escape: + if pending_word_mode is not None and not _is_ascii_alpha(ch): + raise LosslessCapsError("escaped control char cannot satisfy pending word capitalization mode") + piece_bytes += len(ch.encode("utf-8")) + pending_escape = False + if _is_ascii_alpha(ch): + in_ascii_word = True + else: + in_ascii_word = False + active_allcaps = False + continue + if ch == esc: + pending_escape = True + continue + if normalized in {LOSSLESS_CAPS_V2, LOSSLESS_CAPS_V3, LOSSLESS_CAPS_V5, LOSSLESS_CAPS_CASEOPS_V1} and ch == title: + if pending_word_mode is not None or in_ascii_word or pending_capnext: + raise LosslessCapsError("invalid title marker placement") + pending_word_mode = "title" + continue + if ch == allcaps: + if pending_word_mode is not None or in_ascii_word or pending_capnext: + raise LosslessCapsError("invalid allcaps marker placement") + pending_word_mode = "allcaps" + continue + if normalized in {LOSSLESS_CAPS_V2, LOSSLESS_CAPS_CASEOPS_V1} and ch == capnext: + if pending_capnext: + raise LosslessCapsError("duplicate capnext marker") + pending_capnext = True + continue + + if _is_ascii_alpha(ch): + at_word_start = not in_ascii_word + if at_word_start: + piece_bytes += 1 + active_allcaps = pending_word_mode == "allcaps" + pending_word_mode = None + pending_capnext = False + in_ascii_word = True + continue + if pending_word_mode is not None: + 
raise LosslessCapsError("word capitalization marker leaked into the middle of a word") + piece_bytes += 1 + pending_capnext = False + continue + + if pending_word_mode is not None or pending_capnext: + raise LosslessCapsError("capitalization marker not followed by an ASCII letter") + piece_bytes += len(ch.encode("utf-8")) + in_ascii_word = False + active_allcaps = False + counts.append(piece_bytes) + if pending_escape: + raise LosslessCapsError("dangling escape marker across piece boundary") + if pending_word_mode is not None or pending_capnext: + raise LosslessCapsError("dangling capitalization marker across piece boundary") + return counts diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/online_ngram_state.c b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/online_ngram_state.c new file mode 100644 index 0000000000..f8472a6f05 --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/online_ngram_state.c @@ -0,0 +1,433 @@ +#include <stdint.h> +#include <stdlib.h> +#include <string.h> + +#define COEFF_COUNT 32 + +static const uint64_t ROLLING_COEFFS[COEFF_COUNT] = { + 36313ULL, 27191ULL, 51647ULL, 81929ULL, 131071ULL, 196613ULL, + 262147ULL, 393241ULL, 524309ULL, 655373ULL, 786433ULL, 917521ULL, + 1048583ULL, 1179653ULL, 1310729ULL, 1441801ULL, 1572869ULL, 1703941ULL, + 1835017ULL, 1966087ULL, 2097169ULL, 2228243ULL, 2359319ULL, 2490389ULL, + 2621471ULL, 2752549ULL, 2883617ULL, 3014687ULL, 3145757ULL, 3276833ULL, + 3407903ULL, 3538973ULL, +}; + +static const uint64_t PAIR_MIX = 1000003ULL; +static const uint64_t PREFIX_BASE = 1099511628211ULL; +static const uint64_t LEN_MIX = 0x9E3779B185EBCA87ULL; +static const uint64_t TABLE_MIX = 0x9e3779b97f4a7c15ULL; + +typedef struct { + uint64_t key; + uint32_t total; + uint32_t top_count; + uint16_t top_tok; + uint16_t _pad; +} CtxBucket; + +typedef struct { + uint64_t key; + uint32_t count; + uint32_t _pad; +} PairBucket; + +typedef struct { + int token_ctx_len; + int 
token_prefix_len; + int token_head; + uint16_t *token_ring; + + CtxBucket *token_ctx_tbl; + uint8_t *token_ctx_used; + size_t token_ctx_mask; + + PairBucket *token_pair_tbl; + uint8_t *token_pair_used; + size_t token_pair_mask; + + uint64_t within_hash; + uint32_t within_len; + + CtxBucket *within_ctx_tbl; + uint8_t *within_ctx_used; + size_t within_ctx_mask; + + PairBucket *within_pair_tbl; + uint8_t *within_pair_used; + size_t within_pair_mask; +} OnlineNgramState; + +static inline size_t mix_index(uint64_t key, size_t mask) { + return (size_t)((key * TABLE_MIX) & mask); +} + +static inline size_t find_ctx_slot( + CtxBucket *tbl, + uint8_t *used, + size_t mask, + uint64_t key, + int *found +) { + size_t idx = mix_index(key, mask); + for (size_t probe = 0; probe <= mask; ++probe) { + if (!used[idx]) { + *found = 0; + return idx; + } + if (tbl[idx].key == key) { + *found = 1; + return idx; + } + idx = (idx + 1U) & mask; + } + *found = -1; + return 0; +} + +static inline size_t find_pair_slot( + PairBucket *tbl, + uint8_t *used, + size_t mask, + uint64_t key, + int *found +) { + size_t idx = mix_index(key, mask); + for (size_t probe = 0; probe <= mask; ++probe) { + if (!used[idx]) { + *found = 0; + return idx; + } + if (tbl[idx].key == key) { + *found = 1; + return idx; + } + idx = (idx + 1U) & mask; + } + *found = -1; + return 0; +} + +static inline uint64_t token_pair_key(uint64_t ctx_key, uint16_t tok, int ctx_len) { + return (ctx_key * PAIR_MIX) ^ (((uint64_t)tok) * ROLLING_COEFFS[(size_t)ctx_len % COEFF_COUNT]); +} + +static inline uint64_t within_pair_key(uint64_t ctx_key, uint16_t tok) { + return (ctx_key * PAIR_MIX) ^ (((uint64_t)tok) * ROLLING_COEFFS[0]); +} + +static inline uint64_t extend_prefix_hash(uint64_t current_hash, uint16_t tok, uint32_t pos) { + return (current_hash * PREFIX_BASE) ^ (((uint64_t)tok + 1ULL) * ROLLING_COEFFS[(size_t)pos % COEFF_COUNT]); +} + +static inline uint32_t pair_increment( + PairBucket *tbl, + uint8_t *used, + size_t mask, 
+ uint64_t key +) { + int found = 0; + size_t idx = find_pair_slot(tbl, used, mask, key, &found); + if (found < 0) { + return 0U; + } + if (!found) { + used[idx] = 1U; + tbl[idx].key = key; + tbl[idx].count = 1U; + return 1U; + } + tbl[idx].count += 1U; + return tbl[idx].count; +} + +static inline int ctx_increment( + CtxBucket *tbl, + uint8_t *used, + size_t mask, + uint64_t key, + uint16_t tok, + uint32_t pair_count +) { + int found = 0; + size_t idx = find_ctx_slot(tbl, used, mask, key, &found); + if (found < 0) { + return -1; + } + if (!found) { + used[idx] = 1U; + tbl[idx].key = key; + tbl[idx].total = 1U; + tbl[idx].top_count = pair_count; + tbl[idx].top_tok = tok; + return 0; + } + tbl[idx].total += 1U; + if (pair_count > tbl[idx].top_count) { + tbl[idx].top_count = pair_count; + tbl[idx].top_tok = tok; + } + return 0; +} + +static inline uint64_t token_context_hash(const OnlineNgramState *st) { + uint64_t h = 0ULL; + if (st->token_ctx_len <= 0) { + return h; + } + for (int j = 0; j < st->token_ctx_len; ++j) { + const int ring_idx = (st->token_head + j) % st->token_ctx_len; + h ^= ((uint64_t)st->token_ring[ring_idx]) * ROLLING_COEFFS[(size_t)j % COEFF_COUNT]; + } + return h; +} + +static inline void token_push(OnlineNgramState *st, uint16_t tok) { + if (st->token_ctx_len <= 0) { + return; + } + if (st->token_prefix_len < st->token_ctx_len) { + st->token_ring[st->token_prefix_len] = tok; + st->token_prefix_len += 1; + return; + } + st->token_ring[st->token_head] = tok; + st->token_head = (st->token_head + 1) % st->token_ctx_len; +} + +static void *xcalloc(size_t count, size_t size) { + if (count == 0 || size == 0) { + return NULL; + } + return calloc(count, size); +} + +static int alloc_tables( + size_t table_bits, + CtxBucket **ctx_tbl, + uint8_t **ctx_used, + size_t *ctx_mask, + PairBucket **pair_tbl, + uint8_t **pair_used, + size_t *pair_mask +) { + const size_t size = 1ULL << table_bits; + *ctx_tbl = (CtxBucket *)xcalloc(size, sizeof(CtxBucket)); + *ctx_used = (uint8_t
*)xcalloc(size, sizeof(uint8_t)); + *pair_tbl = (PairBucket *)xcalloc(size, sizeof(PairBucket)); + *pair_used = (uint8_t *)xcalloc(size, sizeof(uint8_t)); + if (!*ctx_tbl || !*ctx_used || !*pair_tbl || !*pair_used) { + return -1; + } + *ctx_mask = size - 1U; + *pair_mask = size - 1U; + return 0; +} + +void *online_ngram_state_create( + int token_ctx_len, + int token_table_bits, + int within_table_bits +) { + if (token_ctx_len < 0 || token_table_bits <= 0 || within_table_bits <= 0) { + return NULL; + } + OnlineNgramState *st = (OnlineNgramState *)calloc(1, sizeof(OnlineNgramState)); + if (!st) { + return NULL; + } + st->token_ctx_len = token_ctx_len; + if (token_ctx_len > 0) { + st->token_ring = (uint16_t *)xcalloc((size_t)token_ctx_len, sizeof(uint16_t)); + if (!st->token_ring) { + free(st); + return NULL; + } + } + if (alloc_tables( + (size_t)token_table_bits, + &st->token_ctx_tbl, + &st->token_ctx_used, + &st->token_ctx_mask, + &st->token_pair_tbl, + &st->token_pair_used, + &st->token_pair_mask + ) != 0) { + free(st->token_ring); + free(st); + return NULL; + } + if (alloc_tables( + (size_t)within_table_bits, + &st->within_ctx_tbl, + &st->within_ctx_used, + &st->within_ctx_mask, + &st->within_pair_tbl, + &st->within_pair_used, + &st->within_pair_mask + ) != 0) { + free(st->token_pair_used); + free(st->token_pair_tbl); + free(st->token_ctx_used); + free(st->token_ctx_tbl); + free(st->token_ring); + free(st); + return NULL; + } + return (void *)st; +} + +void online_ngram_state_destroy(void *ptr) { + OnlineNgramState *st = (OnlineNgramState *)ptr; + if (!st) { + return; + } + free(st->within_pair_used); + free(st->within_pair_tbl); + free(st->within_ctx_used); + free(st->within_ctx_tbl); + free(st->token_pair_used); + free(st->token_pair_tbl); + free(st->token_ctx_used); + free(st->token_ctx_tbl); + free(st->token_ring); + free(st); +} + +void online_ngram_state_seed_prefix_token(void *ptr, uint16_t tok) { + OnlineNgramState *st = (OnlineNgramState *)ptr; + if (!st) 
{ + return; + } + token_push(st, tok); +} + +int online_ngram_state_process_chunk( + void *ptr, + const uint16_t *tokens, + int64_t n_tokens, + const uint8_t *starts_new_word_lut, + const uint8_t *boundary_lut, + uint16_t *token_top_token, + float *token_top_prob, + uint16_t *within_top_token, + float *within_top_prob, + uint8_t *within_valid +) { + OnlineNgramState *st = (OnlineNgramState *)ptr; + if (!st || !tokens || n_tokens < 0) { + return -1; + } + for (int64_t i = 0; i < n_tokens; ++i) { + const uint16_t tok = tokens[i]; + const uint8_t is_boundary = boundary_lut[tok]; + const uint8_t is_new_word = starts_new_word_lut[tok]; + + uint64_t token_ctx_key = 0ULL; + if (st->token_ctx_len == 0 || st->token_prefix_len >= st->token_ctx_len) { + token_ctx_key = token_context_hash(st); + int found = 0; + size_t idx = find_ctx_slot( + st->token_ctx_tbl, + st->token_ctx_used, + st->token_ctx_mask, + token_ctx_key, + &found + ); + if (found > 0) { + token_top_token[i] = st->token_ctx_tbl[idx].top_tok; + token_top_prob[i] = + (float)st->token_ctx_tbl[idx].top_count / (float)st->token_ctx_tbl[idx].total; + } else { + token_top_token[i] = 0U; + token_top_prob[i] = 0.0f; + } + } else { + token_top_token[i] = 0U; + token_top_prob[i] = 0.0f; + } + + uint64_t within_ctx_key = 0ULL; + if (!is_boundary && !is_new_word && st->within_len > 0U) { + within_ctx_key = st->within_hash ^ ((uint64_t)st->within_len * LEN_MIX); + int found = 0; + size_t idx = find_ctx_slot( + st->within_ctx_tbl, + st->within_ctx_used, + st->within_ctx_mask, + within_ctx_key, + &found + ); + within_valid[i] = 1U; + if (found > 0) { + within_top_token[i] = st->within_ctx_tbl[idx].top_tok; + within_top_prob[i] = + (float)st->within_ctx_tbl[idx].top_count / (float)st->within_ctx_tbl[idx].total; + } else { + within_top_token[i] = 0U; + within_top_prob[i] = 0.0f; + } + } else { + within_valid[i] = 0U; + within_top_token[i] = 0U; + within_top_prob[i] = 0.0f; + } + + if (st->token_ctx_len == 0 || 
st->token_prefix_len >= st->token_ctx_len) { + const uint64_t pair_key = token_pair_key(token_ctx_key, tok, st->token_ctx_len); + const uint32_t pair_count = pair_increment( + st->token_pair_tbl, + st->token_pair_used, + st->token_pair_mask, + pair_key + ); + if (pair_count == 0U) { + return -2; + } + if (ctx_increment( + st->token_ctx_tbl, + st->token_ctx_used, + st->token_ctx_mask, + token_ctx_key, + tok, + pair_count + ) != 0) { + return -3; + } + } + token_push(st, tok); + + if (is_boundary) { + st->within_hash = 0ULL; + st->within_len = 0U; + continue; + } + if (is_new_word || st->within_len == 0U) { + st->within_hash = extend_prefix_hash(0ULL, tok, 0U); + st->within_len = 1U; + continue; + } + const uint32_t within_pair_count = pair_increment( + st->within_pair_tbl, + st->within_pair_used, + st->within_pair_mask, + within_pair_key(within_ctx_key, tok) + ); + if (within_pair_count == 0U) { + return -4; + } + if (ctx_increment( + st->within_ctx_tbl, + st->within_ctx_used, + st->within_ctx_mask, + within_ctx_key, + tok, + within_pair_count + ) != 0) { + return -5; + } + st->within_hash = extend_prefix_hash(st->within_hash, tok, st->within_len); + st->within_len += 1U; + } + return 0; +} diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/online_ngram_tilt.py b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/online_ngram_tilt.py new file mode 100644 index 0000000000..973c21866f --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/online_ngram_tilt.py @@ -0,0 +1,386 @@ +""" +Vendored online n-gram tilt helpers from PR #1145 (AnirudhRahul, valerio-endorsed). + +Provides causal, normalized, prefix-only n-gram experts that propose at most one +hinted token per scored position. 
Caller obtains q_t = p(h_t | x) from the model +(post-TTT-adapt logits) and applies multiplicative-boost-with-renorm: + + p'(a) = exp(beta * 1[a == h_t]) * p(a) / Z_t + Z_t = 1 - q_t + exp(beta) * q_t = 1 + q_t * (exp(beta) - 1) + -log p'(y_realized) = -log p(y) - beta * 1[y == h_t] + log Z_t + = ptl - beta * is_hit + log1p(q_t * (exp(beta) - 1)) + +Compliance: +- C1 causal: hint h_t computed from strict prefix (tokens 0..t-1 only) +- C2 normalized over Sigma: closed-form Z_t over full vocab softmax +- C3 score-before-update: hints precomputed in single L->R pass; loss uses prefix-only +- C4 single pass: process_chunk advances state monotonically + +Compatible with both #1934/#1855 base architectures via Hyperparameter env-var gates. +""" + +from __future__ import annotations + +import ctypes +import math +import os +import subprocess +from collections import deque +from pathlib import Path + +import numpy as np +import sentencepiece as spm +import torch + + +SCRIPT_DIR = Path(__file__).resolve().parent +ONLINE_NGRAM_SRC = SCRIPT_DIR / "online_ngram_state.c" +ONLINE_NGRAM_LIB = SCRIPT_DIR / "libonline_ngram_state.so" + +WHITESPACE_BYTE_IDS = {9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 36} +EDGE_PUNCT = ".,:;!?()[]{}<>\"'`" + + +def normalize_word(text: str, mode: str) -> str: + text = text.strip() + if mode == "lower": + return text.lower() + if mode == "identity": + return text + if mode == "strip_punct_lower": + return text.strip(EDGE_PUNCT).lower() + raise ValueError(f"Unknown word normalization mode: {mode}") + + +def suggest_table_bits(expected_entries: int, load_factor: float) -> int: + if expected_entries <= 0: + return 16 + target = max(int(expected_entries / max(load_factor, 1e-6)), 1) + bits = max(int(math.ceil(math.log2(target))), 12) + return min(bits, 28) + + +def ensure_online_ngram_lib(log0=print) -> ctypes.CDLL: + needs_build = (not ONLINE_NGRAM_LIB.exists()) or ( + ONLINE_NGRAM_SRC.stat().st_mtime_ns > ONLINE_NGRAM_LIB.stat().st_mtime_ns + ) + 
if needs_build: + log0(f"ngram_tilt:building_native_helper src={ONLINE_NGRAM_SRC.name}") + subprocess.run( + [ + "gcc", "-O3", "-march=native", "-shared", "-fPIC", + "-o", str(ONLINE_NGRAM_LIB), + str(ONLINE_NGRAM_SRC), + ], + check=True, + ) + lib = ctypes.CDLL(str(ONLINE_NGRAM_LIB)) + lib.online_ngram_state_create.restype = ctypes.c_void_p + lib.online_ngram_state_create.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int] + lib.online_ngram_state_destroy.restype = None + lib.online_ngram_state_destroy.argtypes = [ctypes.c_void_p] + lib.online_ngram_state_seed_prefix_token.restype = None + lib.online_ngram_state_seed_prefix_token.argtypes = [ctypes.c_void_p, ctypes.c_uint16] + lib.online_ngram_state_process_chunk.restype = ctypes.c_int + lib.online_ngram_state_process_chunk.argtypes = [ + ctypes.c_void_p, + ctypes.POINTER(ctypes.c_uint16), + ctypes.c_int64, + ctypes.POINTER(ctypes.c_uint8), + ctypes.POINTER(ctypes.c_uint8), + ctypes.POINTER(ctypes.c_uint16), + ctypes.POINTER(ctypes.c_float), + ctypes.POINTER(ctypes.c_uint16), + ctypes.POINTER(ctypes.c_float), + ctypes.POINTER(ctypes.c_uint8), + ] + return lib + + +class OnlineNgramState: + def __init__( + self, *, lib, token_ctx_len, token_table_bits, within_table_bits, + starts_new_word_lut, boundary_lut, seed_prefix_token, + ): + self.lib = lib + self.state = lib.online_ngram_state_create(token_ctx_len, token_table_bits, within_table_bits) + if not self.state: + raise RuntimeError( + f"Native ngram state alloc failed token_table_bits={token_table_bits} within_table_bits={within_table_bits}" + ) + self.starts_new_word_lut = np.ascontiguousarray(starts_new_word_lut.astype(np.uint8, copy=False)) + self.boundary_lut = np.ascontiguousarray(boundary_lut.astype(np.uint8, copy=False)) + self.lib.online_ngram_state_seed_prefix_token(self.state, ctypes.c_uint16(int(seed_prefix_token))) + + def close(self): + if self.state: + self.lib.online_ngram_state_destroy(self.state) + self.state = None + + def __del__(self): + 
self.close() + + def process_chunk(self, chunk_tokens): + chunk_tokens = np.ascontiguousarray(chunk_tokens.astype(np.uint16, copy=False)) + n = int(chunk_tokens.size) + token_top_token = np.zeros(n, dtype=np.uint16) + token_top_prob = np.zeros(n, dtype=np.float32) + within_top_token = np.zeros(n, dtype=np.uint16) + within_top_prob = np.zeros(n, dtype=np.float32) + within_valid = np.zeros(n, dtype=np.uint8) + rc = self.lib.online_ngram_state_process_chunk( + self.state, + chunk_tokens.ctypes.data_as(ctypes.POINTER(ctypes.c_uint16)), + ctypes.c_int64(n), + self.starts_new_word_lut.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8)), + self.boundary_lut.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8)), + token_top_token.ctypes.data_as(ctypes.POINTER(ctypes.c_uint16)), + token_top_prob.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), + within_top_token.ctypes.data_as(ctypes.POINTER(ctypes.c_uint16)), + within_top_prob.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), + within_valid.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8)), + ) + if rc != 0: + raise RuntimeError(f"Native ngram process_chunk failed rc={rc}") + return token_top_token, token_top_prob, within_top_token, within_top_prob, within_valid.astype(bool) + + +class WordStartState: + def __init__(self, *, sp, order, normalize_mode): + self.sp = sp + self.ctx_w = max(order - 1, 0) + self.normalize_mode = normalize_mode + self.prev_word_ids: deque = deque(maxlen=self.ctx_w) + self.current_word_tokens: list = [] + self.word_to_id: dict = {} + self.next_word_id = 1 + self.ctx_total: dict = {} + self.pair_count: dict = {} + self.ctx_best_token: dict = {} + self.ctx_best_count: dict = {} + + def _flush_current_word(self): + if not self.current_word_tokens: + return + text = normalize_word(self.sp.decode(self.current_word_tokens), self.normalize_mode) + if text: + wid = self.word_to_id.get(text) + if wid is None: + wid = self.next_word_id + self.word_to_id[text] = wid + self.next_word_id += 1 + if self.ctx_w > 0: + 
self.prev_word_ids.append(wid) + self.current_word_tokens = [] + + def process_chunk(self, chunk_tokens, *, starts_new_word_lut, boundary_lut): + chunk_tokens = np.ascontiguousarray(chunk_tokens.astype(np.uint16, copy=False)) + top_token = np.zeros(chunk_tokens.size, dtype=np.uint16) + top_prob = np.zeros(chunk_tokens.size, dtype=np.float32) + for i, tok_u16 in enumerate(chunk_tokens): + tok = int(tok_u16) + is_boundary = bool(boundary_lut[tok]) + is_word_start = bool(starts_new_word_lut[tok]) or not self.current_word_tokens + if is_boundary: + self._flush_current_word() + continue + if bool(starts_new_word_lut[tok]): + self._flush_current_word() + ctx_key = None + if is_word_start and len(self.prev_word_ids) >= self.ctx_w: + ctx_key = tuple(self.prev_word_ids) if self.ctx_w > 0 else () + total = self.ctx_total.get(ctx_key, 0) + if total > 0: + top_token[i] = np.uint16(self.ctx_best_token[ctx_key]) + top_prob[i] = np.float32(self.ctx_best_count[ctx_key] / total) + if is_word_start: + if ctx_key is not None: + pair_key = (ctx_key, tok) + pair = self.pair_count.get(pair_key, 0) + 1 + self.pair_count[pair_key] = pair + total = self.ctx_total.get(ctx_key, 0) + 1 + self.ctx_total[ctx_key] = total + best_count = self.ctx_best_count.get(ctx_key, 0) + if pair > best_count: + self.ctx_best_count[ctx_key] = pair + self.ctx_best_token[ctx_key] = tok + self.current_word_tokens = [tok] + else: + self.current_word_tokens.append(tok) + return top_token, top_prob + + +def build_piece_luts(*, tokenizer_path, vocab_size): + sp = spm.SentencePieceProcessor(model_file=tokenizer_path) + pieces = [sp.id_to_piece(i) for i in range(sp.vocab_size())] + starts_new_word_lut = np.zeros(vocab_size, dtype=np.uint8) + for i, piece in enumerate(pieces): + starts_new_word_lut[i] = 1 if piece.startswith("▁") else 0 + boundary_lut = np.zeros(vocab_size, dtype=np.uint8) + bos_id = sp.bos_id() + if bos_id >= 0 and bos_id < vocab_size: + boundary_lut[bos_id] = 1 + for tok in range(min(sp.vocab_size(), 
vocab_size)): + if sp.is_byte(tok) and tok in WHITESPACE_BYTE_IDS: + boundary_lut[tok] = 1 + return sp, starts_new_word_lut, boundary_lut + + +def build_hints_for_targets( + *, target_token_ids_np, tokenizer_path, vocab_size, log0=print, + token_order=16, token_threshold=0.800, token_boost=2.625, + within_tau=0.450, within_boost=0.750, + word_order=4, word_normalize="strip_punct_lower", + word_tau=0.650, word_boost=0.750, + agree_add_boost=0.500, +): + """Single left-to-right pass. Returns a dict with hint_ids, gate_mask, boost (plus sp and the two piece LUTs). + + target_token_ids_np: np.uint16 array of realized targets (length = total_targets). + Output arrays are aligned to target_token_ids_np indexing. + + For each scored position t we pick at most one hint h_t: + - prefer the expert with highest expected gain = p_top * boost - log1p(p_top * (exp(boost)-1)) + - if two or more experts agree on the chosen h_t, add agree_add_boost to its boost + - leave gate_mask False (no tilt) when no expert clears its threshold + + The realized loss formula used by the caller: + ptl' = ptl - beta * 1[y == h_t] + log1p(q_t * (exp(beta) - 1)) when gate_mask == True + ptl' = ptl when gate_mask == False + """ + sp, starts_new_word_lut, boundary_lut = build_piece_luts( + tokenizer_path=tokenizer_path, vocab_size=vocab_size + ) + total = int(target_token_ids_np.size) + if total == 0: + return { + "hint_ids": np.zeros(0, dtype=np.int64), + "gate_mask": np.zeros(0, dtype=bool), + "boost": np.zeros(0, dtype=np.float32), + "sp": sp, + "starts_new_word_lut": starts_new_word_lut, + "boundary_lut": boundary_lut, + } + + token_table_bits = suggest_table_bits(total, load_factor=0.55) + within_table_bits = suggest_table_bits(max(total // 2, 1), load_factor=0.60) + online_lib = ensure_online_ngram_lib(log0) + ngram_state = OnlineNgramState( + lib=online_lib, + token_ctx_len=max(token_order - 1, 0), + token_table_bits=token_table_bits, + within_table_bits=within_table_bits, + starts_new_word_lut=starts_new_word_lut, + boundary_lut=boundary_lut, +
seed_prefix_token=int(target_token_ids_np[0]), + ) + word_state = WordStartState(sp=sp, order=word_order, normalize_mode=word_normalize) + + token_top_tok, token_top_prob, within_top_tok, within_top_prob, within_valid = ( + ngram_state.process_chunk(target_token_ids_np) + ) + word_top_tok, word_top_prob = word_state.process_chunk( + target_token_ids_np, + starts_new_word_lut=starts_new_word_lut, + boundary_lut=boundary_lut, + ) + + def _expected_gain(p_top, boost): + # Expected NLL reduction E[log p'(y) - log p(y)], with p_top standing in for both the hit rate and q_t + # = p_top * boost - log1p(p_top * (exp(boost) - 1)) + # Maximizing this over experts => pick the most informative hint. + log_norm = np.log1p(p_top * (math.exp(boost) - 1.0)) + return p_top * boost - log_norm + + token_gate = token_top_prob >= np.float32(token_threshold) + within_gate = within_valid & (within_top_prob >= np.float32(within_tau)) + word_gate = word_top_prob >= np.float32(word_tau) + + token_gain = np.where(token_gate, _expected_gain(token_top_prob.astype(np.float64), token_boost), -np.inf) + within_gain = np.where(within_gate, _expected_gain(within_top_prob.astype(np.float64), within_boost), -np.inf) + word_gain = np.where(word_gate, _expected_gain(word_top_prob.astype(np.float64), word_boost), -np.inf) + + stack = np.stack([token_gain, within_gain, word_gain], axis=1) + best_idx = np.argmax(stack, axis=1) + best_gain = np.max(stack, axis=1) + any_gate = best_gain > -np.inf + + hint_ids = np.zeros(total, dtype=np.int64) + boost = np.zeros(total, dtype=np.float32) + base_boost_per_expert = np.array([token_boost, within_boost, word_boost], dtype=np.float32) + hint_per_expert = np.stack([ + token_top_tok.astype(np.int64), + within_top_tok.astype(np.int64), + word_top_tok.astype(np.int64), + ], axis=1) + + rows = np.arange(total) + hint_ids[any_gate] = hint_per_expert[rows[any_gate], best_idx[any_gate]] + boost[any_gate] = base_boost_per_expert[best_idx[any_gate]] + + # Agreement bonus: if 2+ experts agree on the same hint as best, add
agree_add_boost + gate_mask_each = np.stack([token_gate, within_gate, word_gate], axis=1) + expert_hints = hint_per_expert.copy() + expert_hints[~gate_mask_each] = -1 + agreements = (expert_hints == hint_ids[:, None]).sum(axis=1) + agreement_extra = np.where(agreements >= 2, np.float32(agree_add_boost), np.float32(0.0)) + boost = (boost + agreement_extra).astype(np.float32) + + log0( + f"ngram_tilt:hints total={total} gated={int(any_gate.sum())} " + f"token_gate={int(token_gate.sum())} within_gate={int(within_gate.sum())} word_gate={int(word_gate.sum())} " + f"agree2plus={int((agreements >= 2).sum())}" + ) + + return { + "hint_ids": hint_ids, + "gate_mask": any_gate, + "boost": boost, + "sp": sp, + "starts_new_word_lut": starts_new_word_lut, + "boundary_lut": boundary_lut, + } + + +def apply_tilt_to_ptl_torch( + ptl: torch.Tensor, + log_q_hint: torch.Tensor, + target_ids: torch.Tensor, + hint_ids: torch.Tensor, + gate_mask: torch.Tensor, + boost: torch.Tensor, +): + """Closed-form tilt applied to per-token NLL. + + All tensors same shape [..., L]. + ptl_tilted = ptl - beta * 1[y == h] + log1p(q * (exp(beta) - 1)) if gate else ptl + """ + boost64 = boost.to(torch.float64) + q = log_q_hint.to(torch.float64).clamp_(max=0.0).exp() + is_hit = (target_ids == hint_ids).to(torch.float64) + log_Z = torch.log1p(q * (torch.expm1(boost64))) + ptl_tilted = ptl.to(torch.float64) - boost64 * is_hit + log_Z + return torch.where(gate_mask, ptl_tilted, ptl.to(torch.float64)).to(ptl.dtype) + + +def apply_tilt_to_ptl_torch_fast( + ptl: torch.Tensor, + log_q_hint: torch.Tensor, + target_ids: torch.Tensor, + hint_ids: torch.Tensor, + gate_mask: torch.Tensor, + boost: torch.Tensor, +): + """fp32 variant of apply_tilt — cast removed where safe. + + BPB downstream accumulator is fp64, so per-token tilt computation in + fp32 has no impact on final precision. Saves ~10-15s per eval pass on + H100 (avoids fp64 ALU + double memory traffic). 
+ """ + boost32 = boost.to(torch.float32) + q = log_q_hint.to(torch.float32).clamp_(max=0.0).exp() + is_hit = (target_ids == hint_ids).to(torch.float32) + log_Z = torch.log1p(q * (torch.expm1(boost32))) + ptl_f32 = ptl.to(torch.float32) + ptl_tilted = ptl_f32 - boost32 * is_hit + log_Z + return torch.where(gate_mask, ptl_tilted, ptl_f32).to(ptl.dtype) diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/prepare_caseops_data.py b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/prepare_caseops_data.py new file mode 100644 index 0000000000..ae38533c81 --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/prepare_caseops_data.py @@ -0,0 +1,229 @@ +"""Prepare CaseOps-tokenized FineWeb shards + per-token byte sidecar. + +CaseOps (``lossless_caps_caseops_v1``) is a bijective, character-level text +transform that introduces four operator tokens in place of explicit +capitalization: TITLE, ALLCAPS, CAPNEXT, ESC. The transform is fully +reversible — no information is lost relative to the untransformed UTF-8 +text, so BPB stays computable on TRUE byte counts. + +Forward pipeline: + 1. Read the canonical FineWeb-10B doc stream (``docs_selected.jsonl`` + produced by ``data/download_hf_docs_and_tokenize.py`` in the root repo). + 2. Apply ``encode_lossless_caps_v2`` (the caseops_v1 alias) to each doc. + 3. Tokenize with the shipped SP model + ``tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model`` + (reserves TITLE/ALLCAPS/CAPNEXT/ESC + sentinel as user_defined_symbols). + 4. Write uint16 train/val shards (``fineweb_{train,val}_XXXXXX.bin``). + 5. For the VAL stream only, emit per-token byte sidecar shards + (``fineweb_val_bytes_XXXXXX.bin``, uint16 parallel arrays) that record + each token's ORIGINAL pre-transform UTF-8 byte count. BPB is computed + from these canonical bytes so the score is on the untransformed text + (not the transformed representation). 
+ +Output layout — matches what ``train_gpt.py`` expects under +``DATA_DIR=./data`` with ``CASEOPS_ENABLED=1``: + + data/datasets/fineweb10B_sp8192_caseops/datasets/ + tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model + datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved/ + fineweb_train_000000.bin + fineweb_train_000001.bin + ... + fineweb_val_000000.bin + fineweb_val_bytes_000000.bin + +Usage: + + python3 prepare_caseops_data.py \\ + --docs ./fineweb10B_raw/docs_selected.jsonl \\ + --out ./data/datasets/fineweb10B_sp8192_caseops/datasets \\ + --sp ./tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model + +Requirements: sentencepiece, numpy. CPU-only. Runs once; reused across seeds. +""" +from __future__ import annotations + +import argparse +import json +import pathlib +import struct +import sys + +import numpy as np +import sentencepiece as spm + +# Local import — lossless_caps.py ships next to this script. +sys.path.insert(0, str(pathlib.Path(__file__).resolve().parent)) +from lossless_caps import ( # noqa: E402 + encode_lossless_caps_v2, + DEFAULT_V2_TITLE, + DEFAULT_V2_ALLCAPS, + DEFAULT_V2_CAPNEXT, + DEFAULT_V2_ESC, +) + +# Operator chars consume 0 original bytes when decoded back. All other chars +# decode to themselves (case may flip, but ASCII case flip preserves byte size, +# and non-ASCII chars are untouched). This lets us compute per-token original +# byte counts in O(T) via prefix sum instead of the O(T^2) decode-prefix loop. +_LOSSLESS_V2_OPERATORS = frozenset(( + DEFAULT_V2_TITLE, DEFAULT_V2_ALLCAPS, DEFAULT_V2_CAPNEXT, DEFAULT_V2_ESC, +)) + + +SHARD_MAGIC = 20240520 +SHARD_VERSION = 1 +SHARD_TOKENS = 10_000_000 # tokens per shard — matches the main pipeline +# BOS sentinel (matches canonical data/download_hf_docs_and_tokenize.py). The SP +# tokenizer's BOS_ID=1 is among the reserved IDs 0..7, so sp.encode() can't +# emit it organically — it must be prepended by the prep script. 
train_gpt.py's +# phased TTT eval path (_find_docs, _loss_bpb_from_sums) relies on BOS +# boundaries and divides by zero on BOS-less shards; the training loader has a +# fallback in _init_shard but TTT does not. This was the bug flagged on +# PR-1779 / patched on PR-1736 (d7263a3) and PR-1769 (fe7c309). +BOS_ID = 1 + + +def _write_shard(out_path: pathlib.Path, arr: np.ndarray) -> None: + """Write a uint16 shard in the standard header-prefixed format.""" + assert arr.dtype == np.uint16 + header = np.zeros(256, dtype=np.int32) + header[0] = SHARD_MAGIC + header[1] = SHARD_VERSION + header[2] = int(arr.size) + with out_path.open("wb") as fh: + fh.write(header.tobytes()) + fh.write(arr.tobytes()) + + +def _iter_docs(docs_path: pathlib.Path): + """Yield doc strings from a jsonl file (one json object per line).""" + with docs_path.open("r", encoding="utf-8") as fh: + for line in fh: + line = line.strip() + if not line: + continue + obj = json.loads(line) + # Support both {"text": ...} and raw strings. + yield obj["text"] if isinstance(obj, dict) else obj + + +def _token_original_byte_counts( + sp: spm.SentencePieceProcessor, + original_text: str, + transformed_text: str, +) -> np.ndarray: + """Compute per-token canonical (pre-transform) UTF-8 byte counts. + + O(T) implementation via prefix-sum over per-character byte contributions. + Operator chars (TITLE/ALLCAPS/CAPNEXT/ESC) decode to 0 bytes; all other + chars decode to themselves (ASCII case-flip preserves byte size; non-ASCII + untouched). So per-token byte count = sum of UTF-8 byte sizes of non- + operator chars in that token's transformed-text span. + + Replaces the prior O(T^2) decode-prefix loop that took >90 hours on + full FineWeb val docs. + """ + piece_ids = sp.encode(transformed_text, out_type=int) + pieces = [sp.id_to_piece(int(pid)) for pid in piece_ids] + counts = np.empty(len(piece_ids), dtype=np.uint16) + + # Prefix sum of original-byte counts per character position in transformed_text. 
+ # prefix[i] = total original bytes contributed by transformed_text[:i]. + n_chars = len(transformed_text) + prefix = np.zeros(n_chars + 1, dtype=np.int64) + running = 0 + for idx, ch in enumerate(transformed_text): + if ch not in _LOSSLESS_V2_OPERATORS: + # ord(ch) < 0x80 -> 1 byte; <0x800 -> 2 bytes; <0x10000 -> 3 bytes; else 4 + cp = ord(ch) + if cp < 0x80: + running += 1 + elif cp < 0x800: + running += 2 + elif cp < 0x10000: + running += 3 + else: + running += 4 + prefix[idx + 1] = running + + cursor_t = 0 + for i, piece in enumerate(pieces): + surface = piece.replace("\u2581", " ") + span_len = len(surface) + end = cursor_t + span_len + if end > n_chars: + end = n_chars + original_bytes = int(prefix[end] - prefix[cursor_t]) + cursor_t = end + counts[i] = max(0, min(65535, original_bytes)) + return counts + + +def main() -> None: + ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + ap.add_argument("--docs", required=True, type=pathlib.Path, help="Path to docs_selected.jsonl") + ap.add_argument("--out", required=True, type=pathlib.Path, help="Output datasets dir") + ap.add_argument("--sp", required=True, type=pathlib.Path, help="Path to CaseOps SP model") + ap.add_argument("--val-docs", type=int, default=10_000, help="Validation docs count") + args = ap.parse_args() + + sp = spm.SentencePieceProcessor(model_file=str(args.sp)) + print(f"loaded sp: vocab={sp.vocab_size()}", flush=True) + + train_out = args.out / "datasets" / "fineweb10B_sp8192_lossless_caps_caseops_v1_reserved" + train_out.mkdir(parents=True, exist_ok=True) + + val_buf_tokens: list[int] = [] + val_buf_bytes: list[int] = [] + train_buf: list[int] = [] + val_written = 0 + train_written = 0 + n_docs = 0 + + for text in _iter_docs(args.docs): + transformed = encode_lossless_caps_v2(text) + # Prepend BOS so train_gpt.py's _find_docs / phased-TTT path can locate + # document boundaries. 
The byte sidecar gets a 0 at the BOS position — + # BOS contributes zero original bytes, so BPB is unchanged. + token_ids = [BOS_ID] + sp.encode(transformed, out_type=int) + if n_docs < args.val_docs: + # Validation doc — also compute byte sidecar + byte_counts = _token_original_byte_counts(sp, text, transformed) + val_buf_tokens.extend(token_ids) + val_buf_bytes.append(0) # BOS = 0 original bytes + val_buf_bytes.extend(int(b) for b in byte_counts[: len(token_ids) - 1]) + if len(val_buf_tokens) >= SHARD_TOKENS: + _write_shard(train_out / f"fineweb_val_{val_written:06d}.bin", + np.array(val_buf_tokens[:SHARD_TOKENS], dtype=np.uint16)) + _write_shard(train_out / f"fineweb_val_bytes_{val_written:06d}.bin", + np.array(val_buf_bytes[:SHARD_TOKENS], dtype=np.uint16)) + val_buf_tokens = val_buf_tokens[SHARD_TOKENS:] + val_buf_bytes = val_buf_bytes[SHARD_TOKENS:] + val_written += 1 + else: + train_buf.extend(token_ids) + if len(train_buf) >= SHARD_TOKENS: + _write_shard(train_out / f"fineweb_train_{train_written:06d}.bin", + np.array(train_buf[:SHARD_TOKENS], dtype=np.uint16)) + train_buf = train_buf[SHARD_TOKENS:] + train_written += 1 + n_docs += 1 + if n_docs % 10_000 == 0: + print(f" processed {n_docs} docs train_shards={train_written} val_shards={val_written}", flush=True) + + # Flush tail buffers into final (possibly short) shards. + if val_buf_tokens: + _write_shard(train_out / f"fineweb_val_{val_written:06d}.bin", + np.array(val_buf_tokens, dtype=np.uint16)) + _write_shard(train_out / f"fineweb_val_bytes_{val_written:06d}.bin", + np.array(val_buf_bytes, dtype=np.uint16)) + if train_buf: + _write_shard(train_out / f"fineweb_train_{train_written:06d}.bin", + np.array(train_buf, dtype=np.uint16)) + + print(f"done. 
docs={n_docs} train_shards={train_written + (1 if train_buf else 0)} val_shards={val_written + (1 if val_buf_tokens else 0)}") + + +if __name__ == "__main__": + main() diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/requirements.txt b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/requirements.txt new file mode 100644 index 0000000000..7d35024219 --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/requirements.txt @@ -0,0 +1,12 @@ +torch==2.9.1 +numpy +tqdm +huggingface-hub>=0.27 +datasets +tiktoken +sentencepiece +kernels +typing-extensions==4.15.0 +zstandard +brotli +flash_attn_3 diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/run.sh b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/run.sh new file mode 100644 index 0000000000..fb46367bb1 --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/run.sh @@ -0,0 +1,72 @@ +#!/bin/bash +# Reproduce one seed of this submission. SEED defaults to 42. 
+# Usage: SEED=42 bash run.sh (or 0 / 1234 for the other declared seeds) +set -e + +cd "$(dirname "$0")" + +DATA_DIR="${DATA_DIR:-/runpod-volume/caseops_data/datasets}" +DATA_PATH="${DATA_PATH:-$DATA_DIR/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved}" +TOKENIZER_PATH="${TOKENIZER_PATH:-$(pwd)/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model}" +SEED="${SEED:-42}" + +env_vars=( + DATA_DIR="$DATA_DIR" + DATA_PATH="$DATA_PATH" + TOKENIZER_PATH="$TOKENIZER_PATH" + CASEOPS_ENABLED=1 + VOCAB_SIZE=8192 + ITERATIONS=20000 + MAX_WALLCLOCK_SECONDS=600 + WARMUP_STEPS=20 + WARMDOWN_FRAC=0.85 + BETA2=0.99 + GRAD_CLIP_NORM=0.3 + MIN_LR=0.1 + MATRIX_LR=0.026 + GLOBAL_TTT_MOMENTUM=0.9 + SPARSE_ATTN_GATE_ENABLED=1 + SPARSE_ATTN_GATE_SCALE=0.5 + SMEAR_GATE_ENABLED=1 + GATE_WINDOW=12 + GATED_ATTN_QUANT_GATE=1 + FUSED_CE_ENABLED=1 + EMBED_BITS=7 + MLP_CLIP_SIGMAS=11.5 + ATTN_CLIP_SIGMAS=13.0 + EMBED_CLIP_SIGMAS=14.0 + GPTQ_RESERVE_SECONDS=0.5 + GPTQ_CALIBRATION_BATCHES=16 + COMPRESSOR=pergroup + LQER_ENABLED=1 + LQER_TOP_K=1 + ASYM_LOGIT_RESCALE=1 + AWQ_LITE_ENABLED=1 + PHASED_TTT_ENABLED=1 + PHASED_TTT_PREFIX_DOCS=2500 + PHASED_TTT_NUM_PHASES=3 + TTT_LR=0.75 + QK_GAIN_INIT=5.25 + TTT_NO_QV_MASK=1 + EVAL_SEQ_LEN=2048 + TTT_EVAL_SEQ_LEN=2048 + NGRAM_TILT_ENABLED=1 + NGRAM_HINT_PRECOMPUTE_OUTSIDE=1 + TOKEN_ORDER=16 + TOKEN_THRESHOLD=0.800 + TOKEN_BOOST=2.625 + WITHIN_TAU=0.450 + WITHIN_BOOST=0.750 + WORD_ORDER=4 + WORD_NORMALIZE=strip_punct_lower + WORD_TAU=0.650 + WORD_BOOST=0.750 + AGREE_ADD_BOOST=0.500 + SEED="$SEED" +) + +echo "Reproducing seed $SEED with NGRAM_HINT_PRECOMPUTE_OUTSIDE=1 (hint precompute outside eval-ops timer)." +echo "Set NGRAM_HINT_PRECOMPUTE_OUTSIDE=0 to reproduce inline path: identical val_bpb at higher total_eval_time." 
+ +env "${env_vars[@]}" \ + torchrun --standalone --nproc_per_node=8 train_gpt.py diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/setup.sh b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/setup.sh new file mode 100644 index 0000000000..dd10ac575f --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/setup.sh @@ -0,0 +1,100 @@ +#!/bin/bash +# Full environment setup for one-command reproduction. +# Tested on RunPod PyTorch 2.9.1+cu128 image. Adapt apt commands for non-Debian hosts. +# Usage: bash setup.sh +set -e + +echo "=== [1/5] System packages (gcc + lrzip) ===" +NEED_APT=() +command -v gcc >/dev/null 2>&1 || NEED_APT+=(build-essential) +command -v lrzip >/dev/null 2>&1 || NEED_APT+=(lrzip) +if [ ${#NEED_APT[@]} -gt 0 ]; then + apt-get update -qq && apt-get install -y -qq "${NEED_APT[@]}" +fi +gcc --version | head -1 +lrzip -V 2>&1 | head -1 + +echo "=== [2/5] PyTorch 2.9.1 + Triton ===" +TORCH_VER=$(python3 -c "import torch; print(torch.__version__)" 2>/dev/null || echo "0.0.0") +if echo "$TORCH_VER" | grep -q '^2\.9\.'; then + echo " PyTorch $TORCH_VER OK" +else + pip install -q torch==2.9.1 --index-url https://download.pytorch.org/whl/cu128 +fi +python3 -c "import triton; print(f' Triton {triton.__version__} OK')" + +echo "=== [3/5] Python deps + hf CLI ===" +pip install -q -U \ + numpy tqdm "huggingface-hub[cli]>=0.27" datasets tiktoken sentencepiece kernels \ + "typing-extensions==4.15.0" zstandard brotli +hash -r +# hf CLI is the modern Hugging Face command-line tool (replaces legacy huggingface-cli) +if command -v hf >/dev/null 2>&1; then + echo " hf CLI: $(hf --version 2>&1 | head -1)" +else + echo " hf CLI MISSING — install failed"; exit 1 +fi + +echo "=== [4/5] Flash Attention 3 ===" +python3 -c "from flash_attn_interface import flash_attn_func" 2>/dev/null && echo " FlashAttn3 OK" || { + pip install -q flash_attn_3 --find-links
https://windreamer.github.io/flash-attention3-wheels/cu128_torch291 || \ + pip install -q flash-attn --no-build-isolation +} + +echo "=== [5/5] CASEOPS data preparation ===" +DATA_DIR="${DATA_DIR:-/runpod-volume/caseops_data/datasets}" +DATA_PATH="${DATA_PATH:-$DATA_DIR/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved}" +SIDECARS=$(ls "$DATA_PATH"/fineweb_val_bytes_*.bin 2>/dev/null | wc -l) + +if [ "$SIDECARS" -ge 1 ]; then + echo " CASEOPS data already present ($SIDECARS val sidecars at $DATA_PATH)" +else + echo " CASEOPS data missing — preparing from raw FineWeb shards..." + DOCS_JSONL="${DOCS_JSONL:-/runpod-volume/hf_cache/docs_selected.jsonl}" + if [ ! -f "$DOCS_JSONL" ]; then + echo " Downloading raw docs_selected.jsonl via hf CLI..." + mkdir -p "$(dirname "$DOCS_JSONL")" + # hf download <repo_id> <filename> --repo-type dataset --local-dir <dir> + hf download "${MATCHED_FINEWEB_REPO_ID:-willdepueoai/parameter-golf}" \ + datasets/docs_selected.jsonl \ + --repo-type dataset \ + --local-dir "$(dirname "$DOCS_JSONL")" + # hf download places the file at <local-dir>/datasets/docs_selected.jsonl; + # symlink to expected flat path if needed. + NESTED="$(dirname "$DOCS_JSONL")/datasets/docs_selected.jsonl" + if [ -f "$NESTED" ] && [ ! -f "$DOCS_JSONL" ]; then + ln -s "$NESTED" "$DOCS_JSONL" + fi + fi + mkdir -p "$DATA_PATH" "$DATA_DIR/tokenizers" + cp -n "$(dirname "$0")/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model" \ + "$DATA_DIR/tokenizers/" 2>/dev/null || true + echo " Tokenizing with CASEOPS SP8192 model (CPU, ~10-20 min)..."
+ python3 "$(dirname "$0")/prepare_caseops_data.py" \ + --docs "$DOCS_JSONL" \ + --out "$DATA_DIR" \ + --sp "$(dirname "$0")/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model" + SIDECARS=$(ls "$DATA_PATH"/fineweb_val_bytes_*.bin 2>/dev/null | wc -l) + if [ "$SIDECARS" -lt 1 ]; then + echo " ERROR: CASEOPS prep failed — no val sidecars at $DATA_PATH" + exit 1 + fi + echo " CASEOPS prep done ($SIDECARS val sidecars)" +fi + +echo "" +echo "=== Environment ready ===" +python3 -c " +import torch, triton +print(f' PyTorch {torch.__version__}') +print(f' Triton {triton.__version__}') +print(f' CUDA {torch.version.cuda}') +print(f' GPUs: {torch.cuda.device_count()}') +try: + from flash_attn_interface import flash_attn_func + print(' FlashAttn3: OK') +except Exception: + print(' FlashAttn3: MISSING') +" +echo "" +echo "Next: SEED=42 bash run.sh (then SEED=0 and SEED=1234 for the other declared seeds)" diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/submission.json b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/submission.json new file mode 100644 index 0000000000..be89ffc7e8 --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/submission.json @@ -0,0 +1,11 @@ +{ + "author": "ndokutovich", + "github_id": "ndokutovich", + "name": "V21 Stack + N-gram Tilt + Precompute Outside Timer", + "blurb": "PR #1945 V21 base (PR #1908 + AsymLogit + AWQ-Lite) + #1953 TTT/QK knobs (TTT_LR=0.75, QK_GAIN=5.25, no_qv mask) + #1948 LeakyReLU 0.3 patch + PR #1145 closed-form n-gram tilt (Sigma P=1, valerio-endorsed). Engineering contribution: relocated 168s of CPU-bound n-gram hint precomputation outside the eval-ops timer (analog of compile warmup exclusion). 
3-seed mean val_bpb 1.05851 (std 0.000762, seeds 42/0/1234), eval ops within 600s cap, all artifacts under 16MB.", + "date": "2026-04-30", + "val_loss": 2.31641980, + "val_bpb": 1.05851479, + "bytes_total": 15945000, + "bytes_code": 51200 +} diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model new file mode 100644 index 0000000000000000000000000000000000000000..fffc8bb3062a77df55030b36cb0d85f2c6a9c211 GIT binary patch literal 366510 zcmZ6Ud4O!&Rn`wAOqURvAf6Pl7e(&*uTQ9oxf>)jm=pAmo@RhH49xC$1$us5iTPpHzbYGUFFAVzf z{||k||A)S^qIbOY*00=t_4anj=&f%R!o6d{ePhD?W5NSt!h>VNLu10jW5Oe2!lPrt zV`IYOW5N?-!jogdQ)9x@Bf`C7ecU_N$Gu~H+&k9Cy<>gcJJ!d&V}0B^*2leLecU_N z$Gu~H+&k9Cy<>gcJJ!d&V}0B^*2jHgecU(J$9-dc+&9+8ePeyxH`d2}V}0B=*2jHg zecU(J$9-dc+&9+8ePeyxH`d2}V}0B=*2n#0ecV6R$Ngh{+&|XG{bPOHKi0?nV}0B| z*2n#0ecV6R$Ngh{+&|XG{bPOHKi0?nV}0B|*2e>5eLOJM#{*-1JTTVB17m$WFxJNd zV|_d@*2e>5eLOJM#{*-1JTTVB17m$WFxJNdV|_d@*2jZmeLOhU$Ae>iJUG_JgJXR> zIM&C5V|_e0*2jZmeLOhU$Ae>iJUG_JgJXR>IM&C5V|_e0*2hC*eLOVQ$3tU%JT%tF zLt}kBG}gyMV|_d{*2hC*eLOVQ$3tU%JT%tFLt}kBG}gyMV|_d{*2lwReLOtY$HQZN zJUrIN!()9sJl4m+m$amBdx_pHCdgw5iz~1Ag^&MJY3jb5j#p zyT4P>Qmi+xms*ab;1|BWs7nRCt?4s~^dbirQovS?avSijrY}|Cd%WJFx^mv<^_4|M znk2Er_#g+*q}30r`H229(@!PPw)2^vD9!^WKP%22snMSUys`q@M=i`I@?`xa1%7-5 z(*GwrvzcVrad568KI`>GPtPr70iWkDM=Iot%t&l%YVtx_*-ti#al58HuuUCRMix@p zsv>LTyrNyP{OaT(wY}ECrOdN)@JtcDJXA6Gm5HmOzbaHI?5iDINUpDOa8Gf+HdN*N zb)icCmsJZA@arwrvu-%wm?}G0q;GQk3+3|7l~>o(xA?Dym_>&3U; z20OgFa1QGfE=&B5`sw_(o7i{ylVg2X%2BNEc5qMfeormZE%LX;fEc3Mg{(eGg@AVj~_L8CSCR>&rel>udlVcvR%Hh z=6YA^`tgd~Jbt1^o!JI|vVOd)qJFBP+v?+f+J!A;>DPq^>HKHHr5OHM2UqGCelApT z^YaeQ)OY-X)6LG6`7egyGe!BOT99?VsXXnuw%+tSlzIGejcd#O6=!}b{(jZLx#Ij< z__IjAo)`XJL zOH{S|z9YU$M%%o>J(SwOTf9NtOggohHu^44!C z9n<{PM0=Z(mn3;x7qJqDZWl%BiBfeJ$cyz>5;eebYPC4`TbpNnvw+$2XKF 
za(m{8TmH6mZtvvqg_(cnHOnS-JNi$ zF77?*hcng9dxl#pF}7W3XiBx(75!nsF?_pRmJU09MoP&wT5$n<_S;hu7zSbnTP zT6+KbX;=FD02k)Y`~xHWu^9cJ6m_Z^{otDQ$x`V*#3^jtAL`(RB7IoR)x*?Fo&0pJ zFCXqmZfPG;bGmD*epw2%hF|Vj3zdD3gNF*fqUO3%_pn`&TiLx9itb;PdS8;O;_o;8 znT+WVxQK<7|6pWtaSu71Vc}tiSI*~=lr5z_>fk;15d28TIQ0PZQ6?XVyH}dLaG&C-h6^{a&ez%@`XE#PcnH=I{jpmmyYr&4$o&&=%-d)@2e;LG$%My zaHp~@c|JX5In&RmDK2EWpIOmNJbhN;xo7|Egk8U%<9we{g3ql9Y=58U4-0idpI=2? zdhq*#3U>{^&=F5PA8>G{G+%5csS44*q~?5~PhVQo-}O*7bxa0K@h~%Md+s53otvj{ zd&zCrroCU&rxTnNP2INwPm;#^TGoiJqgB4R?Z3+5w35>r%T<5&2DUZdHq-Kj>ipbMmo~hM zggr5SHu9glIp6h%UALB3*Zj^jGCgO$SL!2P<47~J_6JQ~yaw`Xs|skf66gKdkttR+oVN)j85so~w-I&UkJBJyD+7a}qq+0%pgf*sh zCG7U~m4&Xn2xsoZ;4FduXoNt_F2Pgr^;J&c@w+1&z7`r5{+PckRPkL2AId1cy2erm z(t)xtx_etq>yJmzC*r09!13vvaPu`a-tvIiR0otWuHHXUKOL*z?|>{7c4duU>)cM~ znrw6clAL%N`zI?8hZNd@!o$g2bKI|s&XBGnn5EwbK>t%I^EoMpaFS%myG#=L-i{!} zEZo=q>8OkeQAaS^d@h~K*VhbYnniU$n3ibGH`Fhl8+1TepqrLXzp?VIDMtsQsQIb7 z{6FLHQbTx`@XW31o1A>*@w@|2L3?F#{AZoy;z-INOetjm|7LUGIkaRAq>*HN_;V>~ zDVs`o;a=fes)Vy6wZIM>6&-pJ^XHvxe}>U_grkh}h1q= z(sKcO9!$Th#-vIqy8|PDCrW?C8O}~s0SRaJ|J{|n3pE2!)W7dZ*GjB8cX;9IAROI!!uoyrqQSi* zTpEwN(7#b5YSitXN9z5rDjN({k_zm?G@|~^noet<4upc1Y75_AtwM!W;Q9aGa`Khz zwIiImxRVL)4@ARiSOoLwZ07a!-wyNAewXm_uI%LpQ+=nJPjvuTn|ZkSI}V?kWx~nq zWbV=9hg^>OhmN46@zUb&hJW=H1f!obtL}&WMRT5xV7{z8p8ma>^BtAUz`ca4HnhbP=NnlZb%w#(tI>M2NIV@r{y`yP@*%8bq zR@Ktb>%zO*3qjfo)~q)D`pW6uO!ztoN%%~9=r=g_iKnm~Ky|C}@{Ojmu#`Xn2L{bQ zsJgkW`Bn#v`OVaE{w=>*GH% zGkZsx{&xVBxH!ianxCxkZmYp`zzD~@N8J2lM_sCe?g-}_twtl@PdU$JY!pDA_jm^W zPaMt|M>u@Esrv@f{B+IZ_WVey2BYeeeK)Uv>U0MevLC`Rkb{Mdwz;mtx{-sD?ZO@J zKeO8Js9p)mLY5j;f2PKo?MrSTm2)7m|9R!(c;<-;xOATySAW)dc+T1pj?{ErYO?>L z#?-jofuzb48Q0G_JUc<(9pSB8Z|9LH3TpnPzs}`9>Dj z38JXQ(rbiYGQgbG+yby>wVZ!j3!7(t+(C%SmM5OmZ>n&+vc(Qac{3^c-wnzXlb}@U zvBuu!P4(l6dQDI=-OcLk-@oD3JKehF330V~NK_)T0rzab>|(Tx=m;YHg@>#E5TO_i z`2c_w4U)~TIF9F49pPAmhRNnX){^heJX{R;cPF@hQH?GclC)NMXoaaY5ubj4U-OX`(1~<{Oky$(Or#7|GTPc z{%X%PI|zJxwHoE`RpHLg-Q9xWZZEEf|Kt2F)Nne&`jDpl`~GmIIblb57#%(2{$GDw 
zx>XXUq;ql6ywO4IrzMyqnzhTIAV}~d!aZRMdNu{U#T$$6Sqdp7I4LjE4R;Wt%?~0e zKy9jcXx`GFW=C>9w}6=Xu2*!xG#YiNJt#F?*32(3s|Rr15`?o0_n(B5$C3POJus;b zv#@<(&GU@Gs0ERb886}yPC^gDeLn5k)ZVIQ!U%*#wZUE_s0Zl4$TvIF9H$2)pSy0a zZ(ZT(oJm~^B*(0v^&?^&ptZ@bfk`)w#UB?Evq34FMz*oH?or0A0TpN!9cJ zwl(p-W;iX7QkZ+l(cFQ=>xmpNJwR0|p_;d=wRV_AV+(236kHUOrdW% z*3$Fd7Lb}hJNG2NgFpduP567lQOto|m~UU9*F`OmT3cG_H=c14Ut{@p$(T(Cb1w3h!1^#vIxK@6NrB0aFZq zI5lKInBQ}$?_Nupq3RYy%;k~%bRB>aUduSrC+L;^d(`v?nufOnoOv)E;F(6yet_rR zdwNgjsIj9Zh|&)2-${63_6Uc)V>hSwN-b#^XbG23Zt_5Qeby7X9+28S_dKL*o zP1X`7;@z42{v9B-xj#Q!$a~d8DGKo%@_nLL#>AE&30LVV05E)^MO&XZUQN8O;~bt^ z83AYKZm)oMzV-I1zl$?WQ0nQSh0Ws&JG%bB8*jbj){BX=IC-6YpT{0 zrbcG+I=)|mRDVk_nI6qt-C$BLG-&n1H=xGciId6U)e?Cq zdw+kRTPA}xfR+d70GKEpw479ZKF!=AHy==QKlT!>1!o${+%}V=4iaV9Thsz8 z`{V3fRnY<{<;Zdoj*;xnGcD_(AnEcb?vf92T<;vUgp=v{TsF|5$Ow+?V$3%NcqPnQ(lY`aAKQ`(PdL9nu`2{=K7oyn2a6HZZQ=_c+Cn;ErRkl4hz zL9%?hIPrkqgOU($l1jU;;-jU!yDgZc&sv}k5DCw3+hNt`D{F0ucfa}ByRDv=aKt-S z`|5yjt<`2vFx7IVmHh)wb124I!pR*Eu)T8-!rOuU|NVfE?Bi}89Bc^(f_i}#3M#SB zqa#e}+2UyCfwl)zFnbZ+5`3r{&)G#;8n%#R^oPSkIQ5KPs{{Cbes zHGi+Hd>nat*@8>;YZt8pl5nOzii9Bayg1K>acX6GBGi6#tQ&qif~3K-p!I4GilBS9 zJqh`kDC9NRT}zn!Sl`Kw2`D;uS-_GmM_xYG>FoPy36t*B{Za>@IMm|`2PK!e_mx|J zWYBfJfnuk7*r~*WfF@GsiOLephtM9Y`Ouo+bb$LFG=Q;~gECC?1BH26c5NT8Ih?w~ z0cKFd{gf$02PwInJ(qbz4@C+-wz)nrWT3&L9pH)1ZFIn*AMfM#U__bQ_4lMJLmS^M zLDjVkq6_%~Cv{J-^ap5By~k52nYjyz0Qq{5_Rv8{VMjg-(+?5UKazFxbd7K@*K(%? 
zm;7;psEt8*ms>BQB4*D%->@OV+>z?0`M6Z$`MElf7L@ddtjjcGfq-Vp)N1(3^2t8G zk9S53E1EDdnNX>@0FsY(lGI=lGwZics0!XU&p^1DooJ*^c;Thn5bZ$q(0pRek{Od% zVGyKwtvtGfZ)@MbC!9v-V^5z{3p?g{KgBU$mI)RBX3xDqA)F#sM>CBBeLfz$oqlR9ZqF40lvaHZ zuLFczMiaZ%`yuv$eXXBn13aYST7qFMQ}7NPt;p}uC*mC}wb`i=ckRt+!Qt-wwg-_8 zg6c^HCDO`tB0b>_A<8lT!UYW!dN}mqjTWqLcK;3E&it(h zr%Tp>mbj3<(D~v9M!pun0D8{{AS%Haqt;72PcR*L_GglOTC{oXflyG@B8;Q+sRo2N|&gK?e z*PrLb;2^;OK0QzGKxp#}udr9OsRgtbYb$Euei;(E*^$~DIPBS9OE}!2 zekBFRN{;1y?tyA@nLiyGFu_$107%5UUTFiv15+1mEc6g4Fx%9(><WFQr=uNql}v zui6c8U$6(tWDvYr)T+}NBo)*`Nm3u9>3|fo=lLFCm}Pn)d$>~tlc$7q&tp4fx&gD0 z*1!dhk<5-V@HfxY9EvXkOi^wlUBV|i!r2p+j~rr z+w{WHYYTxo$|Zmf01sy};LvBM`zr3d7I$b?fkhOLjvYAq-NUIy{2l@g7O#cZn-|fH zSZfKV6ZCX4eP{`Sgn3SvNXKPu<_PDg*l*RPK4m=1^mIhe$;5`(Xou`S- zU5m8R07|g*3wSBLan?af$vz+414+qbNfq?!TJp+5wP4hE;NFQa0%GU(<$=P}`O+I1 z&2yC}+KvyHwNR4aOoIwYO zzlIcDu`{R}D*cdTPaWiDNWvPjoUr&Zb7KeFgmB6~o_qBL#x`EYoM}ohg=y8`1Fan- zB#gsz4UkYeCDjv7HgWlE2E1!$R`cbxODcn+M2Dv4u5OPY1_gWS6sAF#_g(d7plHM^ zyETwfeVU?A_(=NR5Do{a;N~l8O|jheVc`jsWR#skpjz8Z=OLzm!aoxQ3LWC^YJ6G` z@XTAf1N5P-4G=-@>hQ-`7S+P|B;kct`z_(*zh`&Rbij=Cm|X{kw}qVD2(wPoxi>or zqZ{TH`m#wxbn~X}8!7LPy1a#IdXg~HM9L!^J7pivli?|pNbfGa8zL-Da7@9%8cLZU zdaCJjAw5L2ZFliz$RTU?ud>wpDmlSjI+T`RF&c(sPco;FP&8-S&}0Y5*rbC5X#;C$ zQO@1D&W-fo(x(=0&L&xuvnhYfJRNF6G9gUf476B&3rJHq$t-RPAu^w-F?GZDqwLGA z^PRn6ckVq9VNVrzLpVGgaHn zN^(B*A&@@lHN~P>04&_L|LBj`ylAP~M41c(22Yt~3t`CIM>eKIl;eolL4Z3ZwP+NuUBn__`{OV8B;V$gES$Ds3FOD2p8K;zduWla#tFQl!W zaIqhs`q+SBc5!x=A?r`N2uSWL{geDn6i6p(y4qz0=UST-u(?>Z6j1wvf z!v~_m7(LWO0wP(w0L^7htYOTZ1{BzpVYu4>7Qc7T$gw=O-$B=BMAdM_0kU* zW*N~o;Am}?-L2*uYF;aCo=t$1gEI}#2ITU^w0yA}vt{?oD9}2&q)f>C#dT1iE=mpw~ktEH03}2g5V9 zAO->!Ht*)o#*B1sa6*_^c)>Kw2T(4m(Z-G0oF=ZTzf0IB&enuek>|_WTldgV3=Knu zw*jHP6?WBpvkRdUl@S3V?92x&S}^hC{+$v`W!T-_0iy;4%rS#CBvr(@`nw)Ld7OHa zp>2X@?4dbdFGWDr!}07#z9HSEJ22t@aEBGGepxgWl+PTejy z5YRY-vrl_|YgHH9eUh^FZ_V2z$pl!-v9mSlAcc{9|SeAhEi%m z8Bua#E4U#@mS(T{3zdhscu6{el%)8bb>6atfWR!{y%9JCii<1^I}j4F4TpH^Au0Q0 
zT4D7NDC#-6i8sT?*^y5qHESm=Lp&gAJhQ*31&0f!yz*a8A*il(ZJ|qexrcvdJw&7{ z(s{5R0vw@n4LJb|@8LIpvDVyyPQy)rDH69Vxwc?Ru)APNSVGd{qv|eSmKrP8gkvwb z(Y(LYLrFd--ecH+!#XQq8}fGy5$pq+|L2|tCu}s1#P7PDd6hiHEY3061koYsjUlT^}4Ba+b z5IivdrV=)Qw7cC4Oso0J{fLCWc1s06qX(K#eIooPvw7&2^Y&fVbtESp!kitTYZ#s@tQUx;iLm z_Xxc-f3406b8N(RJ%O0g?)s=>3r>x423f1i=@5b2%IfgIVaJ~L^$=gzwj?p-|3oyiu#%OVZhx$&?XVwtp~yx4r>bQA;8?? zHRblg226(KEL`(9sxY&?Y{yTal(Bwsl8I_NM4F%EM8yDxEn_7dVFX^sN zc;P`V;id0yUDam85{xChI)Mo^()mbcOBmBU#x$n{qtF9y0p-h{8Z*Jt&W`HL1yXop zOA?#f3>fvx#{Q-?9+~fcQEP z8wy>^ouxX(4Il#CRff&?*W`@5+KHJ!NEgET;;dYQfEGDWBRZaf$uQ2O4p3{UY!s}A z_*{uoO9Q7&Z?J8EsCMQfgMTXqsN=vB!c`QHdo2VEq<9gh!)NBF>VG;2Dfvuw?XDYA zO3JR~06kQ1fRSU)x*p+wpcaVNUk;E71S;BR?g%2IHs%o+0LMP5*yY2;i8|LcAUW>k z)&w}^b7B`|15;i2vvY^*Z(F1?$(eviu9coI7`9*vWnG}w4M<{+hiBh9pINmd{Th&> zI2nSDf#`~!PlNI%F_+#JZ+_69$|Z>j2wBpZJ9jk9tpO=- z*C)+^QN9Ni^#Gehr$3Wn{hcAY4hBwuWVPo7Yzs(5?`Bw=4upfu2092xL*Fb-tcTBZ zyvn?XKyG*yQN#ui#^8j4f2hJ#DNy3dlh6`QZQBC>5Ux4&MGZ{@7?H@M14IO_HBgspu-cXnMG!8A zk!iwah|HNna%z60M&LA;8sG#%s^QH0KY=

g^^VO~^xMs<4AXX-joWYr^4V*Oz;G zFg%~)u0Yg)az*6t*O)k#wL}1u%X5;aB^;jTxg9zkA|L0~kzjnV*>+ol>11X)1VWH} zPPFm6A&kk>3u`*j{HS^13_4Oz0_c!6?Es`(;JmC|Q&T8pn70fnGy6FYy5<`t~%ylzNocr5)h0oS;)+x`Vrnv|SQ16p?TZ$gs(@ZRkjQsHxhO zZ+vgS7Ke_kd0j2)L?hq?7*-jiy`j-UNsa361W~3#6zv;&=z&6hIE>+dFwB)pdp!sv z!cl-28$cPMbW2hxo7dOO&tQ!>69{N+MOV-gB-caDET)7@Gx3S44jeYMAEL$9I-~j1R$9$)RBi%t6j5;W(a;*r~ps@Fx zPbWYB5w&eV-)c1O@Z)u6Ku8K)OeS#zUx4YYHv3O&6pJ=?21HybWsz z#h4ay2lPV(ot>0zLqL+5wndsBuh|_VvMd_{taBzU-RKXe` zW02AGNrX4hv;0ls;S<&*<{vuVt~(0CGR0hZYJ(_Bo9R;!wI)AkN034}qpm4Tz+G>9 z^@((X$~Od~m*Y$TnxCk~%q%$<&nJ*lEzkG{22d4Y-|KXUjy0TnBdvpkNIuEEPF$G# zaOBH}mA7^XC(k2K5dTqSgRw-*yvab2H&q=XgjCO6yGEwqWRN?DKy)HswYm!sES7B@ zK-Ojq9+j0qR+hz-|#znIsqYxmp)(7CGK--G6ZGox+mTf zhM`sIgo9=$-n47}am|8uj1f%0G6LN(Y4c6KYR8>|W6V1H$_x(vyCFSuBb5tA++4iv z1Ii2$K+ZmP(fm}E5*4csCJ;zQ<>)RYplFFHq&!E{fkFppUOFh@MZ1=G|1r2Vw5X$8 zH5@R9DUoLbrk=Fv~NR9q?_Ri|N4WXmu&iud%Hm z$rw+x7VQ&Xn@n?;4Mut_#^+SskTWvwv{GNhHx9?}V2&YUIv37OfpLo{5(m?1nsfuZscMnLXH ztplFRG<8U*s|aF#1BEvFidP`>&(xxPj*w{&wKRc9#<>~X0wNjyZLM>r`MR{29YD>U zHnAocm9R7i_lcK_=^Fqxv%B=kyylj0kMtx+(o^CG6(z*8~E2%`bd}6@W~7jBQE~ zCO%TC>V`zpxCG9}nKp|0g#ARrhA@iIodWl_&A)IacRc_RjzOxXbjYKH66SX2hweNQ zeJMdY!tizKO`vrE?+mFxK(x(bml^jasnS#YT(xWZHL4Sc5DPsC&?dZG!T*%7`dRpt zIzh>hAsf6Na5?V}E(h|dvggXV|4s5>?WJz9`IohqqZ-P+y$Ljx?&H^lsTWV-DPkH( zFM2+sAMJo?b6Qk*#8@YWdT;e|J+QViH-Qa=Fr8DK&Ck0q?t3D3!n?joK^ReWonPgG z&?>eexf3ntk`i9XoO{Bh>!2ttr8fEMTX@aCazQh>ye5PblXnC>|7f95+*vu6l(;^y z-4QOWDd)l05Ksw+(mbB_Kx|L@4VwhTWAh7D_rBIqu0Kqmrd$pr;<%=Ep+tiida88_ zDJ%8)$(t|pAgAlwm)m>|gkffgnlSW3Hk_!B^KAnuWicH2tU&XxqX9jyF(DkOxJe;Z zwGc{GYNOvrJRngb4uEVx0o8ivvdVP;K6;f7z+h&2{bNHo^_#um<`?~K;mvBoDHLPT zhH{&a?&hb%M{XZ4v?hW;d1t;+w+19zE-|afAvQ-h++#op|7ej#s_Nk1RFbA^ZrBq@ zQKI+GS|CNlRHuLp`1DET%0>sR6zdJPb<%69s6zYUGY4oj80NE@$8!w3%`er`I2VV` zfoi?u>r4qpF`OaMc+x?LG+M9NL%D`DoGanVk|!yBGQq<{;e;dGxvvNO zTSFWjK1nbMv}X%bZGkEANM03KfPkc$x@7t@M4(M*ezAs-^5ujrla4;IvR1MNAfVB` zjFX$xy|;M?hdWOjT40pzn+ZU5+QhL*ApTLF%Gih zV5sIrkSK6fuQ306L$zDaeStP}H}n9Q;eC-QAbB%u_G82ON~eN$)_{m{EQ{)avftaT 
zjSVQO8yHB7c4xV}R(t@%1Gu_Xe4)HB$x|n=lc=Lg9SffU?KioqI zH(nTS0x?A!PxBw@*E2nqGXbhU*EmGEpp1l3be`gw4oNbcCJEOJZ2%w{a<+N$Ne9%C zY6BX&npIxk{*_v5yc*!!a1$t`&ryID5IwM5(p)qj56yW8s788Zw={$Z8Xw;CiMeh@ z@nHA1gq#29@X8Zl!pR4JmlW1QAfI|mbqa)uMLinSCGBl3#MXr2R_2MK0~R^D4G1&d z^J{*!3M#ju>llixBKL?}P@*5}M!|G|9CgCDfngD~`a}){WbrEiJ>jY(tvxqMs~MhO zQ~gh2(a+~j0EsJ;N7MnI`;fqNzz3d{bl|YHuldy)h`5Y$$le1GU?KN1r4A|C#pP@B zYyQefG5cmg5kPk?G=4*nMazXV&Bt8PQM>CPWF%*ZV)e>33wdMKNyEu=3MULFXYOJ* zfV6#S$9q}-xi%{9sFn%b1X^OzHTot<3nkLs)^WrskQs><=H2sgaZ2iD-Wm!8;@n~u z&<_ds)g(8-^hbF$$@2fY1>E%jMmWvL7e>m96)jY9VBLK=iC-aVAbhV3Z;GB4ZzS?gUjvGKcAM30Tl}YP8WW{{=YivET>K;Ac}Ek=E1y$fDV=%tw7~~ z=wRme6@g^qKl@lG9*?7te-9{jbAU0b#6W+b1z&&Y9j--2hWfI@aK=_U6A?=W_f`2ucrnc%~(6+Y$@Y zfHPn9?*LIfSNG&qTMqojoD-k>>5@s=xLt!{S~`+aj^pHksT`g}$q6ke z$#J+&jS3)ACpwra+XkU2J1WF@+N4l!JjPHkrV;$FKE}h|@Ku^ib5J=78b`ltNZ{T>pRgtLJsF0E%Wgww`gh zg``UNx?~zaox*Pi2!9OwUY)I>paMVt($N|Fd$i=WVnJ zm~Xye)DkZCtb@8}9s=y;xIveg3$@xcVf5((Q)>S`I81Obl>&h>ku&%3&F{E~eIIlr zoQmKPX*KC!UCZ(2*)(yFH86LCDV^=wdnge=mi3lk&_zzwY(U|gLoXUYoB!p=>|EI~ z213LGjs7iR@?<`b6jMMdh~>oGE9oxZxu*{UYkRSZrp z*k*3j{hk@mRCoeN^k)`QWeW}$RFLikP9dagDHl6ESs7C8dpWrVqZ&=VHEHRAwFf*d z!v`c-r$^g|$>#r5RkA*ybpg{{iu)Rkw$M;7#wR8cpYATNTLB{zbB{gkoUfro4BTfN zb?Ao}bH?30hp%Gn0JIz*4a#N$*sXr z83+C3*6tyMJ!;CWXEP)@_i2IV|JE4n!NvBWgaI~E07t}HU_b*EkI(Hb7~U^vfgsgZyOj1 zbH*N994ND+Tn^oV0>>$u!J(K=69@}92a!7AQ?<$sVWeJU8gK(j<*>q&H8(G;lQd-qZKcYCs2sZ?gJ_q zVjuYUNtKP9E+GGHvmVmwp)y_R zxvFu!!3Th=xlH()a7t!%U5q|)IU$9nZ-(!=ETC5ZwpHvq_F-*-v5b3OO2rc_-8V1C zN@xhlo)(w`_AMkzpY7Av>?G{1gel<|5zqg?@(Nsfn_L>`1_JXKy;pb%A(`uvxodMh zWWqTI$qkMccjwGS;JzPX?D?3-H8`zF+sV0+w1I|1yG-tG2&&(w3NY;E?TqM-HJ)sN zQoaWBOr;=Uww&%U+zuqZ_YGShqO*64=q0R4+mvv)Oh1b7P?EK7TsW@|64kxy=fN%k zQBLVm)k4F7-J2B5lHt*qahe8XEn z_RZUu?%EF%ZGj@=U9ajU;MAA43uFr0P$CN_qvQ>2lYxd;INJds8RwW-yiEaU&u+*S z;b@DAFikqC7JF_ym!M>dm(UX!2xOB%;tCM0@J>%T;nqV-c0Pc29jJp4;$j1&U@s_d z2q$mtG|&Y#FRJ>|0sXR;+=3W0^vh4G-w7m%;seFo0CGIlgPLtVuw#lob^ysJ1NT5rG8uq5oQ%!l+U&Q)1)*>|IwE)&$WND$l{`tE9QTVC?M) 
zml~Gu`Z^G$$9o8bpYz+kPjUmQ{gV#tQG3mctDHPlL7%2+)fO~ZT516_A)M@e=6xFo zCma|_qin%&%nX6Nc9QN?4)Q5sT7;^%n9eFmQB8R}864dmbGe*iFGDCtqSd4!q{V6! zE8gcTh^obNZA&0NF!hv2Xw@pN6GfGvE@lHvV?FSu{|y+Pb6K@{$NDkD<`y_A(q_JW z^AkwK-?J-x8-Q+LnaU;{-5hJ@ZU;y?9LUeQX$p>j=XoG%1tEDe5h?dMlLl6`;z0*P z6wd5$$WdAtNW%%@D@f^X&Yt0D9Fck`(#o0cwxEF_f{rZNi5rSs(BCmwylApSC#bg-vo-xLoXt16QtDf9>Z-%&@hTi~dL(Q?U zcd`(!(=EcXRg_R{bpnNwI7a1j?b|?dUU>VXO>Cs7lg}L}!*HZ$vZ;{i5Q~dSo=mR> zineBDryZ0CaPE!yOK@osIoh&L1a)loiClq856@qNh+k?xiF9TqkM~?dpn7~UXhS$N zV0A#xmJ<>;P!We78;fh+x#mfaP$qm^&|;{0MDjk51HoIHy z2O8Hj>~6+7ps5aR4x1sL;u2?m11ZL}=l>Xcm)a1RPS`QO1qD9cmBvjsL}3Hg!}0b& z!4c^F@-~Su7P)&6ZfA(XW{j6rQz(&1&)Zo=D@bt?=Zb!@%5KO^w}Hv*5<;4=#3&E0 ztRZFyk}vWKgxL9`oj%~glZk79nz4I_gl9V4cayM3g63VF*Rwi*%eVkajC z{p#(+E+_R{5c&JRp>~Ft9RI=Srr>m7Y(Jjs>6{f5RgceOc8RXD>pqKj2~OSk!0Z~B zDlc0*^b}V^6hAN311TU4*tK;%B*GO^4xn#_MC?!^IS|d&7kXdfLUZ2=;ejw%u66-p zvs9}_?rjJpooB(+2J!BSJPtcUP(w=v-I_m z>@6S+aU74XfG}lfL5wz!>nrpi8Oj?jB42i#RNFCCb3!wG13E_)Yu8spVF#Rn*vLhiuc&t+AB!X zsScxe!)L8r-3eSmLNPP{3Di0uXTPoh43-}E38qQ%wh5EQYcM?O-4oQg0a1eAOTHl- z1=dMI^B$GqXEaUP0@jYy&p$D^OcJHeUblfMjnxyD-GZY7o>5jU?FsRKbb^>ofiBSB<6$MVf)Z=26x%M_twH~u*K}BM0iH5%efxL5OC@IxC z_1x78NXCOJyGvAWdtDN)*5jT1HH27cJhe7~D~Pbde<WyUWtOBUrLyUWD0QLP(Kb?WxMwkdQNf9eI_shotQZ!YgBf z__zkudXCe2lXMK&#mMWg=Dq94s(*Iyj87i-Ey(YFYm1u8HDM39Vr9Yy ziY))lxDBGReR!o!*xl9+;Ud51B-4CbXmfQ%7-8_@(M@##DwHc*_FrEDox2|hT7$?% zZQQ(G1;Z4MDcW`qCa3waCgIn^NB-7}mbw9#seljryH~t{7#rcnZq>m1q%I^i!LU6$ z)m&i$h8-3?`UA`Dpz_jP3raOHO^j=62U4n%rIUBYAtb*8pS)RtOP8bZh&t<_AU`7Q z>cr0_kUARad<|6#UwF78h{C;h)f0w0`Wf96yatgIo9J#Sn?&^EIX8sE7boN7+iKp| zio^3snOlGe!d-FoO2`c?0$%_ zXUAak2WllS6Z74UEhwcD^nlO=D826Q+gmq8XZPIR+!jh2Qt`gnICq9j&gAr)0;wp! z!?sG8?G`GWaQNp%5M6A(3_*i>o~&3yh}QBbug1H8=GP4b-7=wu6ItZx+UZ8bzx*U?^;_n(e1N}UC*-;XNUqFbJzJaPGrW>? 
zyTm>B-vbVD=`;MVkvmXQA-)Vf1!5NE#>xs50XVfwm(>lJ3Q`*fr~0&frFz#8Qsd=z zEJ6;>Ub8Z_5vdnD!-+&^Wj=k7j{ehLDsf#DHP*Rt9J5Je7 zpi!6p|N8BOml}y$!sK55$z}%xCwF{WY6^%dnA2kLKbpL%MXCu5F$p-!4c;MQdl|kzh^cd zT%+j1vp2i9AVoS&CG-cJ6DY{ez<}(8lL1qH7;FcOse^YC*O^ZVBiVs>23LS^NBfc% zItXZijzoRMWzcE7z3U+|ySZMQL!el6l)L@?5JBE@(qAXN=6C37ljtB^05tR)u-Z1a zMm67QKEy@iL=q!gL3&sc4NX9b%8O5`kZoY8>+*jIEtFahy2Pb=*hv{UME*3z7y10_R0TSSSMP@+bQ zlGncf1?(nYH8rE8=0ly1D*#AMn99jRg&*&PSO(&S6<;>5AaN$lTC@K zInE8?)QR6+%Ky)QSS^cxB&4h@aB}k$W&+F{KW>@SHbj_b7Z|A%AN$E14Oj=-!}KFl zNa*DxEq(=xEcISsm+#ntRsxNu!DOVT zA`l%MarHP?oEkUKVEwNDMB=5+{G9j7Z53N!GDlF1e*#F2=_Lj#Wjh3~RQ%U=Ed-g= z$)OkJAUO4WRSSSAn9TAL#41r4M>K?XpxSThas*nxgp|@)@M?c{JrI}!=&I-y1RFD2 z=?TX^&b-fi4W_I-;j~Fa{)~ePCk%_bIV0M9cr9d}ji{}GpnuCi#>qfnCaq?^4Iw4% zyMt+m2)fckgdGT!fX+1lO^4`t!(;`nlKPZ4!b6HdX921 zd*N#3snpj)NwvyPmgj@6p`o3b{wr}qIJ#WmPe8dh5F*5FpF3zi!Y0mw+ahcMiC-L{ z(n7u?lkc=|10$m6Eg&1oGqAD)hOlGP9q*7v2Jp zNVlgXc0OUxNVXlo9Xh_WfbgqrQ?<(-C=`GTj!GvSPPLgsM?T;gqYl!7DTM`(YVQ(M zI}}0$$g8g2O}tlZ{5LM0~zXdaA%WLz9M@=m}+6YNg2({E5BH& zd#o)esRF!In$u1oNOU%QD1RFilVU-^SzvX#Ei`yqX|1qBSPd-4ApvOP&91JjphN@Q zPLRZj&VdLCoak6n&XpnUKw+@+eD4ZMZ6|QxTRQj_Pr|HhCueG{&zLPNQC`nRl5U@1^AQ`&o2F9udgB~iVH*$tS38Yuk|j2f48l4t=?1{!tNS3pY0gW0}S(GRJSP(wI4BCFj~ zYa0kK>@CHcgb(BxXkKCZLN>BRIHmX?>STa(JY@$Uusf*b9aXmji_pIo=;wL!pQ~(R&3*mGWMi ze4l;3(v4v**8yBJqgUAg5Y*?uZt?+lG9Q|IYaR>Mkt~U}AW#DPT2#~ofad2+Th-1% zVNlCnEfd>BmYy?FA8jKfpbU;Oob-4%5Gi_B9Z_>%t%Jo0f048WOj*dMUTye% z?2{71XC_2^Zu5D@!9Y2rc7WOPtp=94>X1s}QPJR3G*{9x)9N5akNyuJRWuazOw-qa z#r1&Hm0f`lfD8GkfH0MV;iI=}K-kyIJ&2!=ngwIofGCR|k-z-&(Z~C10@08k96o&F z0R>GGgA1MI@FO_P({6M8x`h-aTkJgUyaQ3=K~8c^LCIIQX}lPP665#t^&Jq-^Ufs2 zUV_tBnYbR;YdUMF#Zb=u#KIK>G;yJBq9;s_^#2y^8W=`>c2aKD4W!a-v%t2;1+_q| zu2CAgOa%2e4^+j_;nP=cfg?b66HyG5s8ctV%03IU@W-k|OZFB*N}S_VEW1%VLo7Wq zPGQp_s~mN3hX@TTzfI+L1XHVr{^Nm5a3ss+depTZa`06JzoBsjO?{8g>GVS$xS%nH zYjEi(@-H|W2!p!Pao-FicHQK%M=cLlHF(&P#Xt#j`qvlK0#>~<=>ctnN)N0XZf!nt zI8#0E4hRMhe4%L?kUwQu1+Z|S);d5csBV!ZrXxS`U$=9NPSG8gf%_V 
zniD$P&>}fau>KQL3lUAt_R;Llkd2OEX?-+>f;1o>OjrS7gCmU^A-jCcc_y?0jCMHd ztVRGPXU~zZhHqy+m*0a+&%qWFR&@+Fc9efWO~~W!jqmurw0J5&Ep%P`U7+umYN@8a?e8I zgG-Ik1xs`Y(FgORddKN9S;Psr1{9sDE4m7_BZc%mu$qg;!)q|+vd|9IhH&M@v7V+HIzTwbG%(IfFr4qp7}tc88-E$2p8W~}*69s%ga(B(KT~iGOil;h zf8T&9hM5tJ-3%YOg-(q$A5|OqY(GbuwxAF_Z!}E;(!p*6!Uk_oskYjDeU3*h5E+o} zLiHK>a(YYIz#SRQEtfuvfrpqdLXeB`s}3c6A9V z_MUTw>mfBSms0%$T>5-55H6yS>Hg&^c{=LaHVdf`IuTOv?(TN zC;>_|T_*6-0kkM4Tc$KAI)Wi08Jy5i7~%tH3QiJrXqCn&3?9N(i7wC$onWO2+_b@zb)N?(ayO>&&%N(tqQyb$CP-s z9npcsAkXM5a@n>|9QQ7$=Q>%)vFe-h8*o_S_EA|ZLz2-k2f+{?Gtebuga^P2x0tYg z!qsHxtPvbSQM{h!B5gZFIn2M^??51DcG^(X6j1vWZY|_*x>ku1=Trw!#j)8f;X$@O z^@4@fBe>V80$2dO+$<#{#ni9!U8Y2{PX#34cSQ0Ur>*nbsTS;)JUC ziKv=Ui&$Hr@W8wlTdbRfv#N|F@> zq~S&t8tVY+C7Co5_i~8KwzxXhHH4IyO#z0DtAWH-at0bQ=^>?`E(=(sxgMewUm-ey zBsJPHZUAVX9b5G9WHlmtd$xe7IQ>;$IjKK^7(r)8-aXugKyGDw2cX=RTI>)G&-ydd zTpNH?nyelYUI7srf3zRLNm@?*p#Z{^GCQh!qw5gpDYWx|tXB}qn54ZuXWv7E6_%~2 z>>7ZD=|7S;1mS9?tCSLu-8!(HposxCU0UI>{dvY#^l$ zXwYnEd(jImENRypnVOHUf^rY3KFG8MG1_4Bg-8={>gIrp{JNRF4Miq|_Z3wH=rRm? 
z7}B2kCA=vZ0hm!EKVeFzIw@Q$=newy_JDSKNst`#4=n2d9f3ydq@uAhkJ6@JKM?f9 zh*|`MDDaNXN^O81O+A;p0gqYzU`|k}sUKXr0_aM}c|*sQIKS;rs-v+XAK$ z@QA;*J|+;NAkN((7WNG#+~IkMyRIEb!?BGWNYNZ`{~$Urs$lhjDpp`Pcn0h4-YoG6 zBA}d0NGVj#O#R>18VZWlKR?0c6$rb{k?0uXKH;h}9nTgN_+82uHz7>QJTtltNLgGlVy>y)3=+(6$3V?@C$V*>0I>m7 z=u7znxD`Qi^wH@qaTb0B;u1h|Y(#_eI-l}(^;H0U1;)|?qBe#mVqSwwCFP{=CWJiF ztFGfF$@Mov)bFR(=!g8d2j;Xj5VSm$QA3E%SVVg(avO?z!u(YU06yjDAiJF*wcceS zyXl(|Sb`yN;?p4S5WfRSp_-3e5>7#EvD&oPP|(u{+=CSpjF{TxMeG5`@p=uW>SXgO zb^|0|o}fiu!mx`kO7ebMErcfS3E&o#^iw_s6~l&>t~6)$V^G^rQ^|*P+cuvT66JUT zzC$?H<5zE{`Q*#JE5fi)UOEOu6#XG1a)VQ2hZkNnt|6ox9f-2|TqOghu_WjLDaiZ$ z*FbnU({B8RFtvep)jn@Pl%&oGEg}3)6`oV_zS+73C0Y52xCxNt#h0=T3Ztl%ey9aU zQ_r`-9cbaz=lErjQ%DFK1+4%P4qv=1q#F{boAt%a%OuE*X+0$Hy9CsHzOMX$QV$4A zyS|ln4UWy6VErn614T8W=TmM7V?Vp}iOr{1LAmFHl>^~1S4$><$WYF9(QoHFM^s?B z1yHC?8~ZlX4j4Ufi%t(lXUaH*MzKDTz9O8#6}91#^ktMTnNTyDepBn zfuRBxuZ9S^uE+4*LqJv*bI0W|&1)zsR3`v7L-ga>bLvlTAi(u-*{u1DDy7F`HSH}( zR5ud}4dxS2bg65Qc2EqIJkEqFpkOZ~NUsKr<}@|57*j|Q-haK4BOWV=va7|OmT(;? 
z1@WY(T@RN~s1BC6nD-h`Y*PPVas^Bo7)%`FJtSq9|3$wB#U}19*4BJhwJ!Y;G?HyWD3e3Io?D#E6R2>ZKS#vo0TP~iR6}Y58L|AT&H7(_8Tw-%5eZk zEd-d=`wxt0fMkth!PCnr6lAogbVWG4G5OMi%+xZZ$ipQF>gjUGfL%+ko7YfkKU(js zTtTBu1f<{U0rXpUHJn$vG`<0Zm-IbU`#`}$FFYmx91G4hE6KJ6Le$!zCqS|8V+Vx8 z=WI`Bwp(zxl6~a3-VQX`SpJ=K3Mv($f6(=A3?$fQZc9aV01DIKBj3zr;9@U#7uG<^ zo_Q&jWX#$apL;;Ha?PTz!6_!o3<(Y?ys@cCVP2C%OKAt%xotkz*)#1%YQnWfxs^`4 z&PV?v**4*%yY0#brgZ*Wo3uOmTpu%;CMBL`nl$04A2)&z19th!&NJc?MpsM=)OyyS zaI9H=F7;hOLs_&~O<(}w5htc9_ZmWS$JK4G!)C}MjtdRpH_)OSwou~1Yd+5fpRt5j zjf0Xe=BKIS1Olb%%@Nu-Akxllzb!b0>KdnyltZBaE;rC|WZpA{gvJgK#624@CCp## zmzlaD6ZSx}1(!^)1dYnR5xj;Jp9KRL17~?E=n68XeoRB@3BoA4J9V(z3hx~`$`xgvgX+aq(DXs9~)?)L=v5Q z%+L-gZ2FHM=ltunDgdw6%x2wZlLrL~J?%^5s6{OPaj&!B0 z2P3j~JJlG5Ja~%N2{r?TULUTp0Vp&$kooMD;}$8UZNs z)Bo(N%W0vgjm&($9uU^JYeJiu4)~6@OIHKt2oJn=U^0Idj~BtmB`BHK7ctgQ$XtJ1 zz-X7q#i`Q|k@qwyL#S&gaDB%XPdH}4<5tKUxX+7s>Oh(=s!G)xwLbN|1u1o=Kea)w z354X}a;#8i zl+-(4LPZYTb5uX!u%$U78oPpkMmQ7?=o8b!$u;39FMm5hs}H5TkfS5UH&DWL4iq$B zT#u#rGB4&pok~Gle4rH0KhvK~07=E-jBaupTwJKSuQ5z|AV&py=pzjw*%VtyHoCLr zkEQX82-NE&tw?DN=$Qkp8~qby?!2|eP} zfQ20Z{8?o*4G=xRo{S(q(TsF~e!*n3nFM}E&q)u`&5Wc#Sg|61n_=gBQB!NYs zto1-zakVTj^Fj)1bU%!h)cz2OsTmCp1ztmvlBIR zfDq6;dsD0!;BXSG`+vUCz6~a?)1`e+ZN7R!pdE%CaOtwe{rU z(IuTOmv~IN9MT=|Jf7r+5Fzc!gORHwp(yqO!96(I)E{TK(yj-RPU6lq@7q91N35AC zPb5gsHxN+{$7JN;*sUe)b2C_D$rcno=l{1RfLP+;h5lP*8;qsZ^G@xMnkS^ale+^2 zF&Axcy+C2>UqM1c9CIxHXxBj~b96Sc;?fhrMH9(C+Q~V8KRX+<}RDV+t|2{${&0eiA?NO1-76dl~Kl(ZWk&U15*dC(t z7@E4(7DA-c!ZF{$cWpu4KL zQ*?FHpGFee(T;XTE85YHcC@1%?Py0kAqx`|Q@?YbyjkDOweorH`@WfZ^5n_wWTxr9 zG<`R4zSl=VRLAKKl{D3?XQk-@Lx^Lw`3YL;*F3oxmr}^e$IAPbV&Ag0CY&hj)r(ij z1}cvmM(T8K>?Bk7%GeOAg`3w$q4!mu@p;f=-^QMfMR&+X74A##CWWF(E^x^m&Y2cd{`#Y z*SaZM!?aRSnGNB*Y%b5sTk~n%=bnb$$Oa&L$x_ZI<+KJ(%VAPS|F1@;>_AFSH>rRMSPCY7gDu?LjGbo?Db z2uE4R)Xd8ybnHrdR}MRllOMWULJV*+s;0OXvMTKcjvNk|oGn3lHVCQ8y8~5Jm6APS znm|L2%JJ#|j5zu*W6T1RFPm@p;;v60IK#<%_5fZ0$(~tI8)-@zP^lV}eCb7ZgDkmc z!R&6nW$v@KdXfOD^{J}c6CjOmey(;ff0$^foF6Sycp|Gcfw+cAY`p<57HtD5Hc2ky 
zvo&Apy6UAJOlrU{b+*PHm@+G04|0G--(G|ZVd>K8_Qx5J+}iJ+S)vP+s7%^d|LY2x z3e{beo9|4WXRc)K^2SFqifZ4(_y23mg?YjZnb+Bt&As5Hs68oGe(V%|eN$KRtO3bM z_mVfjj75f7>bJMBsh+OJlFA)8sVqmvnj7E4MCUA*m&^|GO5JSHs|AbsiVf&pU(YaN zb_##DnHTd53mYF~^D1Ac9nS3yasv}oo$4aOJ4l9l8Sx=cvXnIkOaS?>Pu73QTlTcq z#o;SIEtBcqgPK`bL!|U6zPz-j#|AQ`l!-3JWeX*-b^&+Vcjk8_zZ!@=L?jf)*Ms>$ z-NRBo+4(>2| zl!NKJtXTQ9EIY1&pWbefb<_eaYvFkUp zjyHM5Tvof^+<}o?Kn1EEa+yUaB6FTV zgu3!&nYEvDJk}bbvepZL!u7U!Wd*u5A7zkh*=7eJ&CbTk{Nf%Q6{^#)vb=tPD=o-j zzkp0srR@rl_dz8_O)v5{EA**W53ayS{f!Nd{=Y%w`?c4*{Bh(QCy90CdNl};6wSNa zo`OqEZ^||1W#W(h#g*eu5Np&e#TqzNYS~O~pc1D^&M|GUg&X^xzS`S1cFvbT{85?h zarhpPc|ApQ=X^8y5KJWdvsg|Vlh=-VOH&3!z41TR?P1PO71cdwFRUodk$d- zl=N|n^IFi4MqDy;H9u0GMV7C6Hu5Hq z=;4Ux4wwrH{EK>5GBKc2otNu-PZ%oBG7*AsskqAl)a#W$xqQ|VXP-s0hVg$L^6UQ% zNTR<_DQ+QVoV^O)inN1XRJj`z5eDC^@aEol5UR{6Nd`bbqK} zTThyzs!9<@FLrLgEwCfZ zzR*^S5KcON=K_=9T7czE8nzLTORKhke91n_c7EKMO!u-SW)CB7_NP2i51{0QK^Vu| zEpXK-@bFoJU$h1ndCyXNyDviOnsI$+;Rcj_2ZOO&l;92~V`Vr8G!J=1yyZb8*KmZQ zzta;%h>;ZQ)XCk-&&w*^JE_*dDh4f{oDYx*L!cy@j>=A6d;4CQdvXsLm+AI3`@0%aQP_u^Lma6r>uR&ac<>vdfLK}#b=$jfPK_bt^;X6Qdz4lag zQSIe_J_ywIAg{WYp%ifeC3hT`y0V_-VP16lju$|5{L{B`gTI3Fad&6eq93)-^PKX` z3fuuh>@|%>?GX}ceXW|#Cup?1==BL44N zDzn|Gtna1A&q5w7a&`+(B#){3; zHoLRuR*ozJ5bgp< z5fWRkYMsq5agZoSf^yEhKuUxGlIouDn~%ab!V=WOZsiN*W2U&+ZiS2ym$8SOJO%vR zjn~V3Hv1Bk^Y1#2?hdR8$CUl3mTf@3LfmTWR{k>kUNb&BP&Bq1%)R{Cu8gc2H1xC9 z-KBSd`8`wB(?64{Z5YXIzETU3@_seH!b-Q}@C_!)V`ZaO3GXIRYuSj0e8J5eg`Vc$ zHG`@YzUvCI1}F6#<@?n74NO%TcA`4%Tl0ZA4}gh&8e&!LA)!5- zqU#IE5AuO(j>}(u5vVEdSt!s#`_lGg<)!wLQ4v2iHp(al%8-Mj&nm-)1w5P(#+ zzVfxJI=s}2>}P($B&u>|Z-e$Ssm$?;edSN}cNpV%x2(bRfGqsE0i+PBKaAKyBo>EV zq|b+`I$c};_Jm^qoU)*qIhaq^dC!%~#bm5)^&H+ACdy?{L>TiEk}Se2IMV(;qkDsk z;w~8GJ4k}R^j3w3{8e5h3hxO@YBQ-rDZ*lYt76_ekN zfk6y@HJUq6k`_m(%HJL?O4kP}r>G8)L}a%L0SHG#E?cTm51b^n%aG4rU{Y+xedEqQ zlq_o9s?R^ako#fo;8IvlWqq1epx(=;_WuUQ6i)etF1N5?eK033ul$w$fx7aw%6ttR zDXP^OncBc4MNWjvRLT}=nfK$LbK})czEgwkRt7M-Q+L?;@WCW%Mb3C`Fw6Es=?6ReE!F8b)Ue)d}jC}d**3AQ|os00UrAN 
z#;l_PS^29X4dUy4ykHXXp=TsF;K;%u0XbQA3ng8jtJ}6CoJ^Zys>8(g5K)MpN;j8R zlE@qFm9$9s$C}tX6HaQ(f%Gc*1txN_gf(XFuO_#gza$SFJvGZw1IarmiT=fL4zJV*?nUVV z@CGW`?n7QPUB8JWC^vF3B*H$*i=qwFHYhnYO-zVDGF>-( zRA#ti-JcKjc=`Y&hB{)aC$J9ZOJiO!9q~L>U+!zOPvPXRoEO`n2@?VIElcO~0UFbr zA1^RcZ(@QpE`g{3hsxfnGoN?nSu@v|2&_kVCBqw3GNe%J?9E=Qy2Vz%PWyZ(C=QFW zq<0UbfF1Q&1stqI81i7aTj& zE0kN9VL;3G+GjkrjTB3t@<%Qc(2w|4n*lz&2V~qfBp|wojyP$na22C zfQOi+29Mt1Vp8SX)#Hj&R3c_64Cu~+B|LMf=Wt})uDC7M3tZw-)V*2f5;F_`mwxC9 zl^oR4w$i$uUmB}@r6KdZ%rm2I=VQwTjh^H&sxZAx?@?8>W3A``l#Qd0Bd3qB0(oD% zBcFsR+)oA)&k6og<77edKl!2W_I?ejnP}2o`D>PGx1+cE&#Qacps(JH>K?4_4idmyTXZ7 zZ_C1K8Dx|%^2VtT>_aJ@MRi9?IWS!hvKRjllo(fPYp;aWm)uBv3iu8SQTI)qK}nfe zkQ$MlPoj31lhZ|>KVkr&%lWrOhf3)cB6`12VI`%E!ZlLlKbF7Ijj&Qvb$@8D^;_nY*s2nF^F|ey#G12?N5szl{sngjSc!e3c{>#J zxw`VT&y&jsRFxWr`2HzkNSR^n6r7>t@5C|H3LcQyaUSvol)AkK>uWfQI~YB>X8NUw`{86DntUp`s_{^ZCCzOx7wr zCOLETv@-{)^ifg3GI^FuyXSJ14^{WTGuIG_+4PZ(csFn|VGuMf8NJOjiv9ynLxm5= zfMjW`2WR)l)D7pyPy;X-a1KG6;75$?f9;m}2@ow^cQOS%dCs3X`TYgRr#{yOTTkbJ$F@QFPs$j zkdAD&^MRbEND1ZxCGzUb?oMWTRN5X!3QWya>%Sj#oZ0IGD9UMhfrt)+YkyRm?FBr_ zyzH6;oMNIDV^nxH30~v1zH>g`={+spxxhpZj0*IcQJ1JuE8f&*&MTPAY+;m&Yd{=k zdVj9T4%R>a<}Bx1p!3t(d>3RmTN5ewLE6WvMh{RTp?~c3BRs@O8%x+v-kRZ7NdiEtHkGP#+j)QJE$mQA33xOrbuek>$7`^IE2OO{8^t^03~G2jMq=+qc*IXTsni36%mvtlu}1Ye|EYF*O%J! z>Jo^Kw=-OGS9zc&qm?$0s+(#VC)hk8Ec#nOVr=xN2ILNsUA`bABln1;rv1fgdgvjq zn-esP@dzZNafN+?6QPSjm!8VcNO9C4lWbm~k-_(0DXnNRKv#ggtOmLIO(a5ouY?RT)z(glr0RsD1U}+!IX9HhuHz8LVt$# z?gIJH2|sS-9$e#_?$YeTF}7uDyT2>#2Us~}I;&%K2#q@No~yVTu)st-Ig_qui%;{? z569e;l|Ll2A8R?`%jo^*0cVwztHlMBlr6rNPxB=lB}m&)b?<6E<6KQDcMV64%R^E^ zZx9jbr{!%(xAQxF5>YGwj+hVY{-S$fdF$>JEr6t==B4|putzw3^PGj$b3TkfKJc*? 
zXP$*44mPmQxOkb26^TA6@fDMF#luhjR?a)#aY5!Pq|83P&k5^mpk&M=#@68>ayz%G zpSF?LYo$>h?t3+j)a6ttznt4bfS9tDgTPYWYgO_7E*3^VIwLh|`3fjZx7z3fE9t*!irRXN ztgIruTvJCqGra~!5Y|w7;Jl7W>Pj$K#ooZBTA$R~$|fjjb3{P?X*=XD-8u9N+n z`L|(m*Y8^l+5GW8uH^INvF2P|2dWImQg`0V6+qoiJYvYKkK%tyxG!XQ7EWdPjWJhQ zUh-<6#*pi#UooQmfpK2>n>(vVC#e>zU|z1{qoraE5u0Lty)Rr;66`%%typ+|GyEwWoT_cc{u(T`7E@Fn2&ry>KGzW>%*UQIZ}f z6FlYFzVQB8cyw^y(?*FN$!01%w!o|6dB9t$E^58*$t0koOa6Vk!>b9PX zN##6S@1LTi+1ZkcIj<*!=YZ_;5-MwO0S(=AzGxhyFL9$Vy`rQiu2pY^4a;bbc~{n; zl$w86({2G@ugP=~-{nnR3R37fKa z4G04drtB^6)HKEmLMpy~D-pc{iN~BtcluZU_I}+8mlKqgIVjq(dA8Td)UC6vWBRmOC7<4yTusMherpp))_u~#mauf__j`C7Ajvt}*@2gnWloqJ8id%DTa1HzRcTi>XNOigb%AncP*@pNvihT-vnveN;htyB*9EZ&3N4*5^Z;# z?&trkXDQ$VLLStMczr*@QHc0Bs7X&KBK^_QtAfw-bxu}yXjHwxK;BB6|0}Msr|SA)2qip$N3D==QZkMDn08 zdM$K;i5f_cT%W>2nO$ku5E)HL$ z=e+a2Z|HpiM6$g5^L2irwxagj`O3vI8GYE;DoD=kPz_+$porCsTVDsWjv2+Q=J%fZ zjY-vFLT7XnwN#**1MY>cEhORf1w7lrBw+zXuWV;N{o&~Aax!!msadaDxhEJ&+tFLi z#y&#X*4Y$21$h7u<2W{Ac4se7(R+>pYRm#A3+@bGt9yJ3`l`cfMi=@Fkwh=AEhB+* zBzcqzg9iEn67k$Ub_qv5Z%&f)E6iv*YTBp=GS@f}$8fU^^O0Y2CcJ2F^N6%A+V2Y# zPp@BAIot=EV@v&hfJ=P6iTe?#YW79NcuIItr@KBU+?%&vgk__bU|9sn=jSmED}U#* zP??_`&%Qb-=Yp4;3}rz1Mt8-T1IRa|xb_q=2j5><`g?4mB8xiWJZlkK`G^y5%X45m zA8`a;yli%0S?MA2ZvGR2K93r@2j^?n!RrkIa2!iq}mH%W#9~dAG`#jqE6^23x0(d zjoPsr%0{_e=bb-c(S#lV{V_9oxAXnc1-Z+gd`p#n!Y6g-{6oUMpYBmu5jYjMycZNc zoQxVl)#fZZfTEyTiRrsBUN8|O&)b@1qh7DLbd=lEu*S1me}Am}UCS@_%^{@)tEge8 zmMi~M<2!j;D<54e*O3{i){dBkl ztG0tsL_G_ozX*+jtx5N4a`sTmg4GAlw7@=+DXCqt2ZFxt%p!V`a35Q}5YDH11+hML ziV_$7E1!w8pmBHTIh2Iey~9#^k-tZ;;Sw5VlMyIebcMw!YD$QnLe!R3G~TwC=tPP zP#G|3c*&zwhPMRW023c4-XoKoD}Q(Ez0EwO4ZtX&S`eZMh~(b5jBZ62@{arMs* zgp)8jZr?{+hWvk0m%%TDeOyiI`2c80`D^#c&JYox&a!3!%-3=>NX2%6OW~O)C>O0F zFOeyo5?EUXOmbO6<4g^rq|B{B(fF! 
zaXwH3jp_tFVIrn@uEwkS8C#X2P7ryS@6-|W-F?L5qh@Lp|KGDrpOqStUxmrd8K2eV zS%akZhdy?2eLngwJw`Y}wVqZhUYiIJI~z#h=S>@D{ILxr&Yw!!J798J8e8M+T}*^% zk~CLu?9F%HR(bD(kz-v=p#1Z9ExnfS;~`kqyN@?m0Qs5|-ZQc~MX?!w@W)hrL8*LG z`;E`@S}C>y2TC8!^{XhCP>g7|vfmf(uJS(DBz?rZQm5^BF%Om4LWhpIJr2gdD55p4P)CQ1DRJu z1k9zlJ?&Ti-hSNStcqAz{94c2SPu-j>gDPOz>2EQW2!w38{iNnXRIFVZ03nFq+QTp zGVntVY1^KJgHp>5>>#404BByZ&uX1{cd}{CvO=^s*;=r1v|NvW_OT_g+#CjcZxhBu z#Ni~5my3pM0FsP;>QSO_l4*vwdZ}mmv(KG67nW(0O4d0j0!R_60V9G-C?e1dXKmyP zm8Q^+>8v|cH}4uPK2s`mAz{1R)%k?m3-nGnRpCpWYj6*yX!{1T9+^Gl@%Obj`Y4>3 z%3EeSmpo0nrgxovms#9rbeC*xi+cg&qPD>)^j9dkFnJ%R^p(G_lj{mZIw2f2u=G|H zAEvtRG)U`$N%2Xgcq4D=;HmJ!;yX@D4@AN&NlglGPr^-y)iCWK@}=(Y?1B^?GzT)pkyhl?R&%VjFPaK2}|t>T^!=h78zJ|>yg-?24BqRvneIIC{2 zS`B^AalXk(N7>JhltNaiUgqD=rR-P2L)o3%AkORgxC^5CVAY#E-e)u33MYPDrVJ?V z^0z}M6~1uN((^8MeCZJ8e@Ejed|~4K!F(wr04O0(##iqd9(@ab=X{TA%I9U0b#BgZ za(G21nz$*}EaJ-Fzs!isWO~*B5lR0vW=7WXh^5al3$czE(lmtVuD}LrB*F^jm^|J@ zN@?{Q+Z|`?`+H0Ozim|ItljN*0(?=PcY&cJwFZ?_8BkpYs>7LZ|=8mwW*A*6$ zzMFSt#2lJ$?ag=l09w!QW0HcpuBi3zU{ZPoS!6pzrA#z%*)50!m*)*_gZX+lc)dKigG)Y#Ck}l{7Iv8CP==y=j$Ah^z!$szS{+uq}Q=D zAKS(1GLMMVY!8zG%6fT8NYOPU6V-&52Lr`Gi0D^Stha(9QW9Rf%WI|h&^;7&cFvp6 zJWRa!R_EeA!X;64qQ-nrU?frf!>SL@m?X4%fz_sHJ8+^twMOq({z57;Tlc>9XWZd%h+q{s4}EAHs6?^_^J1*Zsu`9;W3`|g{}FB5xm#8 z;q-iRWdGiB@NLVK8*IlaN|NZ<&Qo1k((`2LD*{3W47%=#bs zbE}a0cDa)yXA|Tugz=)v%xnZC%V$7zGE2$Qe2$t1;zKv;lmeL)o2;t)N-q&g@5kfA zFTHK&3Qi7AYAaR^D=;xX3{8Kh+Uy%_l}fFjQjhsoyZej%4il}Yiv?&2ObhBB!0OdL zmPqz1m zS1N5l$&BMjEr#iEU?Q+STDb>{Ta)|s{o6VTg_yT-WsB7vT0jwED7!+e*ff4 zirv$wdw2!)<8h6__XBjzXv+a0S-m*QctFXCLpcq+7VsGFI-!Ry_PYHkHiE3}k-m6_ zNN(yPg0jHpd8DiWqhH#Ah-jU`q4^Z`B`P_tz18{j6;AFfE>aI+i8hJB(%b-v)1x)7 zc8d_F=BoSxO9AgN`JUpgb1O;x9w&Rcz$Lo}h=S3`_JVi>^Gz;LwR-auqEGkiUf^?H zSl)B}GG8#uQY-bZ2uWm!fR%rEsTC&k*$gl_uG3MwJl9Zi_fvHl*M+658W|GY2$GR< z0NR8i&pr`Y)3Alf$LdzKt{&TYMJrhu-GO8+4zUygj0%pc!D>=x4_T@GQdczW1H(*o zvwQ##0kigmbvOz2vHEKZMC9AYf%ieor?^oC<1|odIYY`??>?XkP%=}))EdrTV4_vy z?6%I&CANyb{Q9qirBrW{q^<#<(_O00`wb+c8X$cdY5w{- 
zPOESP>tLa3de5Uy@8R8dCP#DY@($4xP8=wOCoUw-G)XU`i7t# zSbE-S=WFD*o4>~#0WE+dv7V+E|2{?vzn=ZFi&xfMjZdtIa=$hG=V=q%6P%D%#U?8lSR3HE)?rEn!_j$z07!PzG4$XniZq z4H)6}$}@dHBzIg0dTZkyCh?X{>)LjYl4R3067c}k7ss6yk6_X?I{H=KPe?hbyPQb$ zIj@tK1il3PbX>ak3M4tTR!Oug|7a(lzelup70%b=tJ;}ZL$-)Ks zyx|XNsIH;PxCxDno3Itr79wTlev)w`z&4J~EZ(soI|(0uRQGpSnN0 zmErwKT6XyK?pG#3Tai!nCg#I@!qJEt`vs81*kstw%9E&FiBfll7{&7bv^{*zG0S3M z(pnu*n(jl-Ukb|Xs4-VizF&)}{=S|s>yy0i*|Zx}6os!BMY)9&W8eGcr8`VAI?PUG z^d2dqvXan4(6>+RwEH+eLD$>bJYkaPIv#`@G-W`|Yg$d=mc=e5dBKx?cB=`;LVPMC zvvSq&$jW04N>;(-&PLjrKU_nMnxf%oBeaehLx{f8$jWf_vNrOn)@XD?HxXi$2pS=r zSldH$dw!x6tIsE_$?KiG;W!X_kvD98?(KVf;3%}Z7udu52AHQ0>GvrWUO)W8;PeNCw)>bYMO0?H^E5b`S8s5E?7Qhvb z%DmNMzZMi{PnKyWf}Ac}!{b{h%JEAMZ-s}1-+VRrDn-Z(?NdMzAhI(vqJN<$36^H+ zl{^WOa8G$Z3(GTCBpG@ElA*Dy;dK(f#FcO5AL~~?)YZMK3HL7HHQ}Vr5SAp?;bc^c zGh78WCim8v8i&0{YZE7tIv?#9umwuZv0F)W69;Hl#~q+VQ=Yy`8A^0D>4s$ia|;%Gm)T(yF5ed#dOVl_KqSI5#0LkYlQ;Z*Pfn+$mP%RW+*Eq3BdKBQ+mz zKAG$sRq&X6yp5fURy=lzl)Dc-a<~fkxi9Ly2BPPZr{uv=4q6Q2yhS8R!sxZYJ4_^C zL9WU^U!aHW`|%K18;Glu_XsBYedA`CvnSLjA(K`1;^3Zf5|~?WU-Aikf}SgU4eZ@A z9$opzm+>oLcon8o&T3crg?+}98%wc1AAH*mz#A}GR*zS~ZO(sdjHJTD1X0V~rP149 z1W|{#_E>hv09U8}nNEC0k&2HXKlyH*kTWa)XybPXecA51o_3zEAd zr`dX8Wz|lnV@78)fB)3hUU(Gldk(R_vz>oC+8tp@>JzhB0EnXGRxL#iMho7!9j7-K z?xT_|b1q}o%faLvZE9MnGIxj@1H@W_sw*E$53p$y=dO*uV&2z)l+4e^X0Vu!OEBtFTYp4zg-E*flkEOdg4ZZ97`yxb2B_MSxt81cbWXgK z`nyT>xl;o*)^o?_nw9x6h*9pKlxy9ispb zqpR;k_J@;DHOpi!5b|9Xu^RW&`2qI<`ui+@*Kr@E9Oq#4uTRbIgC{R=OC6~t%c#{Hgt;)@rvK&LDs#;0#U=p@_m|Iu>@AgFL>hQK#UD@o*d`eunb(-+Ewn4n(=W@P5!) 
zfTPuQWw-QQA`-)g(UG~rjGEVH!+7f&rEuO5{EeWmYY>&x?R-=@ed=>`?oiTylmig% zp%LA0R>+k9Bn<7#DdiENEbPcv@WP`X>|m+7q`iF^vB?6{UqwGZVfwH{{EA2xZ0zh+ z>6L$GDfBv!&oy<@x;nYOjH{Jy4Hwye{bnSz4odC!qX{<<5rv7;bl^89#iu4cr!AB$ z8}Rq0M<~g4I{-i6or(YCw{pGOg_FsuiDnHl(TpYGUnu*&APLw0cj^$ZI4oO#2qd|E z@$&*qOxbI83QNbh^ZYD`l2y$)UY4I-L$^7j*Q*f z(MQ1CgvnNqmgX%cV{MBd<19R>zC$Zvr}MA{!s4*%>yHN@Rm13ZSm($24L$twhRYJp z(|pug{_rVz#zl}$Zm`QJ?ZyjMY@A^zhHwN#31n*JdZ~@Ht2~KTL)0X?-X1ZnA*8SR zt<+&Y?o83{Hy}oBtfG^J%{#ULI@@Teaxf5uNZ2$TWe^@k*yH8B ziR=0&x&Wu?X3z8qyZd&K4}7ln{!mzgEgaWGb3o0~Xe9vJP**gH=nSGd%boq_pm@tg zzmJN!2zDk&u8X}4wiTHmsmm3doG;V|-@hQEGXM|+ern5e9;ss8!n4?zWS4`D`TG3YE z|Ljt{yfNWbI3N6Q^r=*EZ4y8KdQ_9G>v>~LVQLTHk?}VY4Hz3k_%@`dl+aJHin|F8S?6C)_ApYXAzj8tT3AQfb$|u;jX%YAE*sf}kyZsFH_( zthQ&8vJbMLRn_}6ZxUo(ZBylsNS~p{;yB z0Ms3YNm-+-BidL9R8*N+_SVgOgixfxr*BaaLRs#3W8F=zSC6`He~(Me?vA&eCiTGw zY$C`K{v%A$4E_40Conm&R@?UIc`~l=X*ZJ>OmdPrit0U}QV3R;mp5Ro{BtemkCI7g zfPA%ORso zj3hdpQjcJlLbv5c|!SY79`u& zJY(e6nmv)d0Mej3*VkVm;-A+q|MN>x7%p~KViKQ^jSVRTBE4D{Yw?;ks!H`*_lGYj zr?vtc^Tj&aTe3D0Be(D0@Svj_nB>-nMwfOQSGx1HC!+!!X`xVd0cFq*EPMGQju!hs ziq93UWlIha(R0=D!*IJgL`t4o*7w&fz{v6E-UEILCoQagH6&E+43`g^+U{vcM6zJ@ zvy0;bqc8J8te1jPUe#D5GGDcZQ)#-ML`?|D{0$-z@&`PL9l6EL{?fcDp<0?HB5BSE(5k*gzJ;4tAIE?X-#=;E=V8UuE9PCr!d}@jk}z%+x;srDbd9!FNAAElD^zsi@ZUm zWtHQ;kv=_5w4yVPt*s?@(5U`>1;PEq<_`V;5M1t&efrZQ9I=(lQ4KMkFe$963cdRI zj2gP1d#-)x1t)jM)ggXO*eFIf<6rC&GcfhgDv)e5CiEcCDdQ%5=m40w`$!wlZ(z!d zmN({R9#-^U>yq&;P{z2={nR?O4JS=UbplZv?H%0Evn~!_-2MKTg;2&?!SCV3@>6Yl z*cT?fkLg+;K#|_3H+9_mA)JDt0@dqXAn2pm6!7VM`UhK8nS=SXQyR-Qo+Bv3sdz2~ zM_w1kWxjp+ov9TA%oQfdw~eQ)CPhGvI#KtQ_ZaYIa@|*t65uLJFp^m3%j|-wR+go{j^WhaBy$c^-xalw%-6HwZaL+7FhA(vGWro{ zcG8uUv}1nC(ZI`Q?-WU7Z>w)|CQRFULX>(0nQJKXcL7BP9MKfOB`obVFfQYC1uhln zeV_DtjY+B1oLpveZg3*gkn24p?1mLOA?#z;0Ma&q`IuerY8bnK8M=2|b;0|O^Px9C zXC>(gszB?!@|MK!*mrRurT1CvkvC+zKk6+7k zCJV>X#5y90Vp{5Ob((>a1a7h{gs>!(;}?AZtwJG&!KB~#|oo6_? 
z`j#VCUJ$8ad~R*EOBWJZHnuMP(m>oasz%>1eH9X!aYBzT*1)8NPe?Tp6T5&?t>1uC zY`w#Trfp89CMwFjZ((FkudNQ|2rvW@^$w6S>C*&u1K+uNO zswM}B#BchdtJ2|o>)rb?QMo`x;&D5zM4jS>=`mF1gjS+BLlW(~>g%2hSAF77#TEqA zTn-%oA`@rxQtc~PE{=5p$^rEntm3cM>}LEm?q|CNsbe+0cX_IB#JSH?I^yP0ls8y0 zw{yzl0R6eAfZw%a?K6<9>*cktz{_O%tS=dQ#U$rU+}GdQ!C>WISvDlv-`arETD6Lm ziA=I+55S10w?EguCWVirH~g(?=(zz!3=&E;Hs>pSw`f#t3n?#kJbGk2Vg$ksQMtRUa0!xnLqa1@b*+ zs~uJHG|5%dsI=kaS9@5y<15mOJS2wd>>&>6n0B`BxiKimX?INyta*6rx=VQOi_m#@7+iq_JK~yEY~_e6MhuI7zg3zdABo z`AmDj_FUd}{^pQVUhF$i%B;p%v|wTtwVD7}qRqdxK=;9+@yEYrc?v_C-+FTI5J)Dz zxS@}}E)Y=~$5Z8&^HY?{*&U>_yll@(DL{ZkQsoPvGLdVOFQH*7`tp|U4P52<&%I!N z4M-4Jd{82e9-Uyc_Wo!5? z6n(Ecsu9SOZ{O(7+zZPEKQMB60HRRpqk2CfcMKIDj1a2cne>#7XIzxjfpp8h_X{%p zx?H$yAWy_7sT7f3EKR85kM8ecwygaAc5qOGW}C z2i!ZgU}dT=5lL{(vgQP*D_jzsL;70~*Lm>wPKl%_xGMfM5&`yPE7?0>=t3Endytqb z`&8iplFZr=)k|TIaC))T3%5(fVb?X|^MsdeV+*R1FI2-R-!D)Kg5s3x^eZAsW`y4l z)4TF-EM=?<^aCyI-GHmuxT4g~yWe*sSo4qgoTNxugcHN(qtj|vHxQ9!ZKSD<%A1%` zSeX?_cW(>r5r* z14L5wE06oCCx>}NkHhZE8geZ1$Oollg`n)w>3pfKmZ@jXFj0KDjZeinDiW<3k6MBe z-33zG3`^zz5-2nA&SDCi5>y-MH1pR`(#0J}Lq=CHiK6C4TPnACXq@0xlR0-`7T@M= zN%xpY)2WAUCaU4v15U)X&MFy?P?Vl^rE5?5Mzw3TIwon)7(LiuNarQ5)^S~>H9_Ss z=R+(1=2Ez_bX=1S2w%3YXlE}ST!Sn3y+pU3r=9c1Y`>HCjSw}RnL8#o5u*JQe}OKA zpcIcgQMaL_n7(WBci`l4@#eQNZr;U3{L^7d*8&+e(*A$?`OqpUaqMG9Wr(UCd?&a- zDUCzK(A^_j8!!uuB=q`BRkc&NYH|zW42aV8aa!kKzOQ%R3tEU2rJZQG+4T~Y7@YLk zDs?p(3)kHvxSp?-b5GB++@QLKD~#Lu0#Eur#rST%;BwrY)pw7IsQLuJ-pBla8%Gv!gOi%RZ4|SQk(=6m=<^4V^Ldtl zRhUmWlquI8zfLN5no8QrZAWOJ$TSEi!nPD^U{d{?HncLjj!C3UjmxE~1(}rnK+(ZX zRAi-3RB7EpkWYJqRcCh_&WC%8RTVyFX)Nlx^qQ{Ro!l>c=hJxZ;gSGNOQtdQCsn(3 z+n^tylxl72WF~;pd1sd>w#9t9`vqmgPf=Nws~&yLV6nhe7H9d4`D&jz2gs&Qnys?A z`2HR15?!R1sA$0_H`~wThZ{@eL|fMR!Xj&IsamvdjB5YYcWb zSzB;a?i{vvk8L%tK8SD%u$|T(!DV&TNvX^}p_uPO2KrgB4F7Q{*9(xu>w{@;Ug0GE z?Kl)<<=-Avr4(TmAk@Ckg`t4FTswN(yL25+Y1*JVwr6kPB-7n7?1OL=WHhr|lc1N? 
z1K36+<>lYiQ!_jH02e2!AuONW8ztAh`Fh>7;`Mz*Qg1(S-++D)rj2m=DTkP}MI_I$ zx*PAsB$Xla+4~hvk&@nLSe+%@2e+OJN7{=TFD@Z${ifBon#NzeIEEwUDZ<92RKq=B2CSZ#3L%XtUZG| z#}(-lGJQP5$nN)b3ZK!jNp*Sk51DrLjTd}rSWUmZCfs)#to%Ez5$0$-v>M=$H5Gmh zh-BPK$!vf!V%0Y@>Kky1N929%+va@XT`9;G7+I<2b93J|wwv?H>kdG0W9fSrifA-S zHGjB=NED88@1yqi^9Wgud6WZ$Y&q(mkr4Lj__KDT*oXO6c6_#(xkkzsrD=~TD7h%-h$_$9$@EMU z<_WYBeNq618;3zrc*_ zmkU%~jCwgKQ$)F^=?aykvsYtb>>8IETZ46@FzKm@s~ddcsAuc^;@ioz2_&mKjPlhm z*;83B-6NLrt-}afz7I%wG43c6_Xw#bA9%%47$71_FVFcJmEULd(4)SB@db_^IWV(k zD9Mxiu-B$N}PRIWq$yw*}SHsE~zqgL1Bn|WXy zjlh<`OX-@AscdXds`eOnK6g-&?)MJlE^egU?l7O;oA22n!)HqL`;+3*6L0W?`KjuU zeCgi&;pA$`tkyquMJ87ZshLze#m(+OtzoAlf*s1Ju*x)dcj6pPN;L_u^#Q29HOHQ^ zxdfAIb3#-@Qm;_tsD-WRifbrxeDkils5kky&IV!iZs)(mL+W?1sNTMLTZ$4;mQI@k zL`b}K%HQx25#8e{*S7cxqiz_4(+**Y_1TLE=@E5!f^vF@m^@#79i zMvia{xK>5Nfw7n zubu{cYrm2+K-3!I>gcf=aHQ%!&|k`f;$kw!KkZO;iAi#X-<&c>iTZb?_1D5BgU{tr zb`!w;mMy*o(p(?b!1@l1oa=-GDZ58VOmE?@Vtas-7)G;?w|m4y>dS*Wd#VeY1o2I! zxjyxbiji@4TOF@mM*IaCT{QnHaj(JI1?Q!e|Dbd6V|7edh2`R7{IG$MQfK~ob3H^I zLlCcw4a889_UI_J+RP_Rl6hkbNZb}-)spRefx#Pl)^{f1ko(#&gD4?L)4a$uy(~=!mbDoU3XcVj>j|Z>@adSb2Hh_eTR~Y z-VbpvEEPweO1KX|)k`&%FPy9X!s_or%H_XPHFWY#_017Q(-bNtrtU0HgVC;If% zMZ(`csvxJr;;MOKeeg3#GH5uFodc15)9uwWr&q%NG0kteQ^9q^Rc$_2`yZQGxLYPkZR{f(S23eDeXVmqoiRW; zkX2Q>!?_;t%TZO74S;aR!F-!ga;QDyv`rT#<7(ON32$R&1d6|xA$O1>_`!gajR>eE zT-}EG0q%NjxArlSa+$^G;Q=lmt@gS!mU12jRq4jhU_U8BIK17oL zg?+8o`g}5rY7l+&0yBz?J5MM65|spfKkVuXH;U1rgLRb$F1rFQ-R*)tH*0coG(cGi9^_&jB{v z(V`cC&T|K=%0xwA`g-E8mH((;QgK{CybAal14|u*hv{g43b2w)^fIs!#CNSew+W?6 zabi)NTNCFt=k4H=-`MinEV2Wq2z1DMR(%%}p()m^ic~YYhnCKT3cfE`%A<3rzYw4= z@1a9L$}C})C0;;DOWkV5Fifl^mew;kLV!u`Tj%do zjk?05fb@FoVKg&%jmmdxNyj*zV%^}PAIAJg#-~NzA|uysP45CtsB=74^npBRtS@LQ zK)L?VThAT?)KC4-Ctzt+#;oG=q~i8K8ZW5IiN&$r;0Gp=&eI$V?JNKBGCgm;)UhC7 z}P~U2?e&s*u@UKVGu`28ypd*skKD4Yf0F!L=KAs*9 zJmAEuYR~}T2=D#J8X7#o`J^Sunq_~U5AX{o-4{3?tfL^uMjc!-o-NoJf3N(fo$xn( zBHk*H@7DHP5w9VVRA)9=m1G()qk-;~@lNXoGU=0Ny0AIF^WMgnvJQEm_DQq_-wyuK 
zfWQm_?SS*udVaZDj!c5POvZ3@539ZSg(~k061oj7Y|a5tjhEey>hNLyQaUxj1fpw} zX6OV+vR3+;f)TE($mSOM-3bTBZB~lvMgbzZw*e4jO{tTOVv6mMO6lC z6xBvPQpe&deV}qt^ID1?l0{aeR}{CwsFgQ5(!^o>ENM6T?vZs(ns2Hl5pig zS)^4&bsmKHFA$0U*+OkH07lpOendoBkSvZq3%ZB_*e@R)kWOQpLP;Db80KV*Cl>CPZATCFHPYNEW<}N zo6)n#yCHHTNwpGth-9}%`pmGh$5n1>U$ElOEA>rDli)){sv}>H_NiE4VnbMMGcjXw zFrVS%UJdTq{ETj0S(WqO`wVXbk6d6<8tlx^@7sBDQhnBVJ#~euTKGFDy%r`WeIjDH z0aErSwe9E@jB0+!*&TP2;H_L8#CyatBRTE$0pr`ppMCC8IG@uHs_O+YjbPn5E4FIHIaa zv{JQ0ObV;Es2Z6w1}JHj9?3feBEAo0D9!@?)z>ec2lhF7WPJe+3*QrzDnBNlKYsh8 zafdHnp%hNfEnLqR%mVjKYd4tGZShwXx}9IB4YEBMcZZ6yn;$$(D{-I4Oly}*q{&?k zKWeOG+JM-*+u*m0@pFu?an&;v?f6}H5?|m*+{{z?biX2!LpsnKbyohX zB~z}cr9`V3FXaG}>fo#)B7`1}m3dl65kjwxYzRxjw`_ddgpvfSGV1;ooJv`FTaKf) zvBNjFj9}H&VH$S)BX`m6T_7>nf>+nhy-DTD6D6`gDN6+r=fR}(daG+nQ&36BIIE>K zV1cdZ+CSkGh_d`po9xdf{G~cx=U{1irWr4wh*np)#CC}&k(6uPm2ef22Ag%5Px7xT z%XNbwqPnT4tH~|AG=9|sEVa4ACAD>K7mM85e2>*K{-gI|*a#>^ zU}na*o+oyihyP!KN#>-F!F+{f#SmhY!ODN#gd3xrd0EwZ6|t;kDp~t2OsYK8+nlVU zhW$M~bUMaZXT(O0YwN9hVv3gLO}q@$9S5`|;9F0wZv!L!Z%zns2abI7^p2~O$#mdi zx>PF-+Q)81kfVrsNU-;vtbJ_hXO6l&fKqPiYaJwX7@|&wu?)<_E)eQS+yVOwrJgbPvq1GQxPR8&N>w8*jwPhVh zRcjU82vm@L5BDaR)KrI}v$Tau8T8C)bKAJ8a&;rgPQrRhltWldrlzEI4_H=erMnMC zrfwq6m53N(=UJhnQQCfpNbDA6I9jUE3siE{w`nkSRgq7z;xX^%kuxCi9KS7V0TP&| zugdKrf7W7qpEC=lq&dIVm#z@W9dV2!H^O4_r-yIn!wkr{+xRXIzp1Ku zpRhJw$>+r!yZi_Y1&jaC6C&CAP<#GZ;j;J7dEd;7a!mFkhKPx8bv-f|5RDW~5~%+78KcMa+LVYi@^7LoV)!P}F` zwnZh6N!(hJH^V&ON;9f6z9%f%Z~Lr(eIPaCm!r3`aR3jId~*d>I(6jue(qm8qGU)$y29Ectq*4DiXFe27REg|$LFR{zzk2-vIn2Ia3WdBlQ zz-wVKl*^Rz2S~-?7B!V|3nkIcM5>)?cL=$9^KZ;)6j0gNjg_en|MNws@euRl$y&y26ye}+=5 zChFJ);VRZ{nJ)k$H#t`A_9c{3>T>q^tI1RsxT@VXCTXZKPmknoP*Kmixhl89-QvXJ zXg`K&rUdR+Y%9GECRyLUs;Sffk&GVIi7h3@ zO;j@aPECt<@)j=Dt2}AT39KU9#*(X;YhGk0$SvyCv)YA3ywr%jz}`YczWW2(lUJp6 z0478|fOX7p%O3uja>Jhp+9j6XErND%;edtc{|Z3&mvaxX7s-i9PW z)!_ImjXQ86WEc&SS?*m_v}n2V!4lx8^B?}j6d=K`K6W4|PJXk#bQo}aQr3R~q!RX0{qfgB|U}SR96JU=4BqhctKrEVrQo&~^ zS^Dw)*bDxG8j{p=&at(xdFdyoimJpb|HD#FMv9baH8{t_iee2;a;(jFA9x)jIq9}; 
z0P>>O)Y`KNM&UkW57E|qy|xI}0C*b{-RWcCEzU^#4l*N<(!*M}WlxA|7hjgEW=r>k zOO3e5LHp1sUq{u_gozE$RH8#z!fI!KO@=N&3H$CIbc%?u=4{y+N@R!S%YQB`lgDq% z8RY`-t(q;Z&h%wqEsm8YU4hB-+v9dTyGBS_t+}`1yn!Ra8cCHYxy1~fZ+{eWzMGWV z6o;LA6iJEa^PfEaqIRVNXO_)m_@O>+ZIj4D%&cQ+BEGI24m5>pcm zI6}3`;f?KpGOo%W@NtuhC7xY~ytC(A!Gn=sxmi?N_Ax`#mn)J7lNv`gQGrR-ftW|t zw_Sv?Q*<86dy10K(e;acJHwSGRLAW+VNC=@5SD@3_D>B0+!6^~1+es9xd!A$eXYda zK%=m0EKoCYw&l-82! zhOm?xp#q!0P>H@3ehW^@Et^*-XB)BnR_3vG@`#qEwlc}>E+T1|b1KRo$@DMeZ(opP zxqZ|e&H)t3rdjlXLsY8A7`$YFzsP&4$L3$C<%cAj^mN3}fF!eaBpQC6PvXSc2y%RZ zkTFMdWVPoKPCm-7wATBtCgT{T?(tn?qW(4HupB3CH}ffn(X~^??PP4_>?(3M-*d_6 zm)tgakCR-}w+itAAc5M-t>5$rMdn5_67mEmGh|xM49|#&2=PPamhuJBRX^-CU&u|j z61nm}FEi(a?^f1TOg_P#Mf)rfYbZKkNrOHJ$2i)*P9?xm>Oc1(dYg!4sasf;rmejF zuFeDBp8qbYx=BbTomHr zWP9PL&eW9q1y)9wSX!|R)3Y;SQnnUfL%VZmXf`)83K9H6oqtsta|tJl%q>qB3V|77 zuV>oX02S39)~9U)a;DW7$m|+Bd5e{*Zx#2QaPrG2x8kRnq@vXOF2d4K+ zAHXEn29i?SyGfYpsL{whBKlS-?jigGiq(8e8y^K}^E+dQ(g{dh4r6W?`)4pYvtVOv zN+mGK2iel38Wy}x?!5s}y!uxD*X46{f@|j6R&i0MqtDgZYOAz{9OBM?lXZmbt17XK z8*q|uU5Rw1auZdW>Y{*O9qm)9f=ROKyn&pHy^TwBWt(igD(&4t(gNGf=!vkbo0|Hj zce4j2A(rE7jd_0(25FV|1B8^ZJQW2^fg>He=6ZO)$SdFPQ#?f^4Ru$i^DS%g`wSWV z?2Vx2NUOGek|q&xG84|S8M?Ceg)3Q~4kVp3&|6ze$6bdUDYw5m5c z9tcbMZ-y2meh8&9A9jaq0V~LnX%Czt%YreH9J`5@_xRhv8WaE zhe?dhd5?&aw(euMQcrM7$fQ;~d_L!ujGoJCyda{BeY$+V_BuHeW_7I!{p2xKw(5`Z z*jN=TF@M%JtqDi4)ih~a;Ang8rR}xG4b+Ii%24`Co0D_ft33cc^uR4;@UBY!sOAQ@ zu_SNyr+U^qK~?irtz9T$G}%?1_C18u)FO5d%lF}_p*{LN$SZXeEln!TKrGGjaD`q> z>VqlI9=D(7L%kT6nYc4dQu~X^ioV##*Uqs~%y&m7!Zmtr=r7PDW{r=k2-7RmD2&@x zz;PeFehtWKU0gxsZU%}WzlA7JuQn3%T_C&9YS9cZBCZymG$rK$m!w$jbL-o3=L1?& zKEElmAzUhMiyNZ>L`Chqpch~YwhS#}k-tq_bV02A?@Q6w74V{4MRa+mupY! 
zm`M6v-NjSAvJF(y-4{l-8f@YsSc{4L8KVYUSjp}q7q%0&rbHoxWx+D3H0%bb!)(!B zKs6c~5g@tome&s8Xikl-ym~kZr}os4ZIL$!+;X7&pTfzHt6=-U&@)VIYMD5|+)4?$Kft9f{Ymcsa|C;b#$E=V>D(}dt^uH>C6z$vBgH-WrP zu4dg!<(P>7qszh;dYM`U5=qr%h5}Bms)tu1U&rLLV;dqXv+;YJS*PBrw25PR>SWn1 z!HAJd6x%RiAMdsUMk*?8&O6yfm3ml}q7ec5SfsoU5Rh?i4F(RNq&62z9m4rk&YYFp z#iXcAPJTK?4GoazvH$rDNmEU+&<0`AJ0eNmE`Y>Q)0vb29@eX_jiLv6U|v>u4JQ$G zY*t^Nbu+0C<=#_LzePzz9j^FYLGJP?+K|0$_up~v=6Zk=cdcjWc0D4bRL@sJo&d5j z;H}BC=lQNdvZIY&;FM%l+UoSYPG-4svhsf}jgQW`k;ym-U==--qHD_<+*a+gdyQ#C z{$|}5cD&ODI5G7|n?N@w<4CV|@!P@>(tC0IHg;-nqf#t}^*oH*D`f{6vtqed=Pl!< zU97_TQ1fbg!bG5VsQT{%U4Pnx>Htc*xywk#;l%ow61sqgIk9(uL{9U-7)I($z(Z!= zvY>uGnG@3l)ls~_q&{fg>x4vwdx=X{bitPXTvteGQ@9`4xofC&?5O6KZ-7(~6IXO2 zPySegbS5HyYWlk6y9bl#kMR?GKqQgxzkKamx*jLfzG!m(gc&+%Sk~jGXH?RCQj5Pm z6ZwLZsBfRzXBMcc#v4f?ul!$2S>}egRd`ebbL3>VhDb7WA$lHi9h0)Nyg@h_t#4q* z-oD!JT^m}yRmW`;Uu9Z`c1w8F>puK#8zvu4I-T?#FiE$V@N=%)*~Llv$KNXHd-MGo zjZlk;-Ei*z1K9U!({EYULoi7{JQ};EFuqpXtY}4^R&XDCOC?AyznYLxI_FT5T014F zLmsZYR;zFcE|q$m^21_Gl1f{&Yn{2yi(=`9`vx5H&Asurh-5n5)wjNWhm%Zm-*o>T zkglFw(sl&>V&0RR55KM zqQsh_cuVb|WMjD{o2Kt#A|?-wMdCeN6j76}PNgfjkBuV!-VC)b9N;CNBQecLxc4qD zgbBMG8R-O&Vt>=q-)FFV_EB^ijr07?ZSTl|$Fu@Y(%&7|E!U_^l)NkNs`6KmkL#29 z_;vnbO{)IiKt5e_`_(wzf+arZc%v+qO$`j8KZsl8$zK@mG7 zzvnfBC}K@yKj%@duI)_s0w%6RnNA9xCmCCU^}R$$H{94 zN;2KoR_=AXh$_NbYS|MWCDmQV{a}R9k!tBbSq){*}HfMB-3sox`m7C)1Z>O7#sUW5KF3j)=>A$t{|+zg6A76Bg|s z){iy>y-zFn0LkW3%#YwGY@INfdYVM7Tgmrxh|HB%v+;sR$$Vgax|w=KB~87mzU-0N ziPaj=r!@&nQ&~2GS%XOFb6*d)p1-VQmH;=Ps7kFAQv^6NG3VGLpe@X>Z3LSE$Tluz zk$Tk0+Ch%KhVziP-6$j~fsLp2pRGxI{wWJld+qjxrLb1f=>||bH4ZR`h_FsB>d_^b z1ojzZr5Knbuudj_+u1mq+?eF{+&NAhy45Q8ML_8h%?J>m1YZG>hzh79c@3AQ>atY< z-Q;hc!1l{a>@7SLl@#iUJIts{F8b+x5%;JhvEDOolAH%zWQeQ9ovI{{|36{x9$#ac z=6hdjt(-JTXr)m^7-eHCgV8nyg;5%#Y>duel#NnKOsLSL(JGa-a!z5;q0vgCSLK|Q zgO!zpBrA+E7|CEO8?Uw{lQL*!!d3=bd+qP_+$*d5^S*!Vd_KQihv$Co=RRE5eSV&W zl80#%t();J!TYqwHzt1{9ob+eO@sFs!6fz*Yaai?+;4q>nrBF>WpAtmSZ@n5w?x6s zdQc0qK%kth#R3`$BQY)3Z1qP9G$EXf!7h+ZrmSzqXg)`xGMQrOlrtq-tmDv`Oj*A& 
z!16x(K;!L_1A=<`$!}y1Ad3N^qhIG$4WRWRT!&lo%mD>W57jKIvyfc>tdDkX|(jBQVt@ z=0XIMIIE$mFrwApP$9L5lpD9LvDRo$AqR{>duuJ)^Qbx;N`ISAb8s1*acLY1*93ky0xxsOCLR;*=TiJE?FL?(0A3c(A+K#?;SCAj>nl`eiE&+HjT zAP3TgJz77p*%cK}s*t^o`QDmX={%lRWwm`cGE%!W9cfO)W! zY<_`t3c8@p)7e|y?@W~14(RX3&Vwhz4ED(IA|nu;t~(*9u_ju-rx$M|mZ+_bP9sqt zT25P#OxAu*`^aF7ZUU7_i|*x6NVr;dP#3UhN^y5k$6WSPof%^Z=uFkD(&LRQt(Qfx z3Dl&TRuq$fH_pO1NjnYXKyj_Mz6ex{B2X4)*Oq6ULZ@}hDls-;1YJSH7!b%R;~m}6 zx`EW;iL10tK-P+<$JK7}z7h2WLY?e9Ml_G6{np*=7v^>y6Z%%Gg4#5L+BylDes0EG=Vo384Oxga-D(gi-*{tX+ zM<%}<+O`&l;}K0}nClv&UXRK&N6hD~pN`=fTY^gK%kQR*a-uXQ$7gVCr@WYp5zG?lQ3R^NI(US=AdxIqLao)c3Ph7E{iz+x*J^{R&DJ?@3iK2zlTt0{*o{s}u;B<4#%{>F)%kWsWAemE$Gqv$3zaEAy-Q!O)J3=c zK`p-`$VkmBPYprK(Lk+)EzGjh6$NfUR$_8iSo`SQ9BWM5|eDR z*@+l%XI8Ow5`w~%4O0|rrS?Ej!%NN9DW9W3jV>Oj`xKgJtaWOu52!E-zHJm(OsYj; zN?3O+tyb|AD&uozXG8n0^XN>eTWzN<08;9g(FW{>FLGF?)2ycU26_tH`jlP^yylXP z4Eh)&1m5XHpc>e;)PDa35(PjzX(dAbt)l?euCcy}9}!gOr~vosY7`xgP0xZN*f%mS zQ;X~Wr#hAev{xB|&bR=R5d9*fb;1TE$>db_yEG>Vq<=*ML;5L(XEPV$Fnppem(4Rb z=!a=B3|QY^!~tKQTAs zf!ba(Qg~q$f!34M$~`N3nvp;+6@h}tIG9V-!xI@}Y8|HaGC_$9Kh()ie-x(VR;=b* zSJpWhRDn9T7RaqKABmDES(Tv5;VHKyP8Lz;2-u>&FsL2>Ohl9U)$EySZOuaya0gvZ z4V-3KgqHYtdmalgJTql#-|&fA4(s80&M!=*Bk&wj{;?hnbs~~2{o)h$osDFg0A7&M zdERam9Y(|HA~Xj~5UN-A>M4q&C``^j()o}kwGdU%D*`jW3UtAN07Yh55@QV1G?$-X zAE#hRrmpPa{0O8Am(p*3T{hBc&LNV78%TjvuH+CBqe>Jem3kma{p~&)lPK%wFtlBI zYywd??5SDX^DofwK)wS>V<-15LBDcHuYh18)K?NTJzde52=zy0tX%d&Wpb&%!C}m= zKRT*|Jvn?Hp6Kb5Fx`;>ftHtw4+z#8=@bNVs=q!>!e<)ES_h`ijifJoX;*+`^3ATk z7Q}KCfpWGc_`2j0AJpkF4701X9-Xno#E5 zK))N2^s(!sMSZ|Xf$hViMoM7T&5eafCVhORUk?mcpupqJ$3l5bEwU-r_$1m;a&RiB z>Db;mkH!=<-_c@h%x#y0D&76F%Kio_;|v|YiKdNGOHfNKTdQo=%NJ-(emE3`9Uzh) zrbueNFOW>5lw`foEDx-}`W12Qj3bOli)uCFT@g(VX+y24CJJt6pO*7TR9}A+;1fne zOjyX^5EQ298R)W0W{ogMqBN-we2c`&(>SzL2tUwGYXDD$tY()Tn?NubSeMA=);vTq zphqy(D3DAB)>-f6EjNJ~d|P`W@#ZokC=N#we7%taZ<5moAe$;-uhsf4kTa;k@;9mN zW}`7xqA#IagR$K~744ttC=LWweBg|)!IJr-r~*xv^_3xWyr~ea>A`8ntrujBz5+dY zHlI1GMIe(phEp>^GO5%LSLn{q`JhHk(XtXWrldI<&c=;36ue=SG-jXvas1JOl7b$p 
zpP{)AZ_3tc?;a!aSY9F27W@U0DO;Uh(nkEgL*`Xy3ML{bug4*2B#a#VT6QmF?_~mFR4_@k{`G7a&!OvA-E{H@)1Ug{cVx$DtGLf7} zrgX>~Jy^Dp6ZFF@y7RFc*;I)6PMBrn2T+nK4vR2Swfb4i1RW2d4bgqzLL}p3e1TR+ zd=){3eu05p)S@u`d3Q#0YQCX9|q<{1IQpq>^Pfe$)R|H~1m3*21oCEiF$yx@tn9?9gI{ZhslTaY{2 zg}}m|6P3w@9)&V{C)wysnd|I{lP0=!zZ)eVD;&_$lE zMzR*>8BQYc3|7UgMU(!Bf#!}SI(3R6CiuD)j1Q@A6I?T+CyK@)n(|``sTDN^1r?>A zHnxU!Gm%YvI+C^qlW0r{f5M`+X&X=(FQ7?JWYyW2>T=Xn)S68!^u-(AYMs`@Q+m8g zrws!Fr4X1lIg!Xs$CstDjcEP0G>O}d7|0wSPSTmn0W>Btx)N_4YCjrO>3iVjd>krM z-{jkx_-We^RGPhOV+1IT5p;aFRnvG1Eg9i8KY9;%60e`Gr|JU&XK$!_Bk=K=4U8uc z$QENtZC>vii3F+7Jw^&N0oK={ULc!FposOw26@zxE6z6HLEIC8%6JDYgPzN0tlTxI zvwmRF+JNna&SZ5nUsb0k*MR*|GE!d0>emoh&59b}NO<*YYs-5a0tvP@)i47jlcb;V zdMpMx69sA3Z!uYO=6T4b;Pn?*tfMT8P!Tn2__PIv=PTt@Jv=cpZHT9o)*FdaU^;8u zVkE||)*KSaJe4&OYiBd4;9Er+V)q{v44Npa13?9^!s^WUXi&ihgSNkgD2&mVuV+&8 zXiVC5tZeE%sAAzx_p+&URHnpP{pMJaj?vDWU@SW6*&(YSE~7FQ6F9zNdGn2+maa*U zeOPwbf|knp<5Hj+*L{@K!Yb>VWRDRjt4|WCcG?6ayg|2nbXqR|-cdaI-D@=p6dlfK zp3oJINi>}lPu3Aoub@tUNQ^Vrlv00m9ZMmaZtKhti5Y^PI%L@l^MDVSDV-5*MNr0J zh+_&esbg2qD#DpYIVV#1}T5budw6n3Iy^qQi10|ygA+3+m1?-OPpN=0ER=* zkfpuDLRN78RSzyipfPdjZ(ZuC1R4@_9mPjm%eg^Jp%==)&<1NK|7KX+A2n^sN^Tos z*aSM6k+75sEu@YF#~F#lTBA**fFAqOmhk6>Ek z=jEsZ$=3I1tYV2rXObQKb}8)(N@LmjBUD-tW;V42J(iuQ!%HVTE=K|2=h+A(omDGc zGuv$>J$gYQA3!n=#!9U9g^Q!+){m@Zr$Qs(RV2E?2pg>f6}1RNYBj+$B}U?vb`5wQ z37gRcb#3D=Ba(R<1v7KFVPxy4m9%nNkWFFGm#pHn0!@aALB~YusE-jlnoXw7xBj%1Kr|23 z*XB@+G~xC8Bi0aY9I{COm1Wh{6f`Cm47pZX%ds;}Al(QI9HW_shC1Tp5jFs?iL+k& zTW$m$2hg7&kQfqYebsF}qDc&=thF)N5>#ML>u^G+QP65*4?5P$M!|DFTFcmKH$2Z_ z(UylEK%m+f$*FT5H4+0f-AOJqlJ(&fY8#0}@I?jd>_aW0$*_*Bbd-83sIfjrgE5^) z6YzC*SgcLS%cwdWTUXN6JZ_*Pan?sHTi{6?8}Q`oz7hBiv(`BRDdkam=><~20`&b} zx{4-947MTG2 zl)d?Sz;a|$z>Ey6&6apHG(>~{9G^GrM#f!RV70`o#5)mq=FeCGr(m;<#ECgwnb?h_ zCcR0I4p8%mB#0TZ`TYM;Wb-_}Id55TVNgS&a9wP`L_zi}52_6Y_=r)IKJnBaHLJ{d z$n$7Ssq0EIKYon=2UU!)&8f5-s7ys9ex%9KDsDk%N?6@qPsC9Z_feADU;M(_0!JXX z7}g5x`UR3H;q`0vyXx}4I^?8J_4s}SGWoYYbx%xP5lsbfh*!JWUPi&UYpm5UECEFz 
z={ir=Gz|%=^utY7DUC!$t!TqG4wl66TUJCj1);-Bt<8a%LB-ZJtE{g=p}_7x)pe>x z@FXsQg+9G`c`z(L9-y@qkAlqc{VcPdx*plo*6KiUZ3$|uc@mL0O(guxtM0*(`fL<| z3~@fi`u^Q+RHop`kRBAnD-NJD1>f-%G9o*Q()7N8Gs?z9(30Z>^EQ?d_!KkYYZ264 znKkg_Q%0g~)t2Hsl1aS&h>I@hUq;i>x6|fx99UiPMo=5i-eO0^wVVlskG_#VC$R_b@yYW_y`L&pFCKQb}r&&qe6l9*sTQSyF>r6yb7woAdST9S> zLltNnlW4cCCCNqTOkJ49$4cOGl#~G9n!qsNDFK$z5Y`)kp&HrPf?)jN=hlY46PW~9 zT^i}hHkYX$^ZESUMh?6KjTImVEKJXJVuMG~P!QI#o3$rUh{EK7@?UMObyfsb488(0 z4OC={m$3D7v!ewMiNT1fNZKmhiGK_ zz6rpVHJ^c1Ge9=QNQ=YdSIgoBIy1D>(YHC}N>KUU^4(9U)CgD-!79G4)^tTOnP-LB z`fg$`G^UbiwD_Q4;6#LpK!1s52rQKnzk(KSBs@jUXZYPI{`?I>k!#l%Ej z&9M-T$ttVy=5sj}L6v^`$NKsC+CQpv@msr=Q>X%iZw{rH?{$-c^QbA+6}rxU8J<+r z5vW-p7&F+!+m<}J1%*k=W?tV?d%cgwq{aGXv}WkBi9%1bU=&7R08jrc&pNWoRE6|J zAoH+9Yx%gV5omphqgOB_g-OW%L7DS)+D9Opl(F}vZ$6Dg5hx{=a<%-&nIQT-A5LtN zwJ9h}H5r>_H;Q00gIa5hK)a!XmU(DRrEIbm(ifS?uoQjukOVA8Ve-w-F7YdCY8$AG zPXvA}li1e>b^22}X0v4rI%8ORO2(Q=(2t<}{~eIfb~Bp`YDU(JD7z7`d6L0_Crkwq zLt+l5v+tuuvf4+C#01seUu~uWkqq&Mu(8=%WK&)g7O#0@96^mbpAPrV2Q}IV>v4n2 zXgX?}?oJz)8>oVA!Y3Onk7z-OMUhS{1U}*iHf=NvM`~Uej!KdLu&&Y^L>P|Nr>o(} zd@sZCG;oZ`1*3gu~Z&!IQ3)0QI%T^ z$I6z|a1?g7;r<+tH5_fw0mHpZ;Tw)pD>U42%G_`x>8K4iic&TlL#6YE`wL3daJ(>a z15V3LKa1C5w1X)@!~G>CXgF3rUl@+DsXVtza402cxY3lL;f7Fxh8sf(8t%uGpyB?E z5;WX+O3-lsg%UK}I7-lP11Lemu|2iOaAPS!!;PQ>4aX1At%tMH|0QME!5^j+JNR76 z(S)8rIU4RiQI3XthjKLBAj%QWifb<=X|(^Dl6-#lhjq3ex@;q`b9OIQjo7qq{RkG^ z_`S@H8<+7xH2t@66<=mq_VZE8Fm8s!t3F<)p9~tYi7Ai%Ed9@x>AJ|Lf0Q?l_=N78 zk3%dQH){M}Ec{1&lvh&a2zlEP9 z;!^%Qqu-@`F5)$P*%68bqQ& z6nwX{!Y>YrbZ-{<;RTT%*F@gBBGU7=$d7J{^u8+`i5KbfNaXDYA~DZI`aThfeJ#@O zrO1kLog{&DuACAjA3ctcRYRiXV`r4Cz7Z{-JdTz%>$^z&$S#ubxQqP#jV`ioW~8j` z8!3rhe<+`>jg(pBV||zIvaztc{Or5#k~E~NZ2F+9Y`)x0e*RWh*?Rd$bjV$0CC~W9 z_dk+t(2XD^*(Nz%#5vVzm)6eWoYqU^uTF_fpDFVGa*-Lk!{jIFHkmafOlGdN$?S3A z^4E{UWzPC=`QZIltcEwH{B3fqH%&VZ;Nv-W#e2$T#>{%NhB@C zAx&NF(oDWmhZ261L)@+~xky-NC20Ux;ayzIqfT$ZH{Ewg^K!ey5mzDgwgkEYtOjiw zb`A0TE<{qh+2kSeI`mJWjy~e-3|_%kQ$|fkL!_B-O%XOZfm}iwGr6Z#hDpn3gQYEd 
zru|>LjI+DZXYH{`9n=e)H*J&+&-mIdzkmN@`yq7QiMRd5YP+jjxTHR@$>Uy`_MAA% z^{!3&a4oCVMnaGy$@5l`t)$5Xb?&oC(>R+nzh#phr)`pgzLB)Jd)Va6-k$Q0U%n-0 z(|gL-^Lom;?|RBVIe#d*Ja0SUE@DUj-0~yt{gI5L%rDjVmw)-Pw_Is@hYwv0lB@UM zk;?ftd5>!ax5MQO_x+FhcjV^z!E)o9Ka*Q82Fqwo+oN#FAiky-Au59dUA*g%v+!DovfYWq9TrACTeCCsi_A{7rP zKdwC`Zud~^W)Shk*_+@MWE_cWBdnt`D)xA+EW}cX}x0 z2!EGrUTC2Yf7SbVZXLAg4}?F3y}@5peTGBcA}_AH4tc^oSq~iYFItv)4!O$p*XTUR zw>0cQeEGIRb~I4-?~u0sK>c|nzDrx0g)KPaDBH!14Y)6j{O!OF9&&FRxI#RxWWsE~ zpZXI=1^%DxkZ;htyE)`3<>?x3lgiih7qls!sSbIF&P|)-=!Y-EKP8?5WGCg|iV>-V z&mJyvh&&YRLr2_Bo}UJFKSL%>9+$`~?yKzNkQ~Y(eG<>d57VhH?X#Snw5=0y^rd_* zhf5RfY-jjAIfVNlOqxTm_w~?C&?(_^Xn3?V6=FN3;gU&RX2nHg@6pnQI!v{7ku+o{ zb?2h(C?H+CsN)@vME1eAk%#T1D|=nEw)t6TWBBh+Z_x81Hov} z890ZMC-8=Qx|7EAt@LeB*Bor;80`;w_iNHOj(VY=9jMnRkLJ(p*ixhn=K6=kUw9!} zvc~hp63Ra3ShR$p-_uRx7}sv2`x5yhc;|^|{5o20k?ype(ee`gFnum|HX1qrc>;7e z)ODCR2T&e9EgznLpJ!Heq228w`OVbnS&;>V-Q|jw9XZjWemk1`rlFe*t-Qd!#JdD} zHprr$Rzf$Cj`3V~hmw|;)JdU35;P3D9kZzCC-@igF9Z`WQ_~0ycir+L%1o80i+a+y| z$VI|u-K6e@^KA0y!j4jTRua#=#kH%TP2G1`&peyBr#mElJZ&1` zo@&_NQExNtvXS(EN7-*9uI=z1%0Tn@iZCv8?km(a`fZ>oJ49Sb#0}E*nveK}e*dR- zabUk+pi9S3c0)TOcLzs}m&wQr!CZark8SdTu+H0**8`EC zXnXmZv^OKa)G`d0M8d5GIpncv2;&9#&EQ+oulN5?sB1LQ@ zLvx-n_T-*j^xZpd0^)l`m=VPH|3dd1Bc7?m1JsW%@r+dBxemPp-1`_GWQ2>337U3O z?=7KH`GB-j&Wnimzk!dz-vb-STNN(%xUZ?!A%$__(hk2K{c~tI?RhkK10RCU?P1OU z-7zRkexP)YNIB_BT}PQP4oI~zKB1fnH2=^>rQH}Gcx+Mw-!zZ(FvfFHZ_W#pB~;sg z=WFWjnoW)nW(WO6&KQxO6TjM+_Nke}DKB*C$;>%OPihAD!aKEZp}mMjXSM&WBA;ehH-YkW53c$z#^RR)eQIzK_>a4fvnFa>!c{{!B= zlRUje+jN5Z00kM$6$ziZih3cOKI1<$uJ8Iv^BOMHVA0nlQtcjg=!y^xzr}lDXVm7JG7ZOWz%Eq#nxAHw%R0|c5+u7eas4* zyy3d*c0Y0Th?OHe!?iS4cDQ0CYica{r{3b}8|K7{D=$`t@T{~Vt`m00m3}e|Im;JI z-^BHUfO1M}K+b^AiIwNXr{k>Okmnq(zd_bMwG;UZr|ot|QeUr`$HTjC#)^AuKXFrc z(Omb8iIsvC{iJ~BeZ~Fhge``;(f>Km^$>sh2I_i1tW-|wCzU+&0N3x4pB~(Q5;Rp& z-xYmj0s2bvvlw~=eJuI!2}Xdyz`6SIu*y@!5?^*oo8wp^dyft`)qQ^#kFQ`1iIGJ{20Hu z%UXo?WsmUNJZ%^8&kvc0$FM#@9EC4M26O#C!!IMgRbT|Nj=#o2-T2KrT$`+M%;)~^ 
zx&N|FMxxX4(?azB1%3wfuT;l<%i!mNtS5};Zvq|H@4;`;Z6^L5FY$TC_gN>oX>X_d>-XT9m9t#QxQ?QtLStH2Cm)!FmQjTspaiy(`l!C(|`5yGp z%1FY|Ki=S4fe$@s(r40sxUPlEKPk`k{|=o&@qYsK(ZBwTXS!Y!ALV`o{eDmboCCOyUAn2aM&xtg5@=%Z*gVB9 zHGLhjg?qD@Ti-!{9So%Wo3quHn0MR`wfw5LmUD|;?r}YvGI7lzE#Rlz+m7zM`)s%rB7eb{-dg zjxFvY{1^23+Yd5+*h+tQn0A)3+qIan!@6*JL)=YH+L?FkQo(p<3UOv}&tTG@0?i^G z=T6FtI&Q+Yn(v25IM;Tfch3r!Txc}1yCGb9Li>=6-;=z~(692szQrS5v->#y~{ z1^>OlIsx{eW3GFoCG|D;_ZIn2T+hO825Pt;;};`CZ^NbWm_s6X-v6Wdp-`K@?Sk!gk({aYHF3;)u!%1~ zhL3SQ$lAvmpvxg4q&*z`033LibzABqpZQty4s5?LOfuHcUZNWXV4Y2m9TE;51HR!| zmHR}7ajzG8987?J5Bvm71wRH!gv}?tgP?~<|G)vPv&A`N7T03o`+(K`DT}otdsjuu zFU2ljQqPC;Z1U?I+G%j$piK@!oBGr5>imBS^g}3x*34RLQ$6cw`XC!~(#U8@B)*f} zlMyPi8Gae?XJ8vcSqm5xEeXh}1EM8ks7Mvh$)K&+h|bGA_;^ONR9>)21#1U2%#EwD z9UrIyu2r-_APrlr)p<3IaXNP3Y~{JPqh%X$tOQNxu@ek8gEldv2Ym*4oD7=o_m<|W z-tyH_n|wh2e>=y{d^uW<#@XZ;_oq^)8z`gqz*BrK5&8=H3+Qub71whku@Cq+ApKew z$#@VYd#RkP+fh=7?jZXJXP_5B4VcDr8=*%wu@(m1(~Af1WUl0jV*DQ^`*vUxIZ?9z zefkv2p{Q0^9Kz2ZF=itCT!KG@W-N}Dv>etXs8rYVDCRX$(vCilxlhIl%Hr``GLRw1 ziLY6+!>OB@Gg|Yh)7S|5#@(S-Iahv%AwMI2^aYfa4_h0}^&0BbNgiC}r)dcOOg@~= z$SNwF4ygJ$~Qy|gR2#MewZ zo4Qa|C%Bhqq|tul=D~Z2;~H@sh8EFh5OxvI)4FnA06z3C+7t9?YeHCuqn@j1Pc%;Y z-BQBl+CpeQSZn&8`XjHJuD$fB*Ga=Ia0giPcgEd1PSrKEa_-l%b~3)xd0KNkHnf*G zx^`kc^+(dQD2jPLcJWpu?feh057x1m7&W!@1jyq$OY`>;mVGTD4bJ)Wso-9J(zP4= zp|0BDn{E)#cZ|tZFX(w@Gy6)-tSL1Al=$O$_FBf5T>ltf+k23<<kb9PSJoi5&&g=ok9-r#-8H4E_O>>oC z_oSCG!x73RqZ8}V&{u?UfHu$>bO(JvGzbS?%B&|e7Bto3FN{63L)A4Gwf}+WHLeWo zAe^w}S}yecE;=~Px6X^B(Pu1Xtrhx)@Dz%b&s_C)>L&dGYgF)=cbT8+b?Q>>*R9uM zMCPI^QJ(T$sMq7TzRo;9cRKY>SoN*RT-#4L%m0Y?NZ|h2=nDe(FVOoRP<~IaUE(WO zy`~>5;o3C4zK-jZsoF|fo=v)I`pK_8zbAZWFqm*n=P8fh+2uHSzN*h1PrAS@_%?7I z+)+Jw(EQwkAExo2MZQLz=sb;akQ3g8E`xj1F3^q?Hh&Fu6-vBUT;Zn%86Jn?hx;ZKCYwS_)`p47y#z0j;M)&@?cC z=ZsLOevnC1(+l$XH9m;%)sik3{kn4}^4F9R{+uzNaI=#dT$(s`V3+7D3A z*g|tX{XA>^O)c8r<1g`0_F{Fe-n<1pYyQow{Wsl*o@e|!XR@Tdrry{)bv>}lLgHQq zoS~B>Bbj*t=^ah^>7L&Rx-b`gx|2Ra%P+MVN*SeUAA_Aa7X$iwt^e)BVWpq>MJoBy 
zH9_n8Q1}g$FTSkpf!?1>SZ&kQwsyhq2b+1WwMX;qB=SLfbDw*<5I=pCu5YAbD`|Hp zOEh77f>uunqIXN;?w>w%>{pXl1%82|Mpk^ zrLAOKzn^ep$j1?A4(lLo8Xs|JzhdQM9eY~(OzzvmGk0Taxwj|CFKr>Rk96F(5e4khu`~*`5e?*x4KPw?$Rc3yaONaONzYo+bB*+n|5O`e48|-?ACs0QyZm(^zNRjuCjwvXMqFQV zy@LCyXn($jo`O0}zPyCjy@{p+v=j5t_eZbK=%Vcg*aAYx*J6>gR(dG6uWd39{vx=d z_rJyc5%}nN(y#T7j^USq!^KJvHl6=46VQ27;o%D!I2 zPyOdEVouC6vq{tL7ZW9u^7&0+sH9I9`Bd-!#xB3EBJ5I;1Erx-@Lh;xYhBzRUiwdW zeTY^58VX*1$ZJtdfl7K(lOvTyCfhth1g{~Y1+m0 z&B(iYlTO-+U%>BV?w(DY2iV6x_^w0tq5HM^(g$|RgZKmGcR=F_r1ymKswcjB_^fA~ zOZdhiMf9g_$T!I|$Bvx00Mk3?-Z7NtD3P1-cKM&kH;H>M`TiDu-vHVf{NZoAGFST% zQ=^{ZyaClXCc*n|IHZcXv&M0AkzJOf(|d3JlsI_)Smc&XtZOj$uJ1wFKwF%4*@@nD zlzRWpE?=Ez-TFhW6aI*XPq*v(hq&RtdST~KF6%Z;4P3z%3OaAZ4f^8>ou$aK)(b>GHmi2=5|?M+Qsh?`Bk3C{s!v*$E>^U7%NS+Btf247rayvDqn}FuDQ)G4A#w&eo&9j@6<3YD;fn-HTFylwY=6JxI)+0WiiowcKHJ<59LxVPi#yeSoWC9fipHT;IQu)Ek>6 zqyMiZeYan-A}--0|0X$W$J(-6-nDJ-^||ef8{KU?Tk>o>9{-(f*B>6*{$bHBTgA1W zQfZ5q<2w^rgH4dCSqb9XmngBs-LjrFWawZp42%Fy{7Bc6)IUcfPXNdA?DAXI!`)NE z<*TV-l6S3-&zS~hfZ1R!SO98S$3Kx2E+>=2r4Cw;+`zhm zpS^)bP*D`d{#=+G-x@AnP{o?N4^%G=mzub6IY$|LPlw4u;u;I8W5Q$!bQ$oWKe;AM z>PXj0pNeX-#D1Q_#WxxYUK_Sm5 z8g7$f{HSCS>q$@#av8K7zs};G?O+$k1^Yp219bqLp!P89=^pGWk29mRt@TORS2Df= zZA7l1ov3_G`*NDT0aW?$%?5lFTEl&Zi0=q+1264C0n`Vr1|{&B*j5(f%9Et6ZVc=B z_;3U8C)=bkiFN-+q*Lz&HJ}dUqss%0@FiQx+Xl}2!Jh-=f{W-bfvey; zxCL5ES&xFYEf146=soZdwEr|r+M!Q@zv^vy4t)jQfXw*5l0}>Dn24`{&LA3etb?e% zc1P|5V!=Q#7!=0PPfvA7F(?70z%vVf-+;e^^!_n2jBq2sXfPH`0F%KqQ2T(k?U~5Q z*UTHAQ2u);f8eL>Yy=gp><@zDzzeE??~zFLP5K(p)+Jn8`-73R;Su37gE(h{xnKcU z2$q0lU?pfj5H72s3Cd?<(>ZoYgx?7K&bMVVv>KfcY=f^`McsiE_%zVS7{ia;K;6}Y znn__&4Sb+#0OxSW;47dKyRGOHCXLv#e}0%WKDf5i1SQ z`qTZS@JT-@0>z*Nl#U^O_Vml3<-_~QIqta#E`h7yI=BVyfHqKjH&#wOij|Ylx|hWH zfVlVY4A8hURx08+AF#Ba9EW;e$4V99e1xxt*4)-}2eERGcpidw@D$`X#7Z7`4qtMR zwv)Irv+2`6z}Luk*#Oe0W#M8HJ&U#gq%Ow34)&8*+~dpXC)M*gUjgr!gg=0j!|~ri z)H#0K0QG~$9_;7MA^)VUm^79^OOZW?@lnPRx2N{qfNLK)nsLKkz={uil64Ev5aq79)d+XBZd(MuYqq#yr&j 
zSoo4Pb}3$AmkIEbLHT0FLFhc7bc0={p_>6_gSlV3{Gb8U)7BKuw@VRuDkeXg*HRyOYaox%au0LyLi#nm51avd zcKjUF3Dy0ITKxG0IN6E%1;O?iz7bShV~_VX>%vg)8p@yc*axcNYgX|7K`rMrBg{0{p5A7g>$Vlhz8X;v`0I`q&s{a?Zipsp74D@<1^YLp;sp>2N%;V>HI7GuUQ)6f+@p)CT%k+e;rv{As* zi8hP&u$*>q24Q9c%m2D^{tSKrSO}JY+HSOky-6c&XC0`2#<+kr3_rB-0ppK3w54&h zsY_{Fp;gGfQM9?B2DmxPu#9+Cg4G}a`(GGC{s);cN(r=-e0#`8 z8R;!2t^2v>5I6$dpa7(vU`$@gx;wP?5%!mX{q4m5p!J9Gf2be15n4ffm67D1a`oQD z{-}2!v>Mc0!~gR_rGz-kffv+){9~b#=L?la_>v2uQv5PhGTH3U>Wfbez&FO@Bct$@ zE1_~m?|WjGbI^;Rntk?5(5t}L%6OYTyl$&Qt|Q+9{!tEToW$54pKJhygYdsG*ytqu z4_Z2$xcAWigK}_(d)mM~@DP+Tes7091Gv<&_?8nCl0B6 z$=Qa}v;m;1lD46NbpRjp&mxDsB91q}F&Q5Kok29{4%!Rf7C#5GTlaEy54o)`{cR-n zy8-(JebBj|IV2W35TrAo9tO?F-}3G{WCVN(epsBueEB@(_im(=Q+8$K(-TKtLF+vB zq6ydb7VoE=>LO#|CxFRd8khlQgSlV|-hR0d1T`{j(4|nCqLtHjo1H(dETONg8~KD~h>8lw`qg2jv-2 zQg$$kI*O7~sP &~k#-o^FyUY#Ttx*r?@Z4)Eq2()!k7jZ)~Su4t_XH6Qu1QZfa z5%CmXh>{W(QcGDN^_=Dzp!`{sq_b~PPB<^90d=4eoB`)R?dfPa0ZxKCP!Ae_A2dFT zmI}@tR8EhQ<18q9p;b$x#7F+BK@BLJ&+`}ae8QC+=K18$178L$Z-|nMJmV6$3a*1& z;0|a5_dqS_KSBCWdN>O)o${q@8=!um zr{FoLjwB3t1z*>RxNTAL2Hr7+yl;pSKXL>5`fgE@${a)M(Ao!G#o0INsx#rDK^}bZ zbn-Qaa~CspO+us$xqJ-yj3b}u3zy=5E3na3_#bv!x(?f{#703mNN0VvJ8|^{eLyT2 z2nK^;pmqcP2P{8_)?dT_ZsUK@#%BEQ3jH5A4!odhKK=))K@Au|9Nq_Z84dM2Ls*}) z%UJjcz-m9b*`*FlMxF-zdx#g*+w9T+W}ur5+{}UJLi6ae7eE(+{H=Cb0$m15$Y(L| z%ps4UoHUk!!fWLJHrIwT-x@@pIElUyT83N>UCF(xK>|nw8^Jb^0@6Tj2JLSS?eAgy zYajJrN&N%=R^}g|0(+>u%lNC6@z+b*-v_k6xAD(ww7(7X{g3f2;%R#;O148g=9RnP zQ$4Kh3;p_cp^}Td9~=TlK>PA2aYGA03CQ>`S<0ba(DpHLLtCpz6SVzklr%zXXy?*j z&}PG*0~djpK4&xZ61WPkgKBg>`rcdcb;sz#=wI)^w}D3H41N!N7pV7fu6`ZuG3VPl z?9cz9L+%kS^9A;{miM>d+rd-t9J~T;?}x}6XzLE5LO5G5;ryBKpW;XbIz)(!<2PlV?=Y|AGnJGZ{<+Gl13pSJK{r*~qn&&k4%H>gP#w zJ$$fzorC{hq5lQPuhIW~s}&8prqp-?#i9S^`Rehc=+>Fg~)6IV|_2fHaT=wu4p}XGiGyo(pb?}``AnQW{y4rogUz< zH1UH{cuzC&KOuf_hcIp69(V{Y==cNr6g&sDr}2Mq64c$MT$?G^XZZhX%K0$$KNbH6 z$AK4A0UxLaHK27|n7ksMHqIHmfja&QUjUs!`rc&mmwqPE@ZCXA;IB=QKG62Ayjubt z2nK^;U<7Ezenvyv-sfy2RM&2upxuehj^|tqx@jOB|C<4Atw@n``uuM`liA2)!Ca78 
zi@mWvzW{zASOS)Tm0&e!@Bf)3K)pH1k_g=hHiK=TdS^28hh#~Cubck4oCIm`S)dW# z4;tb=XDsqL@2VurcJ#YIF4zz9=OjxWI0Rpk@wpU(Bk*oe4qpa5`#zUapzV)$)aSwW zr}{~Xq_ZzkKv>@Zp52@xbyreMJNk{T%camRr!Y=PVVsb{Z?vRH(X$jOew`vE(9%aK z;u*s;CVejD&=P$n@PZmp2O7Z{a1PYo{9I17@*GeH>hF@b2jubK=h6r&=6ue3*`LdC z;00CBJ{RB1&!w6$HNZ`JE)vfra1~q!`M%F3Z&ixif-l)e{&PN;JMe9woOG3uUk_za z3htrP{6B=YgX*(3c?x|Fd{?l`3)m%ih5QEivCqaz>=a!CD7>9a{*%dnGV3qNQW}H( z+pz!f*gtSg!&X3N5Dlz#Jg7B~>&E=I8~)!L{~v(=gZgXq%eQIAppDJ!56;pv&iMag z{2x@!!T(3$|DXo=SzmH*NVfJMwExe~uuC7}j|F*q@XH+h5@fQko7IIrd#CZsQgj|L zkZ|eeId{|)%KXM5gOPpE>PHS420sFf1}E1sKUl@MA({3G>IZrzcr4c@fJO-s_aMd@ z@OjYu;jC@I&j7PQv4?cz(LOZrTxc24_Hi!P7J!9d30MZ&njEqcs_Uq$q3z6@5}=7- zBiIbKffUfXK1CL)|ApWq$lF!Nd}9~1ZIO-5LfSKsItU;B_H)TazaO->d@hHeUK@Lr z(9HY1bN4QJfUh1y8xzV}GQ6&do^0h@CVV+)q~G#iVf}+R>(Tc_mj%3_1{|}6$kA@Z z(_7@XXG5fr{;G&Rs`vr@<7@hyNAx|{=!0(4Khl3z(2rKeg|oiKTIf2~_mf!P@4@E_jQojElILKI-8mqJL7qG zcav;C`!l)4wL9P+ZhtPjMkULRNgHJ6;SKW3-pP{N@C#|kl;yXyrV_ms?qH=DQrnGVoV)ZVv7Kmh<0vwg}$a<^7hTPLkiy zNs99LAPsvpC7}3q5AmGnCZ(0#q?~=7vSVGP@^xRSxEmwh2fgLEt(W+0??_d0Z%Jnl z(hscnC-z}Kw3DxRPUbw`(f*kC$Kf6C;}_Scm)q3SY3d31M}Z6{a zm3h?bVe0i6^$hRZN8PLmXWfi(6et2^pd6F}eZL}{XLJV9pgZUZ`hZw45EOO`m7*Bd z7{;>>0ZNC5iih$mn-wbMQ$uAiVTOSbU^EyDCVR;mDz?vuc z7=Lu)9aC_;mG%czCDHyQ)BZqf)}gBmk#zQO{k)hkgZO5H>MJ4Q%L|dY@O8~0^dTX# z0Dd88JVD%*JYx*c0!z@jpM}UW=t_{kFGLcci6E~vg!U_h_KUWwFj6v`x=7hG)-_(U z&#{4h4xsb@)*pupV{2(!#CcwP({-Ery2gGmacl?uKy0NrEqf5*HVzHSeF!9m*m`Lz8x^Z`)+V%q;lky7+JQi`8NO36Crg-Of{ zlbQcQ%aK=eZviL)<-iMSKpkiVwa=LUzGjXJt((gH8#K&f{=I?uH>l{r{5OXAFVx%2 z{P!Aj+uO{4p*6_v=o7l3hioDrh#PbB!Z1#GuQ^a>l~5-)wTEPB+gdBXMycN*Wl|wJ@(&#{rex# z|F-fDA=h%jesBmJ0d7zLNw9xp-UdBd#!aZ;C{w}mLhz8w(7dz_-?E_-LKv4af{vQm3*Ez{a zklqp^Baj=9QSS}dD5wX8q4e)DW{z1h+{~#xljz^a(7z95j!HUGY42TkX`8@c;usAw z`{Gv>llqg`K>f&#(2B*(f5CC$@`9@I%zp~{Ue-h)4WX2&b`Zs7H=_n#y#iXkQl)`&J*;ewO6efw>vk`0t+dv9P16g1@sJ%x1 zZ}nUi4MZ$Uot#p*6_vO4d-ApSkc+J!3nA^3E?J z|Hs%fApR2KFTM~ax$ypY=KrLzY!qpPmYxlhL+HIvIBOJd)(5JWG9ElY3#fO3C1iswTu0}RkCCU`{~Wvm*17IZ`0GLTSCLP&VuM4Odq7V{=;k!HmGEffsa8E24 
z2nGYQ{?7YLm5gtYYv(X;jiaAmN~`KICd> z%~o^|!es>Uj0WkasJ9!`O)Fv0k40zY)5iLHHh#x?P9ypDBR4>`TqY34?cv)_&}kq) zk27S@*&q*n@pSSvhc$lW1)vOB_w`HT$S1Urycew^|LX_`{r~1)&;Kmsz9nE8SP52x z1ds@7_h5hYg(n%;)q(mAwC%tT8bQSq><=7&hW%leRcEokO7aM5fcI{wY$Tq|U>isQ z)vclAKUC7->+s`~APaswXiUO?ZTN2|{C5Cnwb8kcu|Eaf5Av~_L(n52&%?Za1AYbE z$o?UW+o#jFkHU{v;m33E-)s1<&Uaf`qauu+|0##+`5!M-+oBq%b^aip@jq=%9daW$ z1H3yOat?YCq&vIFC1`a6a{w3TlHu!^7n}sw;ctOPct2jJO#D*^9lU=B+#>d_4s>356ZtAJ{- z)7E)G6}%5r6Rsu+|KGyg{{ZtazAx0u_XVC4=PU3AIQ|-+0-ZrL=nlRk|Lwe!(S9G_ zK-Ruj+dy{)XZE3aId+LfKM>^mn3qi9+`-pfq=fuw9zCEGlmo4c!kx^2_XOs@Ma(Ig zpL&>|mO;yzd-o)s!C)8|0Y-zdU;>y7YLnQ1PiFreTF3VU>d!L2Zf1TBZG6rApExU- zOCJZ`4CcSgdwtB)t3eI$GXI@MJTt&-Fc(x0Ck$8sU$=p^g>_M~5Pk`0+(+Dt+5eu; z{x|dBW$4_cyk{e$`p>VGHok8kNZd&Oe>4;+7_zJ~w3 z#{Zxl=Wn(Xhwjsg-y>|;OS+`XgZ z5cG)hpaA5*jRx=pI8rH#{DZn0`T+B zTOzcUx;+6-uAy!~J!RSe{P2ym2NjGhDtF@l8TkJ?>Yr=D^M8cN>`(bb&;}FNX0Q#U zfHa_fuK92MkTU|vZIt(R=q`{8_5*hud#KPO%7X%6o!^WhKky};$lG}G0`CRE^M5a? z|8DsI8u}lgeRK&Zh4+B6!}L@8!elk~)POqB2+n|W;3BvLYM(LxeaTp0Cv(OO`b6$) zfcinBkNNK-=D&nF{+hW8v$PE%N+%?>}$gARF&QrlpQ;F(?IqTxC zqpIq|c6Ggejc_j-R?F9X`6Lr7w7(POf3oAY%0KFnKm(G}z3o4DaUTy+@D=Yb`lA?a z@|z$3cdkEr{+{oX@&72V$dCVb$-lH)U2CeJzL9hWV<<|{(ntQqkDzA;$v;NX#~|k! 
zW!K7oU-`#4ak07Pkdb$luj-j>Mfwy}iLaB_s1G(>cn0!*ol^OHSw69qjU27ycBCEq-x%pvm4)f6e8E4;&-z0d2F&&A3o8u$Q{XpC%p z9AopR&`eLGWhwukx}j|~{~upu2QvJFo#G?keSRL|Xg)>fOCp5~y1!8ndVga*=9i6M z3ZFpF)F^IJ=>DDkY=lqaEY73i)#;(~Z>;ryX}I~{`WfOL&o8`8zlt7o=x1cmi7sT3 z!*##iKre3NF7D$YdVkCQ_?xMrd#X7ybEU&C^Ay=1#Tba#pn@=%9EuW*K=tH;P&MB8 zCq1qXt3&qHXz3u)wu6_Kp(18p(&rM_7O$%A%&P)r9OQ(gV;nTvg$BxPYY?Z z43YLAaaGg8B4cI;mZ!a-?t ze|=hbdxtULjZ z>Jp`%NolA*Qp$&3YJ9UaWPeo{V*Yy+aU_vKD>7(bTpHTuml}I14IL{=jlGoW7nX*c zdz2-cww8uQX{W>=+fl0REe*|UOGB%03mF?eHRPmo0;g~q)nliIDx9UqU!JOMoEpy4 zFCy7KmEUqIKju_*(bRBR+*PDD%P$$Nd%RBGK(r3Bm%NS4Np%pq+SNg6{z=a?8b`eA zxBGaA-mj~RqqdMQDOzvY``u}e+W+583kCmkRQ{)h{$%9;D~`gS)9=4i5C+mK-KUDz z3&UV~^mEqqozszx*T`LaaZJzjK&rk?r8tnaa;70w(0}Ec~YP5RrZuU z6)S@^D$*)L+d2KeOZtD^`hV!$qyMMf%I?$u+oAo}=WAN8|G!!PpKKOR>zlWr6)ElY z2I*`=C3d1^v-V&7ZhEFk+O_t0q3=gDcP}g4b;P;IgW_U+J{}H764m-LN69#<#OM8r zcl7^-Q^@-j*XsM-WCO56x?gY|;(M&w&X75rz$u)@S)9j3Tt+naJz9g>YhC=C8-0sF zTo1~Brhn`GzClluEqK)bXMcYA8F|qDR(bcklz}6DLvEjUae({m`9^aY6Vdfnc@E-g zW=;;ZYbS?Z>D|U%+{Z%{%=6BoKe9KyM{~Uk=e!fwyeoV>nd9EOiTY(1yoc^bYmt7J z_bg36HeR2rRKIk-_ZBt!#qi(+`fvF$SIhP8JLA$dl&Zp()^j(9}9DY=V1YspjuoNmeJ#*ACEGvQ@gNk!*A6Y``Xzp+#JJqV)&#%ywxH zVgJx~BDX`@Tb+aK+*2HOi;KO&W+wNentqTxjH2`__Q)Q8b(^4#_j@f7hEJ zYtFI%FF9^9dl~hkl?yZ|!_oTRxc?-PLM!tA0Cq~W1~?6b`HJ$N{=NS#Cyf(GwR>lN;@wcDPYIt!vlp6c8sZ&ueYuT} zAE1p_hn#inb>-qbc@fp`v;Wmam+5h3J=!buD!m8E@$$A<9_P#JD}BOsaW~M5+lc&b zcgY^M{CzU#Q(h0rUjB}PuPS%wk75kOVDx;;_b=EF-7lGsAY6hG7=)#);|IxdHLiXG2lBe}Ap0f_!e`a77=Ai9P zw8xS04>GfhO~2LsM;19WUbFu9ruDyMip}3lPoJ~?_k#7mXY3~@jRjbQ)N$(t$h?2< zmhwYij+OX>^_!kUcBE(ESyb0rBk@=Ei*ZaeUvUi?J7C`{aswJRdyigruUEL|o>|>` z@07Srek;Q^RAMJ~qxYBgd?EMaAbLLWY&cAI+mA3tZW;Tbzh#g-Hk$Jor6<1fQ`Tr{ z|C976Z)xXC(K8?3o7PB2?uh?+ce}2*SE$j})TY(DUFu!3KA|4gJ~wEeljPCpKRAI? 
zIE}M7kBey2rnjTxJ@qa+FR6!-J)>Se$^Who(6mwiV79g%&5livR4?ljv?8XSzAT-q z=)rYV@3a0+eSL!-@8UyG+xwq>8%gbVgMLHekTx85#ofn46wLR2BHHUqw=Zl#f8k;b zMEN_~op;$w^!)mSU&#MS`M)6lh+mWc2l5}45kzHWsAEeo0;4bn<1hh}(55`ME8`v4 z9{J`v_kE=O-=+Rn&Z6%!9YYGuca;AJ%0Joaf2pO$5Tr95GcXI0Z$7==oCkWerlbS& z=nIgW?A;sh9NX0cs2XehM*I>)<$f8t9I?LoL1e=9)g$lsQxB{XH_5$Bq{&p&Eax3sbG z+b8OKmqxxM{f|ESAL@!|4RB6fkkUS0cb*&Q#ci|j81V&*D z#$f^`p;te@{MW|+zv&$j?)fU8CpiPNFbC0?={#})7GVjZwG&n1m(d@u-(F77);t&f z+x6QuZ`rj!DTOO2zmC{*-{QMt#5Vf6rKC~Op z?jSQ)jER$7!dWu6*!ufY>+kzoe_v$%J=r|W*t+=xE%eq0;@Y1NYoxah8?Xs2{+l*7 zUPjLtFYmbbeAq^>M9y#7OU`l5xyYU3V$#`7?nU)hM{F6!J_ecFFjwDiOMFu&XK->HBf5$aGkbkoCrfYNF zEOKZZCjZyuUwn#ec3hfl5pE?@SL|^hozpmr^LV^}Lit~-Q-(`hKC>?vyY*X6I6K)n z#yc1J`2CCRGZ(@8SCxJ~Ty)%3RMRUxzaDyAUg}&=Vxv5+eU!HbGVjOXpHFR74jh*z zZ;-ucp?}AC(`|ajJJoU1dw$m$Tmqyn9AKlu*hr&I<9D?7Kgn|X$ z`Kix`-hVZR;2S03DdGMo#z55e)dy9M;$$5XMf#;=a-x1J%3tpr20LyjN-zSWFb3n$ z{=RpxTiL#+jFVl$on%g3kR_W&oBuOX-@8;Fe7JJ0|J$tZ+j@zOGQhe+X-vWt#J1~S zkX2+gX3*zg9_pkUzo!0qp#G6=1Ckfm>$9~r3-t#PQzvA9SjZovUw{SvS7G11O61oU zs=Mn4=m+%Gz7(mG$wiJ?f@SFaSNo`uE3qmHf5-O-)RW!6Rz`$-o_jv5BP08NLlpjk z`b*xnf8+lRn}o}-4K3>RG%D$t^rOD%PWo;{HfeUN_JfSpiq8_a7yEG#hY`b3#8K17 zyl&L3H_w~xU9T-pXpwAz{;T#!_t!z19&ZTojb5v7gD>BHTrl@bI-R%C&HAn1-_PI*?tdEi(pU_q}ZfD2s zVb|5N>-1ro{hs!F3)#BS9x1cgf#_JwKaWm&7as3V6t%x6oTuUi<9RRYd(&GENdu?p znH~Ns-Lv%b$UX3191|*Iv zYwuYbRAugt_?1|NHK;zsza@SheFHY34BJqNkM>`73h%~VlxuS<{@L|^fuBI3S4nrA`5jRqkKiT=#XM_4mdmqxnM( z!b$g}o=iA)ljF*;4VBo5-5BM*?Wef5AJIy2HNk{L`N6 zF%O%k)D~a-!S`Q;WGUy^8E(kXbh~AjEye}J>nz(!40w()h`!?s^LZ9Ha)&a zzhH-c2K_#g+Uy45guY3%25^hIE!v0Y&AHYz`MvwqPlg`*aP&OgKSb;A3jSAV+PARx zGoKDm(JQ`cf0n;02>t2B$dCU^>ks^Y!VTN?L5A7y-7$kP6eSpeQ5b{rw!Wdwy>8#E zjFXvu%Cx#Xzdsv2*T?fw4>YMGj*%(HHNWQhEcJXAvj67V6I?nIFbPvI9W&7T5}V>D z&xUC1zxyl2_LnqA=hrO$+eFPm-26oNVd*Er{tE-b z51IyreSJR_-nup@>`e~|-yi>(uxIjT!uN`v3%g%=E?jrtQ|#wu(Ydvai^%0@(Jrqf zS0S@cJBqG_d<4iLi#2{*hd&x$*s7gO8&j^a_paYJAvUsaC?l)L>apgh(JQeNyRjGh zaS(?Q!%@VML<+ImzTwB`3d500);X^5{o>)n!nTRSLaX00sMu#L?#ic(Zw(3C@ef6V 
z!p^2og`D3`;2%#82)i#l6TW-q6Jgh|0pa_v6otL(i|j*bpI5*C;AB5*5evgxyPh$= z)i3<$W%J=s@w11~7bgLQaecvF3GZdxar+t5E;@cj3O{9-?58 zwgCN6jDcv|*w^}xzSe*A4H>f2m}3{270%tXH~E3W(1c^N`Kr)tu74UW^j5@%6^883 ze-{QzYbZ)ky|^$`O)Lx}=<%hl4WsB|kUV6K4HBrwIB_Zd^a2chyOzYAG5MUE}dsf=`yHCN<+wmyjQviNF!4t~ePK79~$02jh+o%uIbb)^YDZe+1e>^{+Rd|UsmLYad zKZmTkQxH~?t57}6x7NsYi1TIC^>uA)U1LAjMmB6@Z@u8VTz>EQrTs3+?icJu6@{%8 z*+y1kCwl+d{ygMf?2p1fvv!26nfri$hcD`tJkJ{{zqe1JC~h_WuVVcjp7+*$?#3AB1B_p?Tp4-q8=# ze;=s-KCmX-yDq(odhH_{GEQ$<`yiw@IzK(L@ImN6ir$Lc0cr1Xj+qZU!v`TFE;j5z z$dM;dJ@G*}MV>}gseBEPFPs%VkA@2$grsY%?<1e6IrA{o%2yrY!^I#(2Lt>+a>?5r(>V|C*(iv+S*-Xt^A|0`vYb41O8`uz4w9nDe~_cT3kb`<5G(s zhP%=^rF`8dAEM=mWBj+^ujPNFwDALJ8UZ?FWz6|2^%y(ZZuJ2IElv z=f?m4yf93l_x{xfVG=n7(cV4N$?TVv<1h6MGw3bKW163A7Cobl?&vZ`MW2V8(N;}$nBuj7Vrwl6RDdq53AJ=tB89ZlhkZ?3UANk|fNT=uDtY0BF;G_1#CgCz{ z!%pnRUi7}G??lGlXG@Te*Z&=+=k1dNY!`a8X0c9va5U19_rJ+kKA(G(V{s|8B7+=G z;1qiGBg@|@G)DL^oEGjW*H)0{aS@l1(q>#GxBq;I^~2szL}Sb8du$7Orb^$$F}?KL z$Q|+?pi6vbLjT{GYi$ubb-<%>*ZT31|C@b1TKhXv-@$+0e9!v7<4=Vx#>?*t-$&m6 z_dECgCj0u1wq3iQAv=Y;=09rNqjA~BrR?+J?DNU&Dd{#3VOyi+3j6$m^#cd&dm_Dp z#o8A1M|G3+1MSui&?BFGwExdQ`d}peFWP$`AwJrRV5qp1>nb5fpoPz26gdWwA0og0 zZ-qJj!V{2R7a-iZ**L{|;H}P%u7A{;obp6 z`;TsL%qElR67eI-cfxyLrF?6blgfUyhsJ;C|6{+V{J#|$R;<`=dS7h>*qHu)&D0`(rccmY);ny zr?+|@TRe}gRi0l$f8RWS+tR#?`%(JZ_Uw!HAGH4Op>U6H2N(Qro&*2iQ)JIG-YNBG ze|qJY?Jw{Za~0?Wa6whKL@?01|q=0G2cT$g9xptPfhu{$!t{o}NHMEt?RHm)O@g+1K~j(s$U```FiP_ZGCKi$fc`zWo*NA2R4f z7qZA9wzTN?>tBaiqwbv5JFI`_LzqsFA7K0Lu||MC3&{s;-&<_pdu-vY&xSeTqVrb# z%09)N>U7sKPuv2eukj0P+WCjG z|LJkpQrE@)r*A;A`@{1)cRl~cCUIrhhDz+jZtTT=9K>P7a1{CR@C%;z75+ZqnwRbT9a}@I@xb3|Jv|V^p*kosCVr3PtTNUS0~yxi+&!t^^QZ=AnhzJ zii;g$-;-BSz0f>3@;a)Tl&iJ+bQ{_B!o6r1ugydKe0^EeOg6rUf7k!%KX-8-4^gm0 zJAwWv#=qz{C)xk698Ra~hJ1~nr2f1O&A0Dsgm?y6LtJb=ZJyX4>$RH+vH?lX^ET>WSIvg~jRxG|-d! 
z@A>Z|_EX-5c>krd4x13|Em`s0bNZj||1J0bj{CpM{om*QZ*~8X>vI45MK*xGzvEKf z>OZ8>a!vhr$v5GowJk~yDfQJ(GJS>rOn+cEeJ}Q-Ltdh_%$?fruF3A3`2u+Ra4&A-F7D$Y zYK#%r_O<@rI{G+S-^cp`H_!T`^MA{~;rou#Sb|Fam5R6Re|Jv*?~*dst$&R_n*Tdj z`Cq8~FIE1@{QB=(`hUpJ|1}1`OnS?)606YinmGcm81toP_DFk&zC3*c^7DU%yS6$P zxhabGjAG^jl+mk3n=3(9qS{=6o#bxBU1QyG@8EmhMKZZr9Y^l<+kPCxVZ?A0aU_vK zD>BI81Ww~DqCP|SFO5+f^Y8gpNr=|nN50Lih{i?EJLd8Fzl-$CxQZV1e$Kl^-iUPE zj-+(%lK1ft1^>HqnAiRk*&oFii2NKHvd1`NNXpbS(D#(Mj(@ zcBA?M`T4)K<{osp*0i|??4C4Q{1(msogtlBn1gv(fJIn>-hX~Rzy6{7>)sLJ9&^u^ zlPj?bYY@%(V=wk2-+$F_jP}1hD4e%{*}xt5KD7VYyd&)7951}>|NS5>WRxOmTjq)|uf7Dll z#W&2BFGPDUB(TLe`_}i&UAw|xb;jPCj*0g0siv3ED?ISZx3teEBl}qUtn4I)X`jiY zaQEATL(gaU2tGxBNnSosMaTVeWSBya_To;=f8^^+ zuJAtgQ6BqwXBRu)LVb!~7lrALn}J!#{C|?E7{0i&imtqQ^Gyk)J4*XzQeJEEdQE4^~vQ}i6ifSEW9&_pT>NHAIonq zAL=UcYp@O*unF5=`il_lMNmdhA8>>^nq*FRr(yUxJTLv6t@+NZs|SbgpBWtXof{b5dhL_;sQhI3!I^7orL^Lx+%+sz?D&_t)GxesZ;-JsK9&*mQONPVWc%=4 z4C2GU7;&*Z#bF#d0oCc^Fo~Rks;%;MNWNY+k3x6`8fMCuJl4;bPtn&)TX#dSpK7RE@W}o+am-jpE{np;)$i~&moA>qn6$>r1}(vIdXbdzhzn8^lHoLa<_GY8^AUD~wk>9R`kLse>#J*vr_=~uX8>rr6 zP3cJf1$w;8+`F{BfarITyl4FNj`7!h#$T~TJ+l>Wva?gW>}BqlXzpC}4WWnhf>)Fk zbj)R|&t$VNWxK!3hDQ$l{Z@>-%GE$}FovQ8BarW(Xd~-h(~hBDdzSB?MC~{qLgTd$ z=O2>McL18-=Q}{lV)GCAC)(Kg?dU)T(cFVB*AUG;h{h*k$N4~5Rs?OQ} zhaT_e+|ECRJ{?JIZUYji*G7MI|Fao>&;F3jj#>0Ms0h~6|J3|_`T{J%5-h`V)M&43 zS38IN{3-v#O2@3i8mz+xY(g2fp^Z(^&bG+!Kl2{@mreTz`=2$jKib(JJJ=s+)-Oq; z#rxUn{Y)97uawSC?8aWSXn)h#Pk;RW9HbvcRHn0t+OSS!*+kKtikRPyqFPyxlSx$3 z>&6Rv=TgGuztC?`9-@8q>xU?Z$bM&PkiB157Ram7Z@7*d=*4aHeAl=Z**&!| z+$SHR;Q#R6qWp9G>trzoVlaANe<}Z(fJAMXcA^YysFo&Fn-sg-J7?)W<&*#6e zG%TX0Jl7>;dWh%S&sq@ra;!v0ANC14(SF$vKcIUxS zX~dC43a!W>hZAUfMfpPqGU!AXvdE$FjQn4ae?)s7G~0_HT`T|YMJr$(&GbMoA71&RU}utHrJV0>stFv3q9hl;|6+h8+UOZ5Anob-8Y1S<=&$< z^%MHji}45JU&>tlMCDHzY#@`u106FMLs5be7=jZD37?ERkpI9lkf-Qv_C6Qon|f_D1n_T#7L_wVW|2vg`8X?5%|$Cy3?(YM30 z=tAdK`~HVdg_wF|mfz-}TH5o-1*oFeUEo)|V*elEC5YzCBvF4-KFOZ1O$*)sI^7<- z?&VZ_M>r-bU(3mrScRy3t|6oG&)#oL3+w2c(EH`-kM)+(#ZS$Y( 
z9bD#^ZK%Xf?8aW~$3YxMn>`8I(Q&{Y27ByX&~6U{WY5{d0F8aUKSkajGBwQmL#BmW z$X3VI9P<7qydzcK-!AV2nGkLulbh`yP_Ivbm~$OP97&|miW=>CE$T*Ve-Mo+M|;cW z$Nz;lI3|M}PT&+y<1Ef2Z~yLL|Kf@LtDS$`zCYRj_a$((t5Mqi=>*Xd6oH-t^(y7KnF>x{~M?3#M>WqndqJ4dhMZlmgscJ-2c zA^-iW*Nv-bYm-ambF+NXYow9ifAF=B^e;Bbzi<+FrE?z-QShp3Kz|ftApZN`-&FpG zu>WyvtNf!GX|$jf(YFH!OJ^ubFaj<29COQ_KlIEM`NtUgION#+S#&LA|6_u<=o?Ct z$SH{SPM%KAK=cizXzoNDvxMiMp^tna>boS6|Nf13D8K*hpD6$H{AU58y$_a<%di|P zu?l}s{@J*lhm=j_GK*aLKQ8}UR`WkpDL)DIw?63_X{MXh4_HSp|Ha3`2C_rCnSIjV zCH?WzM-EZh-{kl0-+V2Uk=s!Hz&Hdd>G4bIfOEzQqV?~c%Km0`8=^JF^~bGebj)t- z#eVcQdVhXuP7nPsdcMuK@fP1kn{_z8jld4 z!wKZS|M&HxaGHJ==W!92aTPtdjvMI3?MSls%j=7*|M4CQ_vjzoCm*6NqJ?4he^W%@htqGuyK#u<)i!NhAony8CW6dw0XpFm*?Lp?}|8F<0h-CY- z;goZY^1m?{hiLDIt%$}JCJ47JHolMiIQvF6>1%9LWQQ1MM`M-w|F!1-ODol6{{LFG zEu#JTTTx|fJ^LMXy|k*?%ah3Ih$W2qlhL;bqHhu;6KrxUay_%e$Hmn}_H(=a>xC=+ zMmztEx}P5HEtps=cb(U%~{{>`GxF`X;eziZ_mb%>*WkGTbCKoZORXE|14 z71m%KHeeIl+^6xybwCVvDUgBn$pe_pXoeQnEj))D3k-@^!Kx6blbT^v%Dda6OsO zH`u}twDqQOapk4A&He?B>HcYC!x_`~7yB9uXMbH7Mv%QvhcGJtJHHFr^Qv_nf&KJ1 z<{=BGy7Xnpbb@_frO!{Fg6ZgZS^p57%4XMG&xXw1DEX=_?EWjcxLGLeI$Locm z`(M0=!adK~=aO8BRak@Acs>xa{Ljq;CO4sawDrHkKNia9kJou`qwhq1-Oqc*fsPvo zB5U5){v-OvYaI3T1fn@+(O$iS{pZaykLC(Qa|SAfccbDnkLCxoo#X$$q>a*!Wze}t z+jK}9wT~Zuhxu=_wSV}7@9*By{-G6(-RyT9>&Jda^APqsTIj9F?=MiqeixTPr?@UW z-rskxYdPxK?Ht+TY=> zKMHGWJ8*)23OVg-7F~#Z{insn^drua=TSXGe}=q_suhoX2XXD~RpA~q+;cwEBY_(K zt!1Ov%{KqQJ5kS;Pmm4XktBKDe?B@t;D&H7ZX^4(KA~-^@o(ea9ow}x2iX5R)a_mB z__VPSebUBK_J6neKS&`z{{6o3Z}w#?GJTb=OXkPF7_EO)W{Z?BagXQsMBmK0?_3X2 z?)QR~?$0&lQ{C_sJ-=_}%huD-i_t(&&Q`wWD_% zn)bN|?n!E{dq7Vwbq~-g+=3a>n1wl*hXq)KC0K@!+E2@cS78k*f9sxodzgMfVaWdM zIp6R3cvvTX12&-y+fa#}*p0o|kAsNzK|D;xa1`+g9QU3Zc?}z&5fP1sUJ(};{Acxra=fg_J zT*PHuMfGC)|IdFu^w6Vk($~$lPW)-*{dLdw(DNZ$L!UtX%;&>($E5m|gd1cpTIjdQ zyGYYJ`kEI|R1)qBKSWkIC%$u#F?{ij*US&NX?_5idSHG4nSS5=fFb4wpl!7I1rw!H zDjl+Ofd36M2S6G*X%+m9eh}_@Kl_u#7>L0bikfr2FM#|!fUf`M`vQkb>@D|vC~@2f zjKUaHi>tyodR*S?UVc7Ipie?le1mX8e7$@0(e`(W-=|{+%A4I|~9~pGw 
z5BC4}yqj=rG+!T@OZ5#${(k;xW;Op5Hb`p|p6t(mg?|dsy7@Bk+fa#e?_$eF<$tZQ zE8#kAO|-|%}M}=*>{xVe5Xh*$Y z6$d^aD$k4yadL8b2x$S&wVnyeM|fPxlf0~!%Mx{%u^?(=;x8 zZ{nD+JMkBx=kH6x{}TfLojv)Lu~W{m<(7U0vNcbI)52$Q9?|pfeX%55q+dq&yUN+m z`TxG{`wE@r@{8-ib(Fu&1}3ww`ww{=chUO^{=dTK!+rWg6uhPkpg)Q+5Q8xkTT1oU z`>7wadnLlLd&V*K%Z`jM2=AOx{${TYO?|!{j`jO?NDbc{n*02HNDun^FiX0v7~wzJ z7xZ(U(ZBw#?<)$A!8lC7Bpgwn{&*ofd~*oz+FP<}&@-X>&NJbtJt=F3+i&vZ(|qXP z3h{p5;P;E&UI~$LfDf)c+{e{~(jXQ=~r~GcXHtFb@l` z2yGAaKXz%Ox2r1?+T?a^Goo)4=c@EC7W)3%Qr~|gpUl5M=N(3V|KW{=VTp8>VL4W! zrK`{wO<`C?&lr#FFdn&vz7DxT`W}<@LB{KgY%L5M#KlSr!zQu})%Oa+HnI{`^tua$ z`W=N~r|@nxY&NfPzJAC_>+IH>fB2gI3F-E1Jmm-_z>)R}q|Aq2DSDZC| zG3gvd97(inwnh&rdZx+RzQxw+MLKc^__p?0<7dzR&RT2y#KqEu;RJaK)sy>%)8tuH z(d&kJu6O!|^THR=aL#kR;yD|4NJP)R-58vHK>R&@buyt&sK?{|KQ8;vRrKIGZlD*p zaTodd|7`ro_n6=R_@rx0CJ^lipOAJ^)))8fwJJT4;NADQTVQtw|D zzKR}{duOt!_=0}OZ$r2tt`{}O^})}5=>OBN&wu}FA|IJ@ktAnwgAdj6~L3X#$L|0U!yEXPW$Ld7@qTb}kU z0D80+$T~8;`O*052KpwHp+jFggU);2k8_>{a@gj#m~vQ2?nHH$??8}yQI(G73qP9A zzhC$u8iwc>qJF%7B6(O`G;bhA_I&DDJGkkGd|TTag}?e|AxM2-eO^@vV`{H72&2=CPUibWl*rT27kE&OGZ~qqy z4@C6s!sr`@dHes4{{Kk*|B0TnetDd%AM3g9`ThEz!TvK8B^ZHGc(aSo99z)F=5J5v z|5xe%uh#!xum7(fmL+q-e{%g#7uz?@2ClUyr}V~Q0?M~&+n)1%4SL0Izim9u*mk}0 zB0K}LFbDIn0E-aKKUqR9!}q;=%gGkyFPeX{lAhVDe6RQYYx)}GCM(|)mG6bhH)0d{ zuf%V_CX}Ii5C8p0YYypgd8|uEwx{xe%W7-hY>@? 
z&*UelE9gn2(25LlIDu0*jk7q9=sN)Azy5ZJ?Cp!fJzppYm&vPo10zH%2d5?OfW2{Q4iyJ@UUlC9Xe;5!C|&$ttos;kd7TJ5+3O-L9eT zruXBT_oLhU@jm-d-I2sl$9J<6$_vXvw0@xHUp^j6qPQ;ZdnC;f z^Wr!7*53#%j%!W75o#8`5o%v~Bh(>|dL+<*Bqlh|QF)&fK_84Mn2s5!k!CIG5J&x< zGVgyG|9ct#ds&#}m^qk-1z3b7Scc_j+g_&sU#9d~u{I^SD&iD8`bk*+1HpgYZXD^SXzt2~txSeQt*?mF&J~_6AUOV4$^!R4+?svld zZn)?9-qKzxQ|O4Fh82xChrjk)De61f63@u7AM5T4r~sm$kV8<-5k!6 z=TSwk8>DYP%=d4FFQegtzWo*bjK2E%~dXXLO5ER9JRC2*!d*KO~G``z%0zcJhT;=|1p5?5l_}vU+}*x+IO;1J8{O-* z?*C%&Jy8}mMR8wwBb1RnfBr_e>z=Np$JvT?$j@sdliJ1TTQL>t%*q$k z)7$O0k4eZmn|0Iz@D>BI81WuvteeFLw?rHzgN$)~-srFy{{-l4?hpjtI|3==@ zY~$!#hpm^)|7%h|e#HJbEzRm$^Z$-K70%M*5A;oNo?ia0cj7&LljHg*Wc>x_`MGB( z9~b?887;N?6J!sfeF*dGLe&2ie7lv;crWPDy0K{OT^F9@f3du>&GX;?TkYPbmH%4f zKgNKrOX~)DaT|AW9}n>#&Of^3{X63QLki7^_84#3XQqnd4AWejK_ zJ-$NOP<{u~ha#!GG>A*AS5{CWZUjbQ48~ysCSeMm_@DTwrwh-(9Au{&t0gM}yZTf1 z2c$>y@fVTlQue__d*jfTA@|)sQ@+>==+tj{GQOeTv)u1J-|7=ql9B&x6}blMumPJ; zhHa?CPDFc|?hY`b3#F710QD|GO{P$Hx z)dNrLKjEx!j%>VS{1e9&Jdm|0SIx zg$!~yf&Wi?aEk0DdtX&Pzv2G<@|h4*C!ZEywee%&EO{Q)Yt_5tWkhQ!>X5jn4{^)8 z^}ap?+3L4hxQZTJ#|`x2Hfk2Kj~BB+*{X5l+Vu&x=RP*5aqHwA_QM|bEsi0DC-!X< z`xb5D+7Gdh{U@@+JB`bC;qm@_seaFfyUueT4^i+zi+mlmOc(`?|J@+_E^cB`_R5~kFy6n|9zhScK64# zf3p9Z=bjeMKYz!i#y%S+NM{nJU^-e}mVbGiLC+|w9m2EdbC46xD#Kl5XOnrw;$p8^ zKTa+}b>C;h5^@=;T>*ZUh~liZ6e ze@@S*ltJxz_ZHte5qB6dRQ^nRKeaGq`mirvv%ikGICA|wXK_!)FIUKi_!L@^L5sNb zdcFqw37o=d#C9k#2h_o6bSexN6>nOE$$tH^%2Zz%st z-_S$9jvMI3ZQR9uJj6%m&lIdxXMM){p07Xa`=3yc^%qx+ff$VZ{>-oT4MXWA7=cmf z{r0n=x5oRk-n%F~4ihj5E#tjogZhLi^vp2t4}Cg)26EkO!w2e6bmA#-voHtqu-`Q< zAQxc?YIbR#wQqHYv`^QxPY?7F&S{^JL}L^GK|BA!Y_>F-7qhR;|7$_?ecdPD|Cy|P zM*jOhqqUnuw9kn4|6k@j%drxxP`%dvHw%4Njvm(!h}I^qqi5goEVauGU601_>&Y$i zy+4b+KiPPortn-g_{!;w|GK>o~-}Wr+)JN7xlnn|4w6-`=cLljLh#(OwX^6 za8IN4pV8jIyAN#)JEgrFdl7xZr)vDi!+v_S<{?_cc#wV=$;GZsT=b2`Xbs~(?RYvI zKHfLH{a(Lt2tTrB=U~?}#;)w)*j^NVaAa-RKXYw(Yv%f}uW4=g{*m?ea9kh0cWqYFCneOQ99r-?P8rV&iL1e>Qx-U$MDMe-<+0d%yT;Ys2IVZ*Bgw zuy11jP`m5Xp=SH1L%h!?)YYF3iM5{$_32NBWc%ksL!TGJk$7 
z;uj#>R^l7L%K!WPFT&9p{|esq58nEP_<`R*g%h44_Jpy^fDwj8viz~ni%kWsGn%9xwzEol2A$RM9Yej zu$$bAwD=C=s~PMUK8Wl{**2BvNQa25kxR z1Jcf8Y(GPG3U@6uKVZJ|%r(9#jV2sJ3e8pK7u1?xVC=tjpYi|E_RZHfjpj6ZSwc>l z(YpC)9sLP6eSpe$i|P&9~MT@t2Pb`)%!jc#?Z%M0-mhTzoq?sp#8n0&hhR~@mu$A zv~Oh3xBKM3#j%yYE1Hi#LtK9Ufp-hTEczVG!vZWq?{|$o&es1&-BRtldaQn<^wH2I z{j|QvJAK0v$1cNiWIGGPN^%v}plz)C?>TmOt{F1&>2+Oj|F5|JXWaiY`u`X7|F0N> zB%6h!wz*}P`%iY48~=a3B=nd&(EZo;_m}1dY(g2fp%Oc>8}CQ`{@?hX!dIRTdxcZ` z%&{O3;xJ-pX?Kn$`Jq4Fe>wksM!t;P9@i$nkN01m6#ZwuFQ5ze7OIH^6ZCll=dEj!E&#G41S|Ju(Jj(I$OaEg8!XYuc?|Eqaf`#bkh8~;c9 z53KljIPbp~aT!<9gX_3~UbHP{|D)qI?K?U*YTuDv@u+{%c#{AB4F5lwx}<;dp8g41 z9NUW66??8p=PvH!A*!#j|L@pq?OV!!Ki9^0@D#m2{%HLlAA5d&h5Dn|?*kF7|BL*8 zgXtANVfX8Q45gP~1V&*D#v#A{Zx{dnA^zQc`sM291R9jt8|M8ky%7@y&KgbO*l4N`NtFg6TLNx8|3>j(pi8-Sb~?2$N`T63fpiVfx%Xs27`}7IVz%0x`i@5ZiXTv;t<_P=U*!KeZ zBIK&r@9gZZCiXj7Gl=~?lAX#9j+6DJ>~FF`I7#O1!w>sEOZ0QV^FwTF+`h%tHZ|NM9Ui7WOX6dJg8-J&_4l!<8RUB4Hr{V?o;w$#gq*sfp zvW|WoJ>G2%{ej}JfxZdJYt|v$vv&TLHS}bexYYZ;RY+E%Wsh&Ple>|==DOrFBcFSP z_v7)pz|I4%6E&`**0trozc5+;U3+4o{JZXGe#AllIgA*NB90_dXhqu{@1OGBfebnk zecL74<1vRP-(Tn_|7h;k{-Xu0+NW5eC}gCQ!wH;1^#%Lyvm;K^;}6W;xKk9)($6C~ zQrSS_igyEBmMXvMz&BU0zvi>O-)38z57RQlSefIa^*dL|{P(x`^g6|NA$y7ajhgND z|JrAa4{^SMdgFc3y!?i9Y(3=d-9SfwDvh#13ig81<4EU9}Y1*#&X=BF|Nl(iu$qK%#1uD$Ij(*E|< z{tnarl3C;?^1o|in{bRxo^ot6dogYPTMJqd8_WKc&R*=tK~#&2zH4`w9({AD4l(*s zB=@oZYuW$&0QKZY>xX#sJ5nhBns@B$#<=MjRQyUi{1xvJ{RB?oG|r-@urQn_yPNxl zi{xcojpA_~H_(gQxEtxlneLMhk-BMbo(@o&`!StcXy{v3% zFQPfvon*~O{U3R)BcF_a%xB}zcK`K_ws?mh_b*EPe+0^3l!iHgW9V%Ulz(YHSzkiW z??0t&$?rd;|I&nG%6jUg_Fum=O-A2{Y9%Y)nHr*f*T+dKW=`J(vMS;G3g+ldqEA8g z1$(dXF~kwQvx#BeVKP}{oy*JS2RLR1qCV#=GIhlK8FC)7Gu`vK%H?|Hp6uXz%dF<_ z_uC>Y!7^M__Lh??u?lOj4mH}T+JztKpKH4k%5+1M`hg#@QC-n=MfoOEH8FW{O+XhMskR}4{!}+ z{c!!4*E}EO=V$u;APysjXdQ9+>wK4Q^$ACXdyE0a$s{WN+I|13?~>D7kwJO6HNWHu zoI_9G_G|ZWu8q#JT2;Q|E-#;yIxS>4oOq zyXJ%AdmcG{f_~rh&7FlJ-zsHx&AlvmvKVe8MNLte;GOR z+uQD_JMOcmxp*p-r&T{U=8|4}VUG2`=Y*m3T&n%Q7vyj3_a6sw7{yQgKZ&Dc 
zHI5^R_NDUItMc{h^4A*qYpZ-t791DJ<}CkLoF{hhf6@Ae{D~ae5a<7;#gjoB;yysR zrTmZS#`pQo^6KQBZ>YQatr*FQ8&B7-IVCL4|2spTMXkJfp1g<}*ENVIIa?dPqT~1>3L_ z?eFM+Lgy9!?*(<5hw3&c98&*7^9uEkvFabl%vS#pXBN4H`Uf(*>~SET-Pnu$$Vp?I zqkoW|zrlVV(*90AisBvP7tnpn_=Wq%H3&P7B+|&B4SDo^uPhYFo`2OYLiYYnd&xxh zBKkA>y9(N?;y%4+{Z=af_#fx#wbRvqJl~7-*e24TkKi)>D&pFOaSek~`~OB`4TNQs z@4e(r#C0WalYPk2OX~vp_U<`;fP!aJL}?x>StpOw$9#e$8neEC`L(n@R!aNV@175j zT=xY1{%>VI24XOVqFvux2Rik;UnDQ1cHf|ImAsA`*ENVI*=>(E z$2ZZ$r%m&18=aRz9ld@P{})L#;@SRo^xLlK!#zB}BRoOBExtvxTg$Fvv2p&$qw5Xh z3s682%_skC{XNH7;`sQ|{CW0dr{g$=JdXW|V}Kf!6)7A&_I&u~ z&3_$^y!O}O?F%0chg<$r`0>X76b@bZbof!ve+&m-|BvB^YyP9YzVh&cRdd4rH|B)z zPoEX`z44#J_cs6Mu>FAY@QCv8nErqN(f4=i%ikCr_R@PlX5Tx{azDN2H@@G0P+rPU zKlQwiUHeG*Nr$$a`hMY{-z$Dz7IMe57yRD%w0|p8u3N)Hc=sF1BxOyJpHiS76?R5h z7TXFsl|^q}GygC#%iIV)>(7m+8B!jqMwFX>I4kU0@t?y#syFQ(KP^k`B2UZOXKkh#`99ADs`z?bu zXr2gft0(2@J)ifTl5uYTDe?@?;yf;*<>le}u!e_B{q4kw#BUSX#P1TVLvAPfzWb4I z+3)qI^`Rv`rj1qq>Au+aGBh+=<1>9_cxdWBG#tBYjDqv3-x&~odQ!VY;-lebYX*el z<~i0r9BKTS@%byX^DpJ!z0SYetX&^X^zC<+b!M~ZaZ#&tAFS~=MjAb z`U$!^%nd_<4_id@jsBtKj`??F=CS?X=vlsH?vgoW-PVnx_lf5oD$Fr_K-NA~|G3*f zJfbJvU&C1ID*cFmpe;N-!o3Mgp>b;e@WeTpRebPoDG!hvYfW5oFtTiDr?!Lqo&jN~ zK~2P(RFbI03li#b;#lC^F!fN-Jzku`fWPfSFMfz2w=}mar{#W^N&+iwb!gas(;wEmR zFJj*BJ+kL()+Z!;zcC;@BA=k&R^K}^%9erTU<}1@#!ahunFxK^#5Pe|9?mS{|){BJ^KGqKoQO6>$fZ(9Zn#F z*1P)ukwY8m^yk;_@&D42jB6T@DaTD6{y#f9Z1Er4uoG3-jlI~9gJ^e6M@m0`&FGMi z&s5zmKHJjzF6xBM7yLg?AcIzMX1hm+9NFf&%%IWXuy~H58pn|nmPL}DA0}?3=@}Hg zm%9g?qxP`RfnJN2uLs zEg-Vrx23=4f0rWz91q0Peuc)Dr4w~mq<^gbS0vGh6q=C6VAl-AaK!r0NOBCuVG`Q+ zsQ=4@orv|Uu5Rytqdc~ipG7v`;Q!y@|3Bpa-{t>bkvEZ(Kijg}{y%D+3GwuP(jFJ& zj+)@cLzqE-lPvXr{J}hU$K{xddFcD{=&*oXgeBPZ`#&U>k;~Eh6@3C^EXP)pab1zM z)Q2rts(nuHDR z=M?=6(xdplNR8M2Fq!WxERH)oPhLcB>ws{Xyo#*;hED!}9QSwKaWCTDK}EKqbl`G;2!uDOjq+`|Ku*1uN&+1_F88PA}j+cV0t4gX*H{~G)2 zS)FiRM!H%N=LO`nDYW^&>Ot0Y6Xz53GmR*&JN0z@U(CnJRqy%#l^<4)YA=Xu62?6Y zQtWqJ3wfYxGQOk1ahy-*8@|iU%74e>P?)M-Fxz*!*msJ*=>Jw0r1r`G zNMnL)CSeMuVFqTQ9CPuf>tCyDZ8hve#<$v@0kBh69txu9^WYC5@D(b8`K%T-G 
z6q*Nzvt+CroF^}$O?h{jyo%nx58*l)$3OIvecFz1lT{y&`vVRRrTG|RJ)iOHweZZj zL17bl6Zh}{xyStflY_z|dS0J#XW}E_3BBKT`QLHDaW~ne9bka4UTqcw$(}!2%bpyH z;TVZA7>5a%gemCz^7CODIRn)T2ZdQ=Ip$&>YPafte%U?`^wR!&JNUo!B}l);|8<-q z<63>u{$-}?H+|p!<@EVi{Hr!c=Vpxo=)9m^0Ht#DUt_t<-&&)cfbZYE+4_I_^Xu>E z#+fw(9RF-=<$q@N7oI{=9XuqZ@ds?!y83Kjk~Ek^gU4pFtZ>tfR(xdZqTY*}nhj z^4AJB+cj;-qli-elE-3u+bPHGldXU6+)jBjkFHC`ms~Txge=Ne%@f&wzn>6ihHQ0y zb{PL)l=bh)v*M_DTbn&uy+b{jta(fRZ_#E?ul#tK_7vj-dfeY*_jg17yyO1JIFB&) z^A+@8?)vWIMC|vw>Y5CBo$N(HSybwO)E5!Ur8pNk&Pl%Mx7+B$JyeugmuS3t1U=3@ zu63V}=ugmZhw>-o|7-kzxBn+g^8*i91I=#(F&INJ9KHInMv^^^%3ZRL{ToM4K*@i; z;rZRse;d#FvNn#-$*Z5yZuiZPgek(Nq3{DXpL{z0znnf7^U(K4WB-l+UqH_&!xxbQ zi>vs% zKZ!d0fP0 zTt&Mwy(8u`fMlWj7A%onXC<8JiWMy+sG|eM;LGY z0eaqZ=w#>b(I23AOxzv*qb1sKbq8V9>I6^7es6kq!;DoV2coq8z-;LnWSp4ep=e@r z(l^v|otL`rzAt$%*Sx2b-V+K_%R&)dcfBW^NO`YW<2cBc8t?Up_j=!Zre_a$H?L{8 zz|(b%n&@dXB8BZ8)+*o&zo|Vg_T^Om{T~yty`b=EdBpwHtzmyRvcFs9FS7A<`75h^ zzgs^YImS8TFaeV=1=BDCv(Ucy{r*RN33+r4kn?c>B=`hFb5 zVf3jR_UTuuo~}*PaW#%3iQ4usn)5g}$FX zPMjgn;yil4`8aWr?D^au6PL-WxE{lC6PX+DCO+8yaa&kUSoZ$Ai9ULM)ssZ$m3I^O z=oSCwoxJuWQ4rR>@kyd<^OHo~<98GFwAN#ewvCy7kYlSC^$ zOV3?-l4wJ9&6C7H@eIaL3`ec78gY!IC&#)s@s6R7Lwcfn6PB9pUT40Wm>?{3>)pg8 zatd+>-c3v+XCN!Qv*q1HzT@4*EXU<2?F*v}=-T&gq8oLiq+ePaq%|q+jbv)E|3DgZ zT{900un0@A49l?+?N^jP=)9x+=@BowQ9u#RQ>EX1o=Ci#$e>l8&ng3QWZNxuh1BE3 zYVj1lpr4gokJ<~56E(s&(36iJCmQZQPHduYL0Ub$iJebfdYov)_J{H}-dq^Mu5Wxi zY;#VX|Y18jUfnG$jeBGi9IB{BgUrPUDhxS0^=xsF+M;=9- z!Wo=J-*2_wkv;m)E|R@pQ@0?ut2gh^ha1PlTz8y#XpSOT`BnWs|DZ3Oo?D@ggWL4{ zeB}mTrH_6OMP*6>-JWCDV)YGSJ)iZRTkEs;_tp$?`~>|feBT&|!5E6+zWd?iNQ^<> zzv;&(C!kuHHi@jcWWP0X8fxh?$XQ6r?+tgow-)*Sin>$QdqmH_%Gds{_wUO>-*?LF zw_-eh^q*!Mf9Se6Ze=bx4-2peJzsF{zghG28)a<1eMp3r=5hZOY``XL!8Ytf`$qov zR{uffNrVz=0{0$Ft2%=Wg3`@|9cH(6T$j@^tlwhh&zv|kMSU7|`{yRjFwuNeP7 zeL&bxPcCNT<_`!5>4#Aoe|Ugxdzp`EDw>gNX)$QZC{J5X8 z>owL~{MofZ;rNB;LalzO`oxRoIDa@KPrn!%&J9zy84?=x6{YnVHEkSX9711W=fu85 zS{xZv_S{cY4f!95ikbf{v1`wNOZ?;J|B=|;b2ky!pUeCGpB}PB%9QWU{_lxBtNwf9 
z`}5yT>_74@oBn*L{Mz&3hb{jraqwaP@Pk+XSKW!*sMT*g&gM=x&THu})MQ9sRA{ZG&7w?UU{x)b`J zP@Juw<_-NcIDrgW7wgAC4s8pCy;2tLiRS?x;R$kE%R*Lozg@oHY?-#5vM_)?5XA#! z+8@jKPU=V)EG+I_H`(LRmB&9X22iShd`;b%-h>nyaojl> z_n#!wh~xh=WE-+Q+6A)G-3`Y@6xjYaFYnp-{~_(1aT&f_93<0{(O{SN+oCrax(X4(IZ?0@_pvU$4nYb!YMocls6+aLFa$f4~L`(HmO zTo+F-ZsImd9Onv+x2Yx{~+%}TW_?7kNo}w{VKh$@A-zm zC9nLhEDUr!7(+1}BQXZ!P@0$F9!u*`j`v3+0%L(z*`bLEUN3`I_f?#B=q08*6-L2ZRq&{}{$LBDs)l zd{x@eL{Gn>|HHb(^Te|Ni%`*C7M769upI4=wST?H|u+|2U@(ep>s- zHFX0r=eUjR`4#{F1!MpJ!2f^VI{)HajkQ>Z_1J*IC(6PWQWZG#{T+TOZ#aH)jrK40 zywv{nx^>--7^@+?3fTkd9&c%vL1mqK*l)a#6?_-zihH-kF_hKH%K^^agx%PS+E?|z zDIo

|5G>gZy_D? z|E2z6xbsG04920hKmSpFm_Sdik~e(6lju{BeofvG_H_N9X~Je;7RoUfJ=Uw3N0!Dr z4pBZJj(6<)!-v8G;ft^Y1#`G6e!+*PXL{&ASO4M9uXMZ`Id*C-xeiaq>vU(eTdCI< z$+*YCdcXBV|C;Rm2jBVE^{2C+n;dV!Hta+dc4II0;~@IJssB$NsFxQSq$#Gg@q)CW z332@MVdou1HI5^RG%{#I`)l40I-m1?@NE7cz1ZLT&AvbWkBn;twr=)*k)yY*(*FMo z-`3C6{oAzzeCtE+wP#I#ul}7Ee-Wop_|2$rmOPKXFMF21H9nLc`#vv|S8*M^h-0AR z-X}NdPvXx*CRfQW!}>Udh(<`?svR*GVU=D=N88~#+&@M1>3L_RjBx=wf)Jx*pGua zjH9SV5@{5^{+>@#n*XxdT$fPx-u`#r;A{K6?|EY^$eu=HBgo$0j}Arh6wcr*&f_93 z<0|^TXucrXih6g)c-eA{zt3?T7k%F` z|6Bd3=ZmAmJjcCXw3h_A2urXG)vqW&$(2}*wWt+Vqg-1@Ps-yB^X;of-+=T)X%m*3 zF0ImC+FzG#Xy6McUuPe*HJA3+rKi`Jx8|Bn*n(~NVE@lf$L%-XUmy4(TaB(;>^kBY zfg*pqc@+EWoD+B0U$XTv`>PKjhc;xcIQL()4~c6x_9ExJxc}XLdVZ`rgmX&c|Guf* zAMgB`{$r~DA`d#J+J7A;k0Op8s3wo2wEvyo;u!j*<20JKO4lmsBx4_X-5v7})KeP7 zmqg<};b=k{8P~KSk0MUt49?;_+ILC+9`{Jby$ZYNac`i4`c3Km-<5uxK<2dcpOiM_ z(1uKha$G!@aTV8*8{(W1Yyv&cW_RKy{Wjv7{snY9r)#`^l9%-p2)~Eg`5!9P#kb3g z)jjq`9A-ZqVKJ?7&%-8EeA-+z`6j;W1{8j2{d?EObtImU*(>J1{-gYa9Q#+f-FW)X zXj?n2E{Uh>Y88jcM;Pe5!RY&vxIZV)(T8Is#$X&KU=pSvws%h>dq3g7zs`sBZKlw~ z2B&?qb))3R{=Qixg;mr&A7(gz7RpiR8WiS|^RNJ23*}{by?{91sQs4qKyY zQ5GO~MOz=5N3h2@LC;{3IC}rh8sB7GKX4ft+rD-n_LVPpyb`Oi4(rk9nQb69q54|C zu!XGrZJc*t|8aWlA>;q{8MjDJ^2ZxcMc<9sryA25$2>M-udw|%h{HIFY8*#t|1WDG z^nBO1={WBHlO{80Lq6*2a7FSI&fqLc^ZyQMU!gyp|96po70=cWJ)*2aU4MQNzp!Bx z|8p3>h|ifq6CXK^wXS*d93P*ryq(W{+3|H$n%h{_9G|!TZ*)H9LkV0Im z?}@OC`|7vbH;dd3gIBQQ! 
zzqFOpliCI1`u}t31^&ge{{MKdGxY%rk7Ita?9V)UoMX0#to)|;V-84hruqUuqkFvf zjXXWBN%}xuU*f!FSdNM^Z9#8Y^Mn3${kzrl;wNW@wPeFC`SFf#_MvYVPuIWe{gSrr zpYvVWs&&p=j}6#_E!c*gsKRdaeKO8@H=cps^LL(oR-S*&9Fc@P|D3qVIL0SE)ql|A zxLnw2k^@$x*f6WL&4aqK}3rSU(?xrY1F zHz@Xtsw4GRH$t5Mm-ZhSHItA zWXM*>A6)-uyYM*XFYbTZXRm?EW@GMut=&`HecA>}`~PVhxa>I2hqy{cf9HBU)?N_z zXX>Tj#BKB;ciMkk<8#vUE$*$}`=&oaY5gC^-N)Q3*(U4>8eUPxun&!7Y9U)errC{t z-<7X05Y-o~g+Ru&>1x@`q4eSS;QZi`j>ljeV*mRDauO;(uMAXfPN7f349r4ZJ^PCW zefdemIsGX#CG;a}gD-c^T+G7)EW#2j!*aAwj{VlgI2g~6e}7CvxA7GP6pt8xsr|lX zg??=3Wyb5bo~_@S%%N?VFlkvSp4C{3b%<>kasR*d^!y?A_ki^W>6=hwR|^l>-@EKD zxkXs@Mq?Prov39~tH|9b?f-YdbG>FPgX8^Z>h@e~JZCbMjqM-S|G%$p?z>5njlP?d z@1)6hktVkrqp{q4f(zJDztv>fv+ zQqsIa+L85qT5($(eYl4Q*fE{mAEYgl{wBT9I6FL{_uFHP?N?`qf#hIRe&(~G>gUQ8 z_5Goa|HVBHCr2VDEQ>Mp{6zV0ynZ_R1QbWPKXi|Ee}3!z^ncRs$M2M@FGyJ$_x;1Oe{o4GgaOjPx;fG@BpsCtnIxLneocQ4_3-!sjz3%)bRaP)5HFA)8lu%J&hl(ni&or`b_x2vCo79TRo3ev%}&3)_L$- z8}caPZDrLd@(j-6JTBs(Z|yR971vR>Z;tjE@8HgyknHiyk-Fs_T$rPM%KZMt=J(?S zGHBgA$1^m)|8?{Gy|?zU)(1f6OzQ)nYohf5hFKqAi1h&m%?aDpRd*nc=j-)<1%35B z&9lR8dhh3~Tksk7oqi8}e<%+R$VaIB5kJ^`?u@ime7Gz;5!UaY*b3yHn{E7=yh6|4 zx3@h8(}$vX&EEFtzG1I>3>P*M)#rr&qAb*$oE^qE9*0``1acCRvuA6|nH^G-XNN|l zhs_R6$i1e&+waqmrOzNU^Q{X=mLu*ZR77cC;*;tK57iYg*KhMsH`)3_#v7E@A4(~g zvc?~DYbPSpSF~>^3!e6GEO7oJEJ2mEVk+M*3+>9j4s=de7An*JWc=eh>K^W?1t(rs zR-kpJvK6@@>L6GyuHH}22`kB-#yMd%+4n8&p5!{LN8EpW1KIN}`H<}U?VPZM+=iW~ zLN#A!H(9gMekkOA)Y1=)Eo`MYA|t2JwIU^L6j=fA6!cJ@5Pf*M!yc zQzyUojJJ>TR>G^lW&CZ^pR~TPV^8eZF?wRZ_+Zy){uNLo3-`^SM)E+V? zfIN?yThi4nU3;`2d|rL{YitW!UGo1ANat4e*L~Ofy}@;*@&EK_mrL{SQhamqT*g&g zM=x&THu})cmUpaD#v_j|bmQqdU(HvfA19#xU(mzW2F@Qc0#Wm8rS^Bq1KL!h{UaBre4o6%&c_cXov27rZolIh!;|XY*E&XG? 
z=b6%pI<~pq_s}rW`det6>H9|$J&j4OnSyDUg>uYApfmlQ?L1|pcAB-!ZFULx( z##&VHKV!eaI{JESz$WA#`j5N(a(bSB-idAWohYtyZ=2ntI!GKVR3&UT_M&>S^-jnd zb?RFA`5^r;j-o+cP2$hjN9g^9@jowEzt37dah~4xH>3x#Jgs(J<a6;uOx{EY9O1`hFJIe;gEgwp;(faqkP(ktMH39XFA=ufK`x!#zAePMt5- zKOWKZi+!8)C-i>b^IuQ>!Iyoj!k_uSv)}iBckqAr@PBvlf3qJ6$6YtTbptUNA8h{{ z>bU&~|3Ah5t>ORk_q+K1-7EP2^ZEZT^Z(EBe=qTW`Lm_{@5Gty=KqszuB#?v9v&{P zkr;#8LHyt6J`%>!lSBBw^wRtnW3bYj-P;@PF~R>8KEXMYFa??H;4qEMx`&*5nBm@M zIG%-0WBv2!(zpNY{@i2TYkW{Nh%<>eryzwUq*3nLxtNCqScD~5hUIAYoI3VMKbiL| zy6D}~SCFpv+W(~guJj{wn*V=N`jJB$svqj}70+s{#X8guk^T|JQqYsr-5WO0HzD2M zy&>hC#_=Ed6Z^Eq@7u5w73zEiRDRC5G;5;o7Pc4raS(@b6m@&#{{!+rlI&Nb@*vuX zI7T%6SpTqdjw6XQia}kCY(pOHTjl?g@)+_c+1YpGG5N2E=3VUHEAszBd6jHkA#aj7 z$8BU>|EDOPQ#gaOcshRWJUx~vo$TgC`el^%^*kj1?(<*dRbg@cpX+2VYPB8QByS`3 zftA`3Ps?+T@1aRqkapZi#&rfC35)yhR+QnHH9b=#y_Fvt*tuB6q{L@BUBG^WEkk95Xh+*u5_83q`Vk zx&-?_-hYf`C((%1WcFWJT<7?VYd_e&an|v9Ttwkh{BO^w1Eu{D-;sXLq8kNz5zXRi znJWF$rT;nUA0hq7(c2K`KUP?Wy7H6OQ2#sSyLf9;@6CU_PEX!mgW z8hryPH2H>NU(o>948&jz#do8hOOC`CwBJ$xmB%}kBl#Wbzw~Z;pOqc z4ZHiCHNWYF9~lEjE<(j8jDaJozG^(ucRv=E(U)T-R%0#JA@Yq>U_$NQ7;E+lead#$N2lLGwq+dqbw-EOZjC;(-HPGtzjSBVF`)@clDkRZ(VU%xcRA{<1 zDx}{T73R674Oej;y|{_n=tKKv^WV{VP9N`See&1z&65Qb?~e}69iu{v{R2+yGRAeG z{{N-=wpZx?C)?(a4(;cRAGl=vfI0kmvdf(O?#IRt+%kRu%{A7aZ?XP7GJCAwf580x z`{wUEr;V)B-(8P}$@=g0r8kb(e@|~B)8f469v|Qlo}k};&p}=oKn_Ia#+Wde9E#x> ziQFCkfid*_bZHxFoB@3Tif>4pe)Vqu+2udpd`&&?*qE^WRo~xY^Y81&gh|euf*N|m z8u#Uz(mbx;dG^vB*EMKd$)}^##(4 zksi-Re(S#b-u$;}`TmsfGdPR$sO^w{TI3n})AHvs{VL*`0C8==)KvM0ye_O4eP0<9 zZj!g7j(d23XXzWJP0?}BZ$1{DkiFj-6Z-u?8sna;-__>-dF=@=m4$)A2BYt1*7*O! z=+G1J_x1;P$vy$iqr*_Y4aZ1~!8lAnMMYVN`wLE@XSFZph9$xj`ZUbIv+-m1rTwvI zKo;+4b7gDlyV;m5`}Yo;!gi(5L{FbI#@Myxn2UK>fJIn>Wq7v#?jUwon|&9$Q9u#R zZ?S(k@j9PpjrRAg+TY0>;{Lk_J&Wp@<_d^w0BTogk0)0n?g7*=-+f`N<8_F8|DX70*#B^uyo%WWaGmT$ssBOzab4kxM%Nj? 
z_;lac#u3tqx@+qHci0H__%G(y&h8g(i>D9wQ25I8;Q{#wPtfmx^sDQ4p!16Q{{?mO zhwA1i98&*3!2YY_x2Wr%h;@B=v6aj^j$?G%oVR1CHh}qlTf>&qW52@y@eV|eUh@C{ z?Y*{z&bP$PABgq-k{{qW`s{<9I~2n)5^?Ts4aU%u?yZ4eGLAk0X~#{DQ)Hw2?fo?y z+sJ=^n~lXJ=Tul5U1QPuM4f$(V=bPp=NQ)?SR}kx-^>y+uD`K7 zp7(WqHUCvRz|RJSm5x_qE!Lq^-7Amwn*O1~+=qg?R}o$GZgsO0!ZRuJ7>3A4$kE%5 zn8%PckAWWdhKg%M)E$wJ5BP5BrS`2iykD&M9~-a$Us^KV;@@qdv) zE3)fB-6;HoP!KKzlLw^zS^RW;_G~CVci?~+KBnS`VPM+*8k|W zx8l5e`3N=i2JHp05B-eevuL8neGHz?^RHVa{o-lZ>O8Vh{HgoW|JZqCrEzCf-s5@K zUc_Zw#dY-JPu72AS7V=3_i1?zrTo7~`VUC|5$TcMzu14*f7}$$ZS>(DavP=phA|KH z{9@^!|B>*B{shHC;@;;!UXgy`{eI{@v{;9M9EjSf9|?oWp{Vh!8wN?&F#97q9*L$) z`W4+{qwv^ShXxYzey=kCY9 zxBl;rUD^b3(7A8k;e(JzQH|sHlk#6X+rPX1uR7lc>wigcq>(`zdOoE-@)dIeKB+$P z3-z{-Zw-0jefqBYKI)sIpTZfOMP`L(@%e9s^Yn|jj5w|-t_6OTo|pc(Hp_K-FXEo$ z1#~;73)M%qhAQDVaT|T8y|gvdWVeQU^yI^>zM-w*0sRru!@eDwkaA8VKDhsQrsdn= ziF2}Rw&{P~7W(}t*1wf~W$eSPZ6S}YhuhQ-x9J1k7K-P#g#pePh~cgqOb(4YMq&)= zUjKHeU-9kGu=Lv@xocZ!+_5dB(1bLaU)~m4Z~_^$Zrr90`QHDxk?kwCg^tPFLg#q@ zKimJ)yGQx|Vg8>FRQ-5c80UW`U=pUFcI>yU-~a6}jh?LW-T-zWXqZ|e*C zb|@D%7xS(JF?)|e8bSZ+4vuva9-wyF)w_-ESW>w zC1Jjwo#Ls&ZtO+wkncxc*-wvox)TTKhf!pI3p3f^scbPBee>!O#t`|f8nxH{DjX-1 zsJSAqJ(TB=cAP=eRyH`=UNW`O_5Zajv zI8)9r9e}(RX7n_TwN9 z<0$G5@hi}9haGv$ZrtEk&`az8Tw~wZpq3Q-M#i!1t@LQSO8fgPw64rX zKkOCHi(dO2JM6k5YUmC9-S;5l2OOV46Fq&OogXG0elN|BTk1R{-;%x-;oZ_#?>w}* z<}A+RA}-@9uA>+2^ZEaXV`}qwHvdB%JC?Q0p3$@Cv&ZxCEVIbb+jdE#-*1ZNHu`W6 zv2Ku^sU4r5e~tYgVVxiP6BJjl|LA^I`rj}{;K$yt_y&*zQM=ig0CFg5wn~?ClatNk za6A%C;!2}&x^#|`{!8!o|NCB(Xq+xz&Q?bz)05TD&+*aUFTcjPejMWbpu#td4WK_8 zZ~vD1`7UL`k@tM`!VdLw_50?f>i;X$|H%wLp!K2pKXUZ89`*8%^Fv=SE>v98Faxtt zjz07G=8`?~!#uLrUVerDX^a4U5tbm1<>};C6jY>_IWGFI0=m&9{>*d6*a%;V)mV$% zHTQsZ^g8dj9;NlIIy@J8sr|>i*z_=cYR=h!P1u5M*oi9aM*9eU`(*16jDOFM@4E7y z-(EPUEIZ9_f4zTb*~pJa<`sVX3V!=+>kmxl$4~4Zdd!LJ`67G&B75Iu{)T70y*`A! 
z;(l|ceo1Qv?2rHV9qTJtC*UysDEfXmAoQ6#SUpks>-ad5NTc@gd-b7A42xy(1$`~_ zeEfZ@`VdlRJgNPkZ(YwvZP>-f;G@Ph3vWxHEO%@y%$>vM^-)sC|vgH3C;M@A29NFf7GyKr2;<=7q+(eukki~6!UO#x} z5dIl`0*bHl|M|ov|DXSQPgpfS_yPF{we%-szr((Ndc#ubTEUO{S-&tqSko}+8YP|l z=eW*aU5oUi;hghM3wN(6vI)`uFU|kDt3Mb6T_4BXR<&AhlRgyB)_+vq=H>Y=<#G38 zW&9!e|A73zNBUo8f002evWWA9+mLY|!^JZaV=xXmVOjS*fu2|QjQ;;5`VQmkoW!nliI8tFF+GL&i!k=z&@Zf{%tlNS$Bn4 z^6(O* z;^XxELgfRJ^fZd&l@BxDufIrlMp#=sFaCzUFAWZ-$TK*L-tT=RoF{v%yLpkkjH|eg zUToLKu|t~UUK|x4wU#~q=(g~}i}ZNTzw`_D$R6#`56H@Yk%li>+n)Xe{odB@je!`9 zp%{*l=xa3o|955f^Y};@<2cU$8%IvSBuqhEGj$p{1G7+$+E?ZO#mY{4T+^Td^XLnZ z9xwmTl>hkzjkD$dL+{)76g$za{A0&c^d>U4f8TVEv2UidKlaH1VUakNAkO_Oun}4(Nv;`-Wtr@oUnf=?W{U5Q9dbxO3B942j{1iL?58lh{ z4~5mjO5@x%@_Sxk_pr(RtQDRY7Td+E*L*0fbG#lKunDmZyarq7(e^c98+|9DPyfO7 zgX@+`|Etb%er!i=d`X1^g*eBr`k4C9 zD(?t!?7%7K#r=nCYNYEm=|Y7z%(KFq=y5LK({z^n-#xyO1MGi?Z$f>bH2#N7pNsSV z^fQ0OT39j8FZT-<$;-Hk>uBF7|8JGY$h>^sbw%DnfnGex{&(~L#eZTK|6e|AT`F&` zkpIax$8r2muXrjxtnNL|-9Q0RVpyR=4x};pYrkooV(`mk`;EoQ7E_ zN6+>FVJ_MGWzSumGd}k>k=d%86`p-v`M$>3e0t@_N6GJ_Lgz#Ffek734_#4)x3CZB z`$-6koVNtaupBF~8gajqim!bv^!!r)m*Y79uh%%%b@ZN3sq>T7;@m*)GA?TqxdpYi zy!#$w6X;3z)UYuTcG9bmp3S#JYAG9lx;y&+^_Mry)c=geiTb6{bVUF2KI{IwW;ga? 
zKMvx}htjP7bo*oLU)SsZN9TO~_vm_6|NSfa?-%O7C!6n@|NqeZ|C8qWqqRf-KXNJk z|24)34AH-j6Ub+*4wDU7)LmoNl&c0&&LOjk& z8&1*Bptwi>{|^2C&gpXhXN6U79Uab-7g2l3ehTDO)FeDt=}+Ri<6bnK_J8kq&OM&9 zbL+f=`W@!Ez&fScH_Ux`%Q)p0Yn9@rYi^?t_wWFZ@C5z-S^i^VJJ{MzGR}qRBD)V6onVExi{EIGa2=jzh?=fD7T!grH z#u9QF;vO0e>hQ@s+8iCPMANPJ^Y4B26ndeeUsx@CE!JT@HeeIB;DhZ4+Z^x2Zd82Q zIC(NN+5SRgm2n4^U$D*%Jtr)S!}NUg?KUUEQF=9s-SRHFJLKiX{P%B;3CD%^{)_L6 zOryuR%S;UWsxpquqlmum^b4oRGf{6G6V8(7aS@ks71z;=o9O$y(cw1P^RLDN#PE-e z4)@4Mc!GXM%~yI&KJ>1uZ>Xbs|250~n((U+k3D*LM6VeNFq{c>ag)IWw03fh=-pLup)_`rsyUZ9(xg9~<55e;ntQInqCD z6TTBwh~xiq{k3mh>mPPIF0I?kw(q6yN7D}V29(y(*r(l4ST&#cFnJWUFKcrqkE7;L z|M2F%|6)w(e+fy)X;gOnZK%36-2PjC8+IM~+whOa{wC~3#&2!-C*|(mS4W5M9vT|< z3>h81|Hgj^`)~Y*koQ~PZ$A-!c-NQ>@9qb}4tVcBo@fmp=_&gC6waXMlOx01*VH>c zJ~$j5YYj*B+9T72O&%Ez+ZXG{566T-|YKS zhlhPF!^8I`t_yn?uM54#4^)&Hr~fhIQ3iQm)4i|R_5$)>=W!8*kJ}fIypFyv3<#B9 z((dtP`{FsiiQ6$875^|e+>5&QuD*)*$Nx36`@b>%?(C;VeIT(mS;46Gi z-!FYAhGQhgU>qi35~|gIeu`tt-k-=rYsGUJQ~Wj!Gw|jiY1kLc4P!s`-_;+n2HXL2 z7{>FDXY-Hwl}$*ac`N^xuYRIGUw*Rw?U{U6l-6-`{#ltPoKwAd5Idl*wB5X9 zKKl}3%TRNH|Lfdj$~yjzSE5N=X*6b~!+o6SKV84nr$h7WpOTL0VYPGCqBZeJ?KBg^ zI{JESz$R?LHta+de)>vWL(4u9^kWMXp;o)k@x*7s&&)@x*{WYH@!61kWp=2WKQq+d zn5oR28q!;*hQ<}ddUuYWSJ|~+eFW&t)h{HIFY8*!r?W456 zc+Q>9&68cTJsWvG_VEYa)aiSKTMw}Nq5U6S`{sPM=ky@!#%d>|*IIk@lsL|y=92FQNnvLl zpGQ;H|5;nG(KV@V`#-Ax*T2I5SfQT2RJpNPy&R?Xhqw3}uDOh>xQ@7YO)q&9x6$sL zjy38Suk%01zu14657L4Y$eiOd0GB*<}?geiz)8?r6xMfAM1b)vNX-p{lN(hGRHe{tLovgal1e}33H82b53 z;~_36KQP1fvrvw?n1=;egud@+6MM=0kzwjfj(7cHQdmYVN9|1GALQ?q^yF#J7OUxN zk)}5xrF@URUfm$}kxeSK6E9_d*sm0tI@l>paLqcb#|CV|7Hq>#v@c|zm$J``+2_~T zXFh%R8us~h_L*#chyBNiYkb{H{NHoz|3>zGEBj8yuqyHF#$N14?xAyHIZn^JzfP3e z|6h{-WBdPe>~DhqOU6AntM~E4URGwIw%eF%*BwP^{h#^lDykhHN7F*-y00EUre0D1 zXmNiyanAjnbbrXcqkiK5@xl3jy^hbahrnqjS54HX0XuFyd?$MKPxVP8lyqEv^Q_t2Pc-6hG znG+s3C$q0SJR+YU=fC?^s}~{LQLbIeeQhld104@W!9B(MD&6N&_jkbk9dUp4?$3X= z9&>-lp$+Xfr0o ze8tmr-}4n$wRomDZyIJ`7HUUHzjMmz$(gR5Iw#Df&qKPuYmpiv{mzSPYsNa~0>8yE 
z28+lgD9!(S!MM{n|1TJaDr_ZIV=dNUJ^o_;pE`dErFA;ki4D%#ge};HSpVNiR$({V zN3p-`=>Kv4-&@-MaAKnTk5;xRiyXa8yFhIJ+bf>^IEcf@y~@APws(}CU&wxM$K0YO^@IT8!rMCL2FIaQ*d49kz z)z#y*=Kq~1FXA$;;yNl`Qs*FV;x^*^zdrIF;`(0?$VYgBen0d64!yVj*8uv{^}hzv z2P3TvXo}?kV*C40VZ$*JV=xXAFbRF1^lgwmpPdt?k-eWugzf(?W%nQ7_cb^E|D|tMUJL{@>T#)AJmcZ-4XUc^&wker#cEb{J+fqd#7~XgZJSBXkCAL{@;h`XXgKXjQ&YHdj8)gaVdQVpC-3qaQd9^ z8S*Bi*>$6~#sAyfB77UNZ2la^_iCq*!|U~5V061ac8se_Gl)KR`T2|Uv=2(P57faa z`wggb(-@==tyO=nQomO|to;<*73$#bYV|DM==nXo{MraEgxh%G^wBIdH=tc zvH9t(Y2z0S&PVTbEgkk5qK~2Hq_mMlSG{W#7w7+dh5R~V-^I7c@8aR}|9&9+BmC3+ zzs&!A{@;H(=I5BiuMzYA@5m|q31{d3vH2e{|F4hDjsKVVe}9qA-|!F2Ot}Br)ARq% zp`V$5^BDRCc=Y_g`VI0o9%oyeLzjKa`Qb_QOE4AJ*4I9mc(OT$!jpeLCp?{e1{#0N z{|K|VqEc-qO&%+C_952R8@d`Y`|8L*J`uFL%J?yWv@-L(O^L$cx^!$%kOC$Cj zypDV$vQIugyqTP`zTVV55840!#y{YL+In9wH^#Pi;yrjDqW{(h$PeRV_$1Ew|81h5 z@&EfY{WJJy|G%Q}1o7K&6K+9%{dacSZS*tizvuh^jQ{i$vrUm?*L7jDHTLVRv$tip z_5Txd!yS%E?`B_`YkxPr1z*DGO8x=-k;lo5bpoEiDTAUqn}xS;OF#7^yr)DW~ZN7f8f{R9-hB{Cp?8e zAz>~pKRw&eY3uqq;IGDgt7oz@6ktZ z-^btrJPuF5lW+-!|F{0X75e}1Kl&fcn-QKSou}g&coqg%sekbt`dEYd7tf==06BUV znI`>%Xt?&`5dF)R+xB9#8f$wg`3fZH9eddIc(w5B&`s~bnRT{fo8^t-&$KJwOn)2R ziT5F{|Mvm%!}u6JiB0%4wqfcw{*y@N#~)s!PoI8Ej5n|DHgdu{xPzShcYXL#1~zky zzs^5W$hK0gRNcsqV3m%6oxQ{9KM3#4G*&qwFk%Nun)`Ghh_5r zA^Bfhst^74ORdxYTBmJJ#__?SO5+D_Hi?h9Q_7l2GF``ix<;Ft{yR)zv_koearQ#qKZVR4W8>KN zBXsn+PvrgLO4Tvyl%67M)7bVG{0%LY)ek)Xq@iDe{QrY@s$&!C+IID)bG0u}$KmY!dwSkZC7xpo8YD?k6xA_CF+XZ$vud z-dC~DDz(3b--hf`&mH3{y$7<*b+v0Vd=x_x(vT{u_RfKD1l=3ka7@ENpDtA9@v zhMVYd&6^JS_!jzYc!d90fxdNdccTSgLjJpDk95-Gcicm!kiAs@9GSyc(D-A11=``4s=#qECFIApD4&dNx0H^5>Yuukkxf;ZKPDZ+{{GhJRpYo9}PrbpP8q z^sx%I$tryp^a~JuzT|yuVVrxf;U5 z#sEjEJpZ+xKgQ>I{>UP(5s+X1`=I=PSp829>{qr{d;S>0#J@k}|HqLb`u{ymn$JMv z6Fr~TmxgE2pM&S&1<3yO$zeG;^+)|YzwrGw@$(UWDPDnBxW`8Wsrjr2F8 z=eW8>IR5)Q&cSZmVE%c{{Lo3kcg)-HPP_;2!w2wTdHnXs)BjgBKWw%BL+*RV-{>7H-Ph`e z=HGw8y$kmUcU8Ji@(yXIvq^d`-6y5v+Iz+K zHMu{e67DbK+8_3Q^WR5>?*nlyrbm~*sfF(6Tw{^Wm5S%Z-~wrsd7kvKYWKgMF9Q85 
zh(3j9|9`O7{~5%!kl!M|i`I4X!w<+GA>r8kx(L$xPvM^<|Nnyn+8NFp|9>#?`isM) z_{q1N7k*9t4paCO{(`?D>mNA(AZNB~BQCijJcgY5>fgczBY~6RB_S|{KWX=t7UG#gW{X6Xl z-yhe%rQjLi(|7{?NoZX1ENjs}D{S5VjL@+1%5Z~w`Ap&C^gqSo*F+)^JX^wZfPoK40;q(6sd8aC@C^OnrQbw4a80WyR*of^cV* zensCy_8V*x@z21s@Ekl3FTm9QnID#u6PNl&1UY#f|041gcr{*!vGv+#Rq{9e&B&E1 zvydrK*HZ(ZBE0j!9SP-(RBNiT)Yfgi&?q7{-x77CCik zoBFbSMEzHz{%=sH?ozj^$Gi2>_mr^P9eW$@z};xUm(YnGJYs%9q4Z0o@7!7B&}&Sh z5B;^q8ZbbA`22(x=SoRu2xG{hwT}H=%%-MyCS4o8PX89-KLNUhqmN`4zANqr_!0gS zXX^)b-_M07@jK-IKWtoSivB131%E@^4*oyQ+W&{N|6{(K*8WFaVtT60-jKuds&tV)H^zZO|v~_yZo}>2@*HjwPz8aYJ9MPeT)miC1t}>4j-G{xm zy>WcSdpu}vqiehbPs7vk3_J_Z!Sj$hU>-5K9P$5mFD3`4eV-@T5%e?bkGz8ZYCL-V zk-Yz-I`FLjW7@nM`@I>7%;~x>u0N8Wf8XWbb@Yd?Kk_gCKisGNj~>MJ_ueM0cj7&G zA3lH&<6{_JqW;GyVqg0|&Hp=~{f~acdmWhP`C|}6m@u~gN$E_+zKR!K5H``{_~WNz z+?S1wk)Od$m@*FVd;KrBMEzaX&LHo=-Dp8-#M~<~-LCyFk9X30Vmx9QF@|yXpF!R~ zy;=LO#=HsJhcJd5TKkGZVx4hZdS}z=vH$EBwbg{yPq z{#|;bKErK~*9K!xkFod1mCYG;c$Ph$BipLjzxob3(1|W2(TyJbKpH>7f8ytu#INx? zOkucy{f|+Ml}f)z`eo8DkLmmVT}vN(zn`t2TFw5iWdE;mee|If?EgjP5J=}Q_#6I# z)&=Z;@iRN!KYPDJf5tiV#~}I+#I*sFWc-KR1>zouC*Vn#y0j=N&z~2ju@l^8X_FA8}2YB)aK6$SU(+V82G~;kf4aa{7z$QoI8H za{gbv{15*w*Z(H}qyL!lS@|#^->1nzwXWIu5=bn6+{1_6)<-bF|Lwr(r6COSPZ}_P43#03mUl^}deywx; z+I2ayw~zng2>(U*;=k?24_!Y7-OrGFXfx*DUc*nY&R75$=kzBJ@Sj8vKJA>_@EQEN zT|Odj!ELw$n@?KrZ;tm-62jfWEy(_xYbJXT=O3iVZGTmU=nEL4k0FPzAmtvvPJRpD z#SakIc8P18{fIus-XE>8t{eU5$i=<^WV8XsG4Vn6zxc^vb5y=;&cHYLd#Q!)Np z-zPbtJ^L4O>qpKBe~a6E7ylqum0cjP`|tQd_R1jV-n`MJI~Yc3_J^iNon9Y^s!@( z#q;PdKx{K)kvSqQ@yo@vsJC8Bz7(xT&k3&}UyTI4quq7mb;55%ceCq8{yzh|&k1i9 z_cpu}@4@@<0el!A!Ob6Id}{%z~k@)JPDUzGMEEHPW+-EJe?d(>sQE_i%5SK;@ZABj3?Om zLfw&!7RC%mMNB&$CLNUL$X>kjIg$WB(Vc|4Y^XWNM%K|B$j3gY+RxK85|S4G{lt zHu>1ukMJLSv-IDFcjA5c0ODFgA0|H*^;h~nK6^5JlHSrhBWxlQ+xZ`m+t8X(W_|hZ z{O10yZ`nQCCt2wp?GyjC)Qz^fQ=z^7lx?R%C%V{KNpz2#^6pNB_eke6xCyu5Hr#=` z(SqUhsW6f`rT^m}`ak{=#*sl5IrQ#375dPB_>}e!8%~;O4DLJ?hLDs8orS1z|_DilfWqzyjUmd&!DcAL%_DQ?W!Gov5&*_bCm(MXOe~!tQSa@Hn(B^8C?J?mEj{mwQQ8 
zdH$}i$MrwKK2O3Wcp9FLXW&_Q4jwW8SiN($|F6XL75o17s{h?%KbbnL{ZHoqFG>#8 z7KGnc=m@C z*t2W+pO?%GZ*a_;&^k|^{_-r}pSrn79gSpxI=Yk{Np?G?cY=Qq`Vr^y4Y148e0=ij zgO~faPa*#T$BmL>jvFU4!dWtx3E{2I^A5Zl@5Nt~CD)T5!bj0It$lF9x!YX>x<*_B zy6HU$=Ug!(e8PSku?3Ad<9~1i{fz(Kjr5z5Tcu2JOl(7s%Rje@yB&AoUQB(~KTyaH zbYt@U?v0#yjW!E8itNV=!+qq|53*adVd{^nuTO;VHF1sVjBT$k4Bwh>LJB9K4!zy2TFz7PM? z`hP8lW{3IGc`7c$6^Ls9Cb0ER-yB=K1F`@3O51wW+1(@BAleRHxAAxW<0V@FKhft;Qx2`Z8Wl z?>yrC_IVZkwdgtK*@#OX_pC7S{KD`C@spR%3U4Ccig)1McrUKUhwxE+0voXfQ-7Wn z8p*a3`akA4$2`w}u{4)Td!cjCd&nE?b0coXtvEAI7VU@Ig@=!@KaY9GY7vN9jij1Bm}+j{g`OLfZlLpZX}~%}#kL=FKG8jh-_3>|OfjoTn8X=te&V zF^b`Z^5$XEva%(h?RSNJWazA-cWfxI6NU~*t)c#w?k?^9&-wW_c2 zz14*xQd*3=In4a8OKJV{eKO88D4~!pha8$<>adn=l;Exd;=2nj%w}7 z+Jf*V;kP1hSE?7fv@Mg1j6d-OZ|6JSLH^VFe@%14T*tfv@5Xy^J+|!f46qr)#_>nk z()scID)#p(_BZnX*<*9}PV+xM!T+4CoEkAdp#8LO|KKk3|LfTQ`T*Jwv;XI?|It;# z{x4+zv*UZn4>{jQ@d;%4W^Rn<$F`6ACir0u7dwBc^DkikV_cus+4(PQ#9lUjpX=*q z^QYMU1N3wm`+o`hAC1zQvKHbEd)?!Kx#64iv3<4+e~12kCqt7>w^N@YEH7^#)`z6W%3>V0s@?RDi#QA-3 z&Tm}%Kdu29%Yvb$^1k@=3T1)tT7j2OF}Ezfo@Fc^XVt!`u~IE$7L&ZLGXMpnjz| z2k>j+;~cJepmlz*Q?*r zE3H0#3H{m%DQ$)U=SX9)R{sQMOXEDu#f6wMjx+J)IpHGuRYu*#1;phAS|*P90k_J6uU0Q`e4qzE{z&My}YiK}Ovk=kLe% z$4$+D57*fCBD@6o_J?Qla(d@c?`DF3LHs|!o$Tp@r`u-9Biag{?Ohw@guC7EokiAu zJT@oXZf?#UON*=t`Nz<#483(?I^1@2I(&Zbabt3SGFJDOaLe@Z@VS#G!p>tS_{9Fh z_}7EsRnGNVG!{P)wk>@iY@PF9Xqf(UxM9y9!)J`Gy}`CO;l_#c!%f9=!)MFx4?7B^ zmz*1JsXsTo)wXxw)}yn+Z8hfK)t(ocrq2tvA21(7S@Uk&-i!LunPD^TtkO?#Tz)Qi zto(dYXy0*BXhToamB#a)X$6_$t@O|X_X{waR70P3BOu0Ks{?N9cU>UmcGyD?2!f){h z+>f@T^#Rh>aYyGuR0|-B(&4!2JFknJO&}4>-?*IE5K0_hbLG<7d;y_Brn?NM*{QjA0y^eaiP~Z9Me4 zmp*Cd{R5Q2XZ^p_4@1Hoi#^w}(r~r3u0gkOPt%iqhyKH`uvGt!F&h79SZ3Rc&^oU) zyo7u?5+s`WQL7bbfdT{oTk-xHk1r#?(H=^%)KRJle7{-108m4}FN%$riefYjZPCmIHe4qRweuAH2 ztj71eUYkRH`K549*^)&DoxLy9o+5W#O&sN=^Go| zmBt`_Xt#e3UC!6xx%{3!B7Z;8cP`DjXdFJ>53u$5=Y$J|FT%yhnrl3toccE3h;Q(p zI6fmhRroSofh*CvMt|&)8R05=oKqS7f3BuqgGbCiSgHSSE&CH^))DWbCvj#S@l)<~ znd4rBm*C}i6<&)sU|5|#f>Deizy3|1Iu1GXs;BzYQ~hM>r23ys3lAPr|EtTQ|KOXX 
z^H#hA@5UMb!S~Y7)SuVWKZKn6G_I|iA;*_IwEkaGzv)Wf;TrirF*|(JexE=>e8)O= z^#&_#MYQ z+`sty!au}M@JkG9OO7DFzKnc*wtcBw%2g_V7JB}RT_YLSR2rbCOFjQ$&)+xQxY=|0 zuNmQ2(rV%N@mn&{$G;PQpx=)N&~c1^=n+02(f5PwM$bX(*lEK*X#3W$%2O+6hEw!7 zmM|kNe<1qWX!$5hsio6=vU>RP7=tG+y-%~!lL^#_!GrXMK_I%~jPtOmprN04h!dvkU zOuc1xcsKc8WPf9B?&oKR>**7Gkv>Fz6raFGq$>0sla06mHzNA>rE3eq&Ga+n_pZ)`mN6M9s2k2L&UW|I?=UV`ybt|ug7)%%=>zk?LWcKki{?YEBqF1 zyUhQO{R2z&4=gd}%`VN)|L3dU!*{*6gnf^G+fo=f!1g`FUfIvSXZH@DG(TN59){LJ;44)KT;S-v;Q;fe{#riDfabT>0F45a53`h4~m~pAJf(y#Z&2*Ay@0# ztK9>ed>ncIlht$o+y7+M+;D~cvXA9|D88e@bHmn4_2r1~F7w>VJ;xQEBifege?dpp z>HWH-okaIe?+1RP7m*C}iHn7ES`lav3YpQdY8 z7^IJ4>U?4HzNo+5{D1!|_!|A2*!pj?LTc@d@Ev-Z9K`qOKg3TkxaAsnBRTd%V ztMFR90dK-v@eaHj@5S|)`i*DK*3FN1YbVBg|B(2&c0hWMb_jh;-9M_0@CkaB{SfCw zo~{4Y{Vi$7+m@gI_e%Ml-nzv2LxH}3dgoftf8OkHBmHLdRCxaDJ%7Zt8g3Pr%IHTT z@4}#R{a&&au?`xQ&&SrVGlaVl$9{50loKm`2PgT@RQvvspZ`KW`v10De{XKR;|4Jr zkH>xZ8or6)HTwRvv;W8SHx6q59yb0>rZ%vDYqb;XKS&=Uuk_t~M>^j}Y|H+TOf2Eg zf}hZThF_wiRQ@b>eMRzTTAQGceXbplw?F2n|4KsmmHmE;`;mR@wc!D>@ypkSZN9q) z>8CJbSlxm1Fc<&Z{J#tBa}h4ad^{DG;R;-dN0z_x*CEe!ul$8r*Z=Rk#D)mCKaB%R-m^ zl4uidHo0L?YMpnViZ~2hp%CH<#PQm=J}h;KeoZVe)Ih^=KE*K zoN(`+<)IJ#NMQhJ3}OhIN0je~&+ePjo7lW8e4m`WygdAnY}CiQZ5F@XbIZd|gnx!# z;#c@BQtivbcgR2BemsD|-OeHYLHgLrWnmPj=rcwCB^U*_(YcnL9JS#j^T*^DSjJyJa3zmf|$*YjIZosIzYwScpxLWuc z!t% zTj}q>yYXIJj}PIa_yjg$3*x+@MzT$r((WEQT>shspDcDR;piiCgMDtq&A1h}<1TFN zofYmShYz^_L+;=8jbVJJ``_jMo813)_b;tJ_u7wCq5D^krKjCLhA{E*l$`qUvM}{+-x2*jq&(;R{73g6|2^G1gzI#T%fxl#hxiG8hF{`W_$~75=c&UwkY67-sqU#)2eEJC zSa@#}`*Iiil1$k)pgxQ9-r_vCA>m>1BhBhS{q+B|{=c{`;}1!;W!|2g!{&6(`_aL+ z?A%`%9-t@8Gib$w^i!BIstiYqxRiFtdGxus5Eo%kota)aJ6ud3Tk72E)9kxKm@k}P zKUaO4DPeyvQ2%Z>cR`t5kInk-o@$@`{J(~C!e#W%6WS)z?5=b90ewoF{E)WE0c{kr zYvP=6g?&=T_4SiiVX(niEqM*no=bjR#0>>unedB{U8i0U9AV-&m&xDOrEA@z{dcG^TraM3k-V|5FnoyqQS=CRFHi@Q zT}$}Rh}(!QXvD+)6K)XRs+_o)%=`C!`}}Y#{dU}id(md>uf19S`#!e4GA93D?mfzU z^yt%WwND4S(T_olB8&TQ`v2n(`9Jda9oN3eAls~+gWetb-&dX|2!!6sFhntI^A9j{6 z4WGkJ>lX5XwLV^$6}H_|5bB(#9u1iKO+naBPVjNvLB?`^XOw(dnqvI9MPc{ug3wHF 
zQTE-dO`O<2D|~UDH3UnI1*3g;sXl@db@P+?u<3i>IWKe>GwUgMy!SUR^qnZy$6Bl( zbv;}Cx{xk-zc%*zuuGb|kw80==tK0=ySK!7PZouHuw~(8;nRiJgtToVXl&PBF4OKR zy(To2zaZSOeVOsUt3t-MJ-D&&x#6ZISBKA*Uln#NzAD_j@ak~Ofpf#we^Yj-e{L;) zOt{VS{`?x_f0f4nd|!8LpBwf%b|3cR0Pd_0;UGDzZW_tRW8}XaKdUJUeRV~le-&Q~ z45+iBkMbZnw8QtC(LbdRp`+3|gX_&nv2W7cmF^AJ8Jt%XYMf(>^3G66-ab1dd>A+S zuCx08Q@hSL#$wD@+)*6EVB&o3-}A#Xeat`cMyLII<8t%=o?`vJ68AYT`Zt^(#&O&} zEr-qzC&-g%H9lG}9yS*u;kXXzbY{*EbA*e~U8!7SGj%neACj0St{5dK#R4qEB20}J zgvI2*`^*`2X9~fuZrZ3?U{U-~o zt5m7)`=B{|d->8~jbmEtl>fDbVJ*G$g!zBF3&T44di1Clx(}$c$*zgQut8j^>YPwb z)*#x|wPYQlojuB)A49!x1G4JV9LABs5znkr{C4cXPBftzyU=FrKHoo()<1Aeo$Ht+ zy6HVfjPJLz@6o@CeUE`E_B{saLl~~*AF$GOtZ*G`oR^+i%|C$5E#?1LYA&erB+!l| zTC3RK>ft_m=W2B`()1DZXxnrni7q_+|HB#E_h9Q6)W7Vmz4XRly!NXFVIO@z4&WdT z;V|0PsQ;VT=jcr6pVc-`9#l?goA+SCK1XmA$1siKIDwNGu2TPFR6oerUiIH$^&go< zoa^43*8e2U{(AL425L|DO%JYD|0AxyRgm$$BD?v?VGcQ1>H8_05sK(zwen{5j4+R0 zjGS*Pi!v?{DW~i{O5>;4(wOEJA_sxjUIY8lIX&5ajApz!wK>v2928+ zWPMLa8$Tc2ZQLJogo}_(n9JU7?A};=24#*(V4iJTXX($0?;CM_vtn^=3yja>pZ-r! zV0<3E=tDnJ7$`LMJ%|5+vHhVk{s%?;4~kt!sp}xeg)>L^9~|RB$PN$~LnETDJVrvnS=i_qh^?g{@8IPPBpSS&8Jcz#$ymLd8*EG5ek{UAo& z;}}*5uSB*+ySz?2eYfjC+X44?$o)<5KR{On{{wWR2df;j8WpHS71m%a)?xUt{I7n? 
z`9uoz1)==eB(UPd1)PzKs(DZLyy;cN-}eIEOr zTp_MSTVy4<3bCJHHCchgw6;rHKSHK3R0>z2yTZ6N;@SmC#DC|;zT^D=jV21h8vC@a zv;I%rIbkimv!4BLpSb={wo9EX9On=wcWR4}>+F+iXNUbp9q>%~N?p8O+y4HHf}9Ippnhvh9H9>;Ce73d;Mg4W92V^?#Fd$9|y3Fy%5LWAmv$ ze~*87{=SE_?;+p*a@@pq+8=*p|Ht)z{_H!KemPcP6;`7Hm8e42*#8=G>iyaum#e?q zXN0xFjo0xDu#Vt5dTX6`hxPP0r?I1AM%X~FMqE>_ds-Wl>{{Xe8`%Fl*vVx6|9|u( z+rK-_PG+;ObW9CuQHOdoU^{kTIB9&}zN3yABgc&yXX@DKWR88`+s^(M-*0=09I$^} z+duFBPsacM?UYUvnz0Lm<@PC@8+Oyj*z=_wZzdj_4uXikRUuDMMtzDlM*S*+%6&rsilM9W%?{fbM`z7s1cDc@^euM7A_S^5j z1NscMy~{gxYzBL<7yGau!?m_ybie$!*ZV*0{ge6eA9=4=dVM?Ge}j8n<^5NA{}`kX zA?-R2Nar91m(C4`$iqmLTMv$mW3}V%FOJp5|Nb2jchEOJ<~fXd1}*F6hNI$+VH(HL zDn3y&H=Lm7=MPBtB)wpd_wV|;kwo<6C=@pbQ{THF6p`~#j1rV$0kYT442#Icn0P%~ zl$?Cm+)zfwy1yPFi*;C!4X8#9YEg$a zzCrEi&_~#bE_xE(%7-{VsaGA|hy4FV>(uYX-gBw?og6AMC&7Ma=MNrJX0VC!{ewrf z4^ZzM4cLxUxqccly;%Q({{Nlyta1Kka&$!h1jdu;39>X|2gRD9;hu%5t8uXWFA7Fr< zMqJ~62&we!P$Zp313S?_Zyr7V8!wKZ7t_b~OB*HhQsn4aWH62e;##J)A;?8&-7z~X zCYKh$QvqQOKR$wJoVKpjHi7E^q zcmLWdqvSuWA6)-`@qbkB54rv|(m3NUxRxG$B-fGau>rZI?stj%CF6L-nemK;-b=OZ zElK_ZwWq(q&G8$23*Rqs(f6^|_W19==-b)&vQ7N_%pVYrzMXmhf$G_z&OTE=)jvZv zU^{kTCz{ZVt@rqb$=s#2VK+H>o%Rtq@e%Lufclf3um3mKg(Q8DeILiT`si`&X;dAR zrjH;eub%BcHwM*S=o^%;;@Ey15GjB74Ze>BB837(VI%AvKYGG4||8_lQziGrW{y2F8 zW8z0ws~f~c-;3NL<@r+W72BqMIy)46MVke4P=tAydbT;R&)gU$jP(}_$Fi7q73jUF6veM=ozjulvmN6i0QyfKVSYz(6q!#Fa?B8T1;8$;jfjiJA4V@OqQ z3HtE0J6jnLkYE+;SjeoR0 z$#4i&^fef^&q&FpFuGuq`G1>Em+vRF5sxe1mG`~Xn?fJ@ky^Pa46N{uR&NS}<(tA# znK;+ARyymj9vd*&Za#!x4&rF3JcCmYauU}M-$ z?m%Lx=Q{1VVyAEuy7zdl`#opRCV9~F(T42Px7E*AFNFbmdcXO5)7H1aFq?V=qZq4G zf2~u0ZE$~N&bQhu&0W}y8t*DWwj+r?v}tR!qr$`;3$q^8pm-0CsFWK-}IL^ zhC*@NBB|GR)bCZCO}@3WUG+1+No<>aR}hLyq- zn+w7!ay2S2^K|!by zSBpB-qXFBo15;n!6n2smf8Q9I$jLu$^3ar1HQF1#g=TTPup2EWq#Mo+an5hMa1yP} z%A~f!&_|EHza1m;{Z9G5ooz{W+kUqGSHEz8!Q2x1*QwpSf>SY1VCk(*Zkm8t*B)&FFwNc~@|ZZB2;lS6aNecS#1 zkYY>c``@fDIJjGXz4K&{?$d8@M8AQ){yoBbk)_8qfX0vO$EV+Y;^X12BOkY(WNo-} z!KcC<)1L~rudOqFR~^1^=!4<5i4TTbYd;d2Hhd&}9=9ad`)AyBVQ1}i;d7> 
zgijy&K-eek{b*c$UD(!C6Sg+54-L2>Q4>BhQWFl?b`UpKe=yus|EciVrPW~vZg#F) z+Uvq0+YaN_^|j$P&*$?K9}i7+wbs7$?>qgq6Sf_}Q5?ga?IBE)E&ic(ulKPV$A!Pt zHz%}Or!wKcYhP?OrnR$}ZB`uGjrX-x&j~%Hj}6^N=dsaFj{~O6gX^Dua_HNmuSkE< z-ml1S(n(J|QD00+I7u(qt8T#@SxabcK8HvR|uQCqcy-n!8EPg!9o zrFX8^hqAUXETAt$PnA3*E{U${!mvo(Vr-ptepo`5VdDMlSaS0BpA6;X3arFb+w8E4 zT#X7;qFw)XTfIJ-r1LaOceiv$^wA`or`?*G?C&c3t-)HX!+LB$_RsnoR`~`!<1yDY zN@j$|$((D;*7^3O(JRfAH2cZ4a}0;tlQy><}J`adnz7Q)~>Z4flx z6OoP&`j zM=(gwkb5w8#5E#w*fka5iAMMZz?YE}6;kbQIAhFAJkGSpzcD!}fwy`m~ z>s>dxY)jhrr2SGm=Z1o>$s-s{%nftMBBYbfJ>j}BPq-M_qipem&VAf9&|4Dcg%a_l zSb&9S-EaLwETVUAFb0gp^d;yiw+1DWmFMeA_5X=Eo-aB|U1zcDTHw0SjUJRaW+}?C z0xPi!t5JbrZMczx>KOL?7&$JSsZbXnhhAy)p?{(G<2?>6@@$=F(D{b+PgF{y3Tx1! zyje>o5=CqX|M~kVpE~{b>&0~(RTiO(&p^^Y0K3t1m_N%(&waJ>#Pe_Qy={ocui&?X zYI=VCpP(LItR7w{f0fH$WH)+>tvOJ|257Lo-gZ1X|J6vN7Immc1GZxac4BxP`+kEq z0y$Q9dd^=)I4hhxto@f@10?kikSXn@0eX6udZtPLz#4ILW`ri`G-DTbW3WgbMS`AR zUzeTSPER7|db9F#2627eK5=P`AcG0xj(f=X@87*-v?=16kFj4M+R^*O?Z*K`|AV+L zMk3s3#k`W0{2KAWX{Yf{<lrSQrm(%> z>+XYINX|iAdwR6Ww{ct>a+Pm`jBVt+{aeXKL|p5(E3N*|sQ<|x;e*mDa{N3LqXeZ` zfQ1-dufE2aasKfV^)j+W>gfXZ|3US?HbFm`vfn_h@p;7cd4|>+f46OsbQWU?%FtTQ ze#KIHoD&!Q-^=MM(38>sjwHGYX8Uk`f7-)a)@eUiIPZ_W-)iY&rTrU!A>Zqd8eO9u zu~fTZk?Z4^kS%q6hunXY{*PVGLH?`zUnPyzs6ZvEum)?f4#U;*A4V~Tv+KVe^o-@j z{QdX2f24M3|2R*&LjNCzuwEK1?r{TIjaF@`b)I<*Jwfj%c3stMdEq+b+hseo;i3({ zW_G9-*MRNVft_eVGj?G&5@<&f53j%agtuy{WySwIE2HPz!9Y6xufJ6|Ndk2X&lFBw9${qpC{Ex$K+3K^Bhjtb`k~OQ2t^L ziZBnwXj3M%<81jrPof(==v^Uyua>`8%FC7Vcdhr2!4ml!!;6#;7+vc87^mmgC$cU2 z^5n;T<|(@toL;vuL6$ggDOwlJ2n)zX=-jP;Lf>Dqo$WuOf1*kM1hyJO$iA#JYgq*r_W+)?V*9-`Onl1sAsqh! 
zR87{P7IjEzkJpn8*p3~DzMbjge9h=%yM3F7)Cd3dR1_ihs2g+~7;+GZ~0@E+sq7{wUIkwF$Y^sX^} zDV=`l#kKAS78t*Do;XK+2ravF2;vLP7p^u>Fm}@&?jC0zx zo^UR4spd<<9&#@RGna&YzK8|Mk+Wl{UKR zJvii;!RQGye$+J~YwSPA7ofMqeV|`l3Ip~@V~{?Cmcysl zFv!;vr6HVq2J8kbkI_8mmvHjrM5Z5kgNza%W8PdtBq zSWGTK8J40PE6}o^9~zm^wp>N7Myt3AvJ&y%I`LmXN$uh;^vF-~A3N>Zply5*Ity%{ zV>{VZV!JkVH+}2-3qzIT*I+HyVaqPxJT_xQnKF$0I$>l6NV!_fMlNI<7qF4B-uX758nuZ34y+^V(STw5kCb}Oi#%sM z`u`63_17!;U#;P5MGkl$=`!Vix$@uhZ#1WH+cyfsc4a(iKZC-toQp&Yk>T5 zerOh+eDeITi=1E+?2ch$fGN)^L2pMATlE19?sbmC#(jmy4!E}c!t@d3`g{ZXy!$<_ zm25dw7;404um^k5TB!alogMbkJJr1%?DqZi1Bm~$>t?qn)xBNhL2(c7KRP6Q7!x>x z{QBQtRWHzwVH(GA0wrJ zeTkeY1|28(|DlVX#F_Q~H*2HSpVYqD6GEjl;+kjMKBCWwUg-J#kMkdQvX$_^t^Yq| z%z2Hp)?yvvyqNXm22^A67w3c;GWLJf#<20Ix)}D&)|1iyt%2N*TlvB5AkX-}?WCXa ze`}&Q4rM_sBf1dFh*&oC;NkuSBlgQ+53;X6 zKkOqLzs$bAlur-+01o014r2mw?4eyg??5NI_Aiw_d(pdi%g(LPmies3@ah$+O z3@?`d<+rorzw%Xn{TJ^t`g!EnU#^h<<^9wm`Cppph4Md!irMyj4hz2Jx$-3}Bk23lytVU~*@nclbqfHwBOXVCI`(f&lS~SClRYCBgj&bcp&kv`jvd&E zCJdWnK4K2SXr=S4moDFfjJXI|z6ZIr{15h-w~qcp`Wi4mPh-%2@&A9#(unie2eFI3 zSz9o_KEH9HRI&6*r6-*%a*o+;p9Ds28=K?*0ZHo@2 zK@UC7K}aIbL1?RX|L8y`x^_vU!S{!rw6gm5$}`99!CvgcejLC-ob^9htNy__GRPu_ zUS)q@!uPk^_lE&`I_dk{>H90@j48>yF%(WjAHue~cS zQ^Wqnad9nt@K2B@(Yk^^@wb&tNK~j-SIf8R-8sTV$j_@=EFUX3lBhq%KE~Ad%)!+D znP;C8G(MkQ`VX$V>@=t7~c|XhwWx$<$x2m-X(lbJ{?fLb8+*{+zJkM{>3TOR)+;qBQb9i=Wkj5_TMgr|fq7P~0*Z-|y|JSjv$?*;B#=Y$8{p@S9 zS2}&t?pN2v{{s!EPh;IW$Q~b3FQ*o=H>H!o9_+>70{fI{f6&L)tAE#Of6xygSE>F* zW}W(XgZ78G7TXVzhtVp2f;@smwfmBOTvzv~@G*4Hb6?1hb2v|1JNq9UuCKFC{C@Yp zME*k$Ht*BcM4TTy?fC4+wMlWDeggli|80f*hknGhVjk^(8~;goQaX*7tJi;85DNCY z|32Ty%GsfiKGx*h!W?=L;=cp4OMGJuzBLp(wxwovm}grtT8|qeCQA|55a>9dUc&<6 zh3NL(#dR(6{~6s=5EhABj3p?;Qj}u_R$>)aqXLzv!Wyha_Jh{n4c8jmFi!n%#>zf> zjt#y=#BXv+A6Ub`+5T8<3j0s~_!*(`HRWNOF}4lDQ(rW1B+3`g2sLCa>QIjcY{%AX zuMIoMooGTcCXGk#BIEdXs=~j+=n1qViNV^~2c+#!k2e445@GrX&h&d_^gWKp{>O6p zzh3@tmj8Fk6WRm$|9{G}J=p3$;#p)Iy9ax*5BqTd!xid(j3V~2jaPd&$RdY-n*T2@ zjWhrM+nn~iGM*QCNP3OE=H>t5+HjaY^?H4D&onRi^}cE0$^Qu92zeC8FpZW9-y)g& 
z@nzuzc@nL)@_)5*_&f6dQunrGMku7uLH_@LzL8|Hd&TD#UmA86ULI~Px;)%+=<@LS zhQ*<&?3v-#z0VA{rI&;+6g)dL7e71P-c%OuSoGX*=Z5EoUCm|TuJY%HyUU&*);Ztc z!DYrAmW6}#)PZH8$a&_W7`a2sLe}<-?c*4uj~-nXN^HAn&84CJ$Q*qT=Z4Ng{p3aZ z#m#N&^3S%O6~?(Tj}5yg_&V6PrNDe^dSaLPaZUcCF>-Exoa~mMM6 z0i;)3|G<7jWy?cL{k5S+I<=@nJz5*C4T+{}Lj%2&-Z64**iPSpp6%Can_O$`4h^cC0+?w5L@~^#>MOe_*lo2gshP@<+7q+Z~@oA4XzY!)$-~bwqn)w8Z!u#m-kfn?}nK`It@Q}9g^p=_ ztiIwe@UfsT#<2Q%B-YXDn&_)PUZ`#^I_fd_yfAd4@|BkT#(S3;hfAE~J#5u~a(se8)%dsMcu?oZ55F^?Wqs9})FmB8tgR}o1 zNV5O+EA%7P!2aK5{6RZqP*&FW^JG|X1H z#|rnVt+8HQi}uI{vKsNxApkU8?vcWMA?_b$2yaPLt{a`^DuIC@LL;pvE{*Pw;AKTdr>J#g@u~*ptYqUR< zoBc>(0O_Oriw^KFLcaenF(c&H!@K-+zG#;AF2sL;#eaW|E%5%|ubz$Pr04XZ$F;pP zXqiygh;K&{eTcSu{P$Oy-YLJt|NV^6Gw9hM|APeV$D(QY$=zRqmY} zsB-V*pzvX7OyCHP;uxmP!=EOPBWsLp(m%IO&?i1Mi*cg9o+Cdw=Sgt|-;@8>YoC!@ zzv}sw>VKurL%P^C&CxDE<8S;20NKN*uPyH1QH|0C`G(l)ly(l(`NqoplcY@?+%ZCT5#tTH1zW)#QF zlvPG?Wi+Eg1V~E?0a8dIKnh6{@|*l1zdnhKWkzPmXjD|jW;K)I6g8V>ETfD~ z`+eR}x}BNb?>-*i?;o$n`+d$m_uPBVJ->h5l~{$k0kqO8x0-{p&L4%IU-Rmg=9ENBwi@Q@kZGNFPFRwfYeG zo)xvR3&?dSL@_o+b5q)jrSvkCqXLx?;=iLxAD&)~8q}gY&pV)j-oMn~|GijWog86z8fP$y zx6ZGPF_##JIZJMSNg4>r;T(MeQ#g-$X{14#IYKsO`8GNKHveDXW*SK&>H(x84e7{0 z&q?_oeP`qW`T0Mbf5FdoWyw*pgWsL>=-)8io_FkmI)NXVn1_xEdw0o8=hM!oFTg_d z8Iz0FkPJ+#x5z<6d)exyVjH930e~Bs&Z}3T6GuwNSAgtxzx2~ zcyqs}Z@-**P##uNN(pMswu%O#PWx^wYm|RX85Gxl=ld&JS3m zk1*R>|ChYaBPrnpW1yG*UwV?eesL7F0fy-#7%EUMA#p-n;&kK}f6tKPAGa1JGUF`H zVFFV)j|-Sa()XkVOul$^s9R-hg`S4mHTG$fhnlVHA?KiFOkPKH=0`J*EEfOBbZs8$ ztJJ?Z(x9F~Q}4xb=@xn`vbcF`{WG6=0TyELYuaDU>i;fvq;Z3Oa)3EutbGteXsglx z$5C{k6Vbl7?ppP~=jIAeGzJizAA5q{onx$>--{590YrWOer&p2|5*QLkUoTge9uAD z_s`*WF_vH{>eiXRaBUgAk>Al6z;b#nqJQeNpqby%S-Ug)@7Hj%O8Cp!d8g)5Zs@J# z3isq;C01cI)*v4R=;5Z9+dlWb`Tv7vZG>9yS*ZUvE&L~he^~g*&QakfyP1!X9TVpE z`BR8ul%hL@oAk><8GUB|<0AXg=@t0*`yVUW9rB&3$ZFIrcAf!QBHh>K3Rj_UE%83e zp$7IXdBVlqL^iu#U#0(hT)IFbn$Rr&w#e75C^lDLgl6|{xL}RIcKh#r_jcwk^qeri zKCF*T_K#{8&=cnP2hVFKRKGDlZ;t*bI?##e-yhvLhK{A$2K-6jI8LCOT^CN$`%C$a 
zVfqM$=z~a9dKONzJCtV43^|HA`WSf@we;wpKaDuYJb{^Wg_xU0gb7idI%XUqI%j_z z(YW~(cjMm;;XHW()0n*1{@L%7uc$N0)JXrW^(X4GXzn)3BPAg-q_I!O;5C_H4!P-@ z;;Sh$WYXs$3-hr63(E#%;Iuc^5hn zoym0s(cb=gv=&OcXhsvVJ!27auoz3Q6w9z2^%KtbH#dJ|q4{sjoQq#=j@jIGYx&I~ zgPUBeKps|N6;@*ndK%1sqtAK%{r-DCkZ)ccg9YZgXfnY1Gn>qzG4^pwf}=#$s`qdeIwZ(>ls7(&}w zWrgzNC^_?gKYEvPqWiS`pHQ|m_-^_t8xq>+wegVj193ohH``-IFZr4JjqF>%|1AFJ z@ShySP=@};IqR5R-}D9bueM@3y>84oO6)V}jjM%irFG5pOth{LHg?Tu%8!S6?6NQ) z3$PGl_sqY43llz^oOkxeBtI$&OCmNH9ce{s!)w+Ou2@v zMFW~Kv;RQd(@u}(B>K=r@5RuxIuy|ydGxRNnf|x>CHlu*V}&xKOW84^3^}PRIc_bQ zdlEQ~6POuKX!pHvlDWqkzFzd9U!QtlmHxLmfI)I7Q~h72?7~rf^Z&5^kDEj4xncf{ zAX;LGZ+>9YQ`{yhfo&9r;oItns#}s*9I_+QV z9rgPYOS~h75S;;3zfReSOZx<8>VI)@X`kSE_g}y?l1^)PAr)yz$D98@s#O2iYL_IU zcF9}De=i8Xy8h@Hbv-)8YZos4|L9PXy zsHGn{pn7k{56XZz@ zV+5OiW_`gc+VS)=7{wSSPhA$yMtYm{NJjtmogk-h9v3i;q#sICNJSdbk%2jg&W|g- z-udAC%48nDNqb|yvL@5}>MzY>H`D((<@~uXI0w-D{@8cF7w?{u-!GWkalQYH{68WM zyl4$NbJQjsL;}&jYU-z)pN1o7MAJfjTeP4RUEVRveG8F|MaaQoEJ4pX>-*8CFWg^h zUA%sCLf?20LtXw4kZb*Up7rNs2X~zp_=#@%u`%oC*QJG}{8@(O$VFYd{&(+HVFkUh zO8>jUIs^JjL}%x>pcze`v5H-XF^ARU8g!qt)_^QP7roDzMZd9%CBm_e-Jr3FA!7yu z>=QN4t)FN99Y^pV{(tD53*A$UQj{So<7dWy64tsf_bd{A^qmxbM0*kvi2m_6gti63 zpDp~}cjo^H8NxqD_)~=+hZ>xJz@LljaD8uM1a-~Ak1F)5(XJ5@t$Amh7tVOi;vw>_zbk66@y!0CBIWl9AK9uL%M}1Iwy*2y|*7YBmC;y`f z&4|wUY(=ko5;%?%csAXbISzX78vWXyGHx(4PFSf=t8bqukVgyUf3oeY`R@ty-(<(M zwRiNcQR8IpJ;rTxe#N#EH-wY?9>xgjYK$MUKTV%Gf7E-Np^u`~73J#aGcgZYn2!ZmhnezeGX&+$b{Jku7Z@M}-P5r+# zC1iWfBIIB(>XzAmh9&gIQuQB}(wCuCUD$%??6M{-XP1i=$iqsk!fLER6j#NF%FBG_ zXwIkv>*$53KcW6V?HP#v_1iSf?@VK(8OBZ0?9b$;7@O?9-h7k%NiRb=D)8q2cgnQ? 
zF;FF6VURwAHuUlyt~hN^3iAof{P$&@@gU_yGg*H@ z{g0XbIqLhSEcN07ZDca~_vT6W3}XbR@z(a=8Rni!^?$WGx<>tvfqZRC3}Ogv=hXi= zdR|?x?(G~^|97eX+tvRVJQWY4{29YJOkfJglbMk>2B45jmPo%TV#cUgS#nI$j18c%mE19Q-=-tR&t{czH&@!G7{<8^0Vk3X06 z+xR?oH5txG&D6KZQO2bICf=O%a=c~1uj8#rzmC_Z|1RE8^}G0yq*vmNnZJp5CcPBz zTJ_8L;PbD>OVF0|t9X0vWc=u|$#_Tp|BPqxZ$1`aA+oUuIarLD^MCJ2Hm+!V|E0^q z5_U_m49iiJE&b+5zvK$$Jgh|2H<r%Y|*F^*`mW#TzPLiyx_eE#62rwZ9f`PP`UxX?QK(id^@sLm`S$iZYa= z0zKlccfsrNKC(aO_4ojpU>+oghF_1j@uM9_i(iX(pc7qXuf@ApzaBr9`FgyPA7i(? z8m}V9AAL1mP1c|m4QNI?y3mW6`d7NzfXUg@@s8Zr;tBT0G5C=78QERv8S7q)pQQIM zcrD(Cne};1(q!6e@xc`LWV{w1z@f_53=i+{2uU;HD*J*h}TIx;W^nV5&3rSdz!`tm%79N=f7;9?su8f$J-_O}l! zf5?teuJE}{>bA8%!j*4{Rg z(HWj^J^v=!|ChsEEm?=f^qKvC*Xe`L*ACB8z9?hf9AA*nmT~1GUiyJ@oDeTt9B)rjiOHDo@bJ%D}kNWc73z`PF8-oK$d?VJMb9C{(UVw9o`n-AudBTI$W$8ymPYIW(|qMtkJG-nSyhzV=$YmmGgt zx+Ra}1WsZYBRGwYillIcytKb;ls<;D=<}`nrH6qv(#N#&`MkVvPXB+t`Y%^Kftm5| zO7&m0I+l#i39WVSIZR**=WzkkNcsoqPhY<`Mc-Oo-cJtb^C!re`S%?C|MTi5b?Q;F zW0m>$Li6v&3c3-k{~Nz1C8Y3Y?B(Q;O1{~?R%fR%r{mK4gADqlJ&tq8&BkLlX%8Rr zu9?j9kcIiEQ>WHq0ljgZu;uF~)3Xts8QgNg8iOf&4YXJ4D}*0MFth)!SNLm%zghUn zMefPLVl2T@EW>i-qGy%-Cr)SD+2VNSTtDVP_CxC8wo&2d&r$w$kTdfi4cbRl7w12s zGm%#CCl4#J3f)=SPz$Xspht5becY^}=VNG%vHVpR=ROA3S!=-Vkg%>J3sJ{?F&Ujf zJhT45wP+4xP?{`b-!f0QeDkIp;mj8H6Y79=ZGZ9zqP~9<*?L|+N4a|{P>Cv3qXxBT zK-Bl|IWBLVk;llW??1pU>iZ8)$(!<7GzJizy&vt-i}t5SV*tkxjR6!bwFeIyqgnw~U&cZNQSF^n5GIUDI)lfpT20#mqvX%wwdKNqNv z3XLCF$GGWb;{lJRg`}Toryvz+NJj?dAQSVDh51;3g~-MtnHU8 z^r4REr#6yJQQw-JIVX=iFN_<^>ldHBT0wAiSjCTgY}#W!&Ysbk^Z%CUXD`#wCI^t9 z50XRctPjZ3|Gl99t3TUOtN)+S|0Sb+_Qxu(3I+U{6vlOAA&OCoGL&Ob{8x~bs6sWy zt)r+R$DBEF=&Zgny#dW=M_puh-gD^Do`NI#HK@6cSUHEYn9q63$Ug*X#ba=Nj{Fzz* z+haWnJ-QZ+k&My%PxBjR>E|$%&2J=5cortu9jey9B+sL+%ejB#G-}0lG{Tx!YJ(4qcDjxKv_`ICkDSb*+1+-O@Yr1yKTK4jAu zA^NBGpz8_OqknDZu#3*XTTCuR^zYkc>1k*+ z573gQ{%6i-G#|FrsF8)#;H#AL-LvV`r~5fVI}>)K~+EqJ+GZD*etep3A=q41UUb z$m{Iq-{=49U&;)pm8DhetD|cJH-#Fq77ggQpd2CF(S=?_XFEh^ybiu#PF&pgt&`u_ zAID7no5PF&ZpOYcE1Y0Ie!2ZFQm=BpT-&ZnR7#9hBNQeoPBS8zXh#mOP_0f+1zl{_-V(3gp6zqz63 
z#N5!!zVEcSXs|w@*ZKeq(uZo?V~>8i=VV|GGSNN4k6L@s>HX2Y=bgV#pN}EW8C=AV z9DZ@PfZal5V-a#N_Q>3@m>loCDJ&tEVi_hwW>`+SWDEsEHl3az=Sc5wA>a|(c zCD0qqvmYriPF$}K$lO|MZ9uQN^MpBc?uwR)yIko99VhLP<)%x#cGC;#n+|FxB})V9 zg#KpVyvw&2_k(0~7DIiN{Es7P@_&kNlIh!!Gy6}orA>a6q73DzKqabBjh-y|e*ym& z@gD>8`H#U->HoC!zfAt;<|sEEWM_`FNp{bZ|2?aQAGK&e^be`#$Xq2~)7$A?=*1Cv zy;0t6I?v7Y#lAr+bHcUbIDx4Bdy*W+2u@>Wyiodz*2kP-9>o|ocSh%bC(n%konxNB z6eeG?4xYS#X^g*W4g60prvDT&6=_IE20B)n+b1(I4_Uah|7Siu>ht$u0evBc#*_`p zgiG_^ZN0*Yqp9KtopZ!biul1Xan(c4{9kR2_+2G_SBhU;+Fx1vnOPy5zl)HAEct0M zxdcm5ztH%_BIVvm@rR}};twtKRzzp^j1}D+mT@~i;9msJSXdtE(#_yC$zcUO50kT{ zQ*sqnV@+iDwVSPfv;QxugBL5yl$rVLFYW&+pf~1uHfH|+bH6eBF==aB8bcH6OZ5TE z^?%96D)pbbEIJprC83|#Zham<)}au^C`B2{QGp(HWN)+juh#r`s``D7`W-Xp|D@>u zkEs7}6dm~Y{|AiD1gzvw6{=B#?o|EXBz<^#e+IwNKySv-I(2KlIyOaJi*|Nn2a`h= zIsO^#ku-6UExqNa7uY3m93_wFGk!^)cv;(RyKzExlY5;1Up_08T&`{n`fXdSX<%Ph zru|SnD-6>cE42+ULO+dGV+So$+7G4L4>-fF6jV+TPpvrkUrHr1M8$u3`)l{ z|NlGj|1$o^T?RkqAQOkIEt^Nyl6AxBnGOVa(JW{$DT$lo+&D=AGFk8#y6s5;V6=bykFOQ7Q|6WP1!fLER z^zYAnvHHTH;PYHbldKqTUFI%qMUZE{lq5VHUEtIpbKqabBmuDZr zinLHoZ(M3_xKMwa-hkFBXTMcA1CDH3p7w9s{|l_8=BA76K51V7eWv|?xiXl!|Ag^@ z59?Ek6O{$)-u50XQPH`@Q|yrXBePt<#z6WYI{+P~Tc z$6Sy4j|=&ejYY^ocd7JKsZULh*4#yB-!Gvr#mxWj<{N`UvMWKrQp9UORr2UU3dNW$8wf`$aSmmD8Sc7~N zU_*mEBi@S8QziV(!e6WXzee~AguhVu$sy)8;cY)J{OC9#{KDQP+}&NmuRk#Ms5ZEM zb=1xtzh6I_|AiR5E;$sFB~7tVMs9x8`26gYP)@HvbXL#gNA$m^tUY3`LN#g-{e!n_ z%orp+IuovMwQo!>IhY)V7Rl!;l+jUHtsT)E*>6n_?U4-kx=%hp0>^Pg`is(FeXo4b zF6~VlzZsMEj1M)Re`8#2m2oCxJngl{m5g^CW9}ro3XC1)MdMOwp@;0vjmD+YjQgiq z`1U|Y_g9yGk&6{5%F!N)`j?p@CP0oz}J>;BddfgdqpAl!h(Hq6_kzwQf^mT~F?xQjMW-=P9C}h`>kRQlW zbgR?L$Z~Yi`*O7l^2|puSK`vRKy(Iu^dF4+eC<;lK_mXd|8JV^4^{4~Mh$AwfM&F# zXSMk?^rae~$3UjCAw{{6X8e!*H~+uk?#=W0=*N@`UHs@p0*CZ1j+3%po%|4_TOx1z3n|EW)I{3OQtPp>XGiu$UgLaal?(i>|9T z20!8Kh3!dUIrF$P1arw1$it=|i=S8QnXk?Xr-f??yOmgly3;wv|8l}=dgBFqv!`;x z8hSoj=V(`@KMB3?PB}+y_F# zst2r5cpx;AO@;hk?HRrHzoWuEm8e2BYEX*?G@~ci{`Wll-^u_t?^^yAI;KYa*o({E}2t1mlyg#+F6W7?h_XLG_i 
z{!CyB=g~c8kG#G67wFMHdHWW{!ZbbUMeko|><@``+K=hZm0)+MCMTqlX{f8s3F%}8 zYMZ4UZX1_5Q15C##o@c`}dv z*4|fgA+oUuIarJ(Sc>TUpJn8w^M97pbFl*dU(f%E&L7Nk|4OXFYOF!D_b;FP|9$?? zLl1^=ZHe*A?c@K_1EGMQ>rjYdl%fpfsKB7|D>@fsa^C}?ih1VzpVi`UrE-9*W_RiQ zpBnm`=l^^=+4&U7`bGNxQ`#L%d;`?i>VK-sX3oFqRj1IK(V`A&MI}FK(ST;OqYJ%A zpeNt_KW6^#H>&@CLjT{G`XD(pt^d!jc6HoQ{IBzG)xXQtzlhGiInJLGIEi6&&sYDd z<3{NH`Rd;_+8^{Y7}6IXR5vG9sDD>#f3SLR;wOdV95i2IP{-p%DiW8l%* z^!fZ9l>fTb!RzRS=vVjlA)4D8XgBuXpd28JU8`HIt!o^rME~_rx%MQxO}{ms67AQR zGFG5H+^jv^L`G#nbS79CH|3~6C8|)3adBNkjw$zQ$>?mL2C^CL=t3_NIF61fbNghL z?{<HPn5{F}fO&f@~6k@Rz6|A8~0$f*CHN{%0PHp1+TkVa2O2BNvoIbE^$)JZru^0_+ZX)}}4;sWU2~wexxGR-(36-+i@kVHNXg zwB!nxzEIN|;nc_PD0Ai-`+O8&9lDD>pM4>{ztZ!|t-Gd|V(5bB3rE5;2T;bY92Ka< z`0TV$MUGu>{DhornjLD$S~Q>;?dU=;>XY>A@aF#&=*?(BwAZ0%RNTCg{u10DN6G#2 zw{sm&(t9$+zxe6P694Dq{|iyO-gWvA+Lns{<>DV5=*$*2{&thc+;^xfEe!K#lezKe z{~sgtI{zf8Wq+F9SZVyH+!=xNQM6t#{$rm=bE)wkjIryemiIp*UpmTWj+CN7MjGxe# zqBUFDf@tkQ)CO9{ZaH$X0(n@8RhZP~8T?YR`2lBYGLK)Ue@Nz|0P9eQVw56U-&sa> zY4b(@cP^(_pc2s<^8QlkqgwjN=5LPlK{lkAgZ6KfM)l$m^f2W_SEB~C=v_Q3L~{T$`|FH%&g`!<_IYW4UG(q2=pVt+KXQ*Isn@kJI@`^C zH=6?|4^`Ovzt#N2ZSt@7UjskuxT$rmncjHPdrfOY(7P}*{*dq5I`27R?txuLmFJSj z5w$B$kS8(o{~&$+=v=~K<`K;NmvD)83j2gHfYa>GU=(9Gi*uO3WPM7QBFFAY3Fpai z|8%@SP9y0h{XiVb^#8pD7uyg~{G_t0JMI0oInwBjQ{MlaJ|4Z~Hg)=O@jB}L$GkuL z|Ly(*-=c-Tt>oaZjQzQHX8+?Y(k6W#vM?V#h2p?B>`U_v$N_PjNOj*L=MSJwd#JrY z+;i7KcCPRZmWX@b_ZV4UA%9d!qc!pennt9}QEBw_#rdfPp0N?Kx zdvBinsXyPpO8(TRPhb#3n3-QISO1l&|BUx_7RcX)@;7;mc_}}Zp+i1hPIk#VC0msB z^ltg%L{w(nc9Zm_jxA8PwtI%Ud59eBwf=8)EUe%*4=b?>b(!+t9A^&FXVx$0Skp_- zN2{_V%KOb^w0^l>9T=TKek39MWK*;7%NH&3M{AT%?z|}!xNjW_QH(Nd{(|)ZH)Vz% zd8)Tu`-|+Cw+85M&hM@>{&e{lrblLpN#Td4G28=)%w{VL;-vcf+OqK)tRd zaL7NzOUdInfs?2+Cth2d8HVYN8Row+LO+exuA4&(np4bw;|#k|jNvTaI{$W#c>+`D zQeOV$4cv8#;~#4SF@J-(-u!<<##`1GEVRDB+?tTnZ82`l}hS0{3c77g32Rc)&Ex@J!>&^J@KR1(~$PA()Y>u8W;F;g`j`PhA zWclA&g|n|RGQ+_%^HoJRhbOriAD}-kO`o<`U;P|~Y1|oE7b@1w58JW(LV9?5 zDm_$|-f8dJ0&4+`f$uVgHudhXvHbQ>Qh9qQ9lJAZ%6xCwyzafB$#eI5k0)~97oME| 
zHvc`dA8yXI;i;+X!nVw}&xGrM>pODa5q4HwA9m&65GpU+5T3qpQ`lYTe1@OJ!_!xq z|H}2QzSBPGbkl5U7CmKGgr=lqZ3*RF@nzwZb3#w)GxWN4h@U6OZ{zO}=SOkv`(s1& z4C(2aO7m&C?}`mnDDT>%@2$L3r}QXOdtK{eevTiV%G<7C#t_U4;%TbGtkz3Es4&{fF!l{>I;dxVCe}l5?xSMlDcye|Kw`iWN zzySAug`2LBcfXkw{+`~m=87;v{s^P^8OG7i-}mz89DRsBcKGJ7ka_$Q&Sl8@YB=Q# zwtseQ66c{CRrEWYDvE_&_7A*9-*KJv`iMP@!P*G-rlDu*if|42ta!SXESrC2cstqZ z83T8z>z`K+M0E+fQ}%K6oVX(NJ*<70>b=?h?Pa0pr}$^QllfoCEb@w3`m?wm7w|r` zvcHe)BL7Pym2+PtkLBMUQ`Hu=8;kf$vLkJNY&)4aCm)f91Bz^iF6y0gN9?c3{uS57 z-g&vQ+dDi+YLXWH3Asp7Xih#F)-3v4@(%J{+~kmdN`9DJNq)q&HCW0VC;yOK&wd4Y zn|J$DvOn#eu^-Y;UH5_5T;^y^*r0qC{X^4{okc~f_3sMIzvd@}jid8oC97wJQgi>C zu$kSK8f~iPr0|6K=qJl(g{{UP%4?ItAMk4%xqZyo`oyfT!;dC*3g9koDwkg-k0piO z`rUg*l0ubw_Zju?Ught;)ynd!>tg%4Ie>!&*TrgjFAL8uwHK#!LG186<@y5SknEr9 zQa{(E#~PMj7mNN+bZnk*3}nZK@6V2T{h}kf?oDXLw_N*MjNv7mbp2}d)W{3I)Wjr?Y95( z7`L70ok25UUucfD<8lAdBO9`v8#!!^iEB+{^NE!3Ec-(#DdC`J)GV^DATuRA$9$Nq z&2Zk0ci8Lx2gqFPA594d7ABh~kM{AWguh@{hQr?Z1Mdt)KM{t5^c}w46wYI(Z=T}& zr^0sJqTS3#<;A~n?I}EiS~N$vmSicrZS_T_kBd$v-yIunyDN6;7V>B0HTVKPbysY7 zYosrX4PUn~cIpR_{6Or~S8j<7fA>AHQ+LsCWk37=*zgw~jGemv!PxLUk<5t=fAjsZ zQxD%yUlcoa@B3rJ^~_ty-`*WNwcc|cB5xxvC%-)}Hhk;5W2c(%(7mx!d$MA~-@Y$) z>ZV&`!`IvzJ5}-__kYNB_5ah1c$o;+Vj{R*i^S$>;Km5C9QEd2;2V$p0=3tJwz==C!KXkocnL0}x_2#@Y z_8orsHFQxe`#OFfCBH1)XpZ>bvpxvDHQ|4DRqu^8*Uk=q#oTz}ow2y<&%6FE^1GyA zv7-6(JRC~8K6a4Z4gAf-fxOva|MJ<^hs+M|Vcxr9UhF8@iDT%;?J#6l^nL7H@b1_T z$uIlXe}Qc&v%^#Q%IQ_J!+Pc?m)sKjtoJX=eqZd*$S>k?_FpB}!cbSyW#XmSbBk6Y z$p7{cZ3xNX3ic(}$p6_0{PgsgaP`~x^(D33n!sqFY>yyKbJNM@^pHgoZknOv^5qh*idW@r-dR4o{ zyRVC`Ym*d`(KGs9@{i1t-xn)ea9`|+dH2Ph%(^eOb&)h!5(`f)yf3yb+co}IoVz-d zM$bDX4C13Q(qFJP=&bb@^S#$X?Ya=uLumoM3>s-4$ z=QpyeZ@4ek(0pGkYUedG_hg&TkKBG-|Gmq*+!+t2t~0*JuEY9{PU|~P1^tUu>pu|v zKfFg_*;C-T#wZ5nUmebHJ1FxF)+bFJ&ElbmvOAB2p z&uDBX`d{=EH|=w*jhSx@?!5eMolatfIsfe0=7Q5gA9~S)qQn)USX*F2MsnDA@(S%Q zfW1Dv}aO6RnFz%8Ew72?%(JBYWJU#uTI^fURZTSsL@t>mOM3} zUT9AiUh}_XkMV^1;^c7SyVP}6&Ir%bUi2Q#4as3HyVmT>tuMShv`;06@5lNzDD?-n_D?@MA<-w3}QNK3PKw4%jF)CdwyfO@-NLU-x@kiQ|LQJ`I 
zBTDC_hD}weVRL0_*fN$H%CfRz?`6LjpTOtwb(CX!^A(|DX^QnWVNZ4x{*>?xxi>8(?8~qgPTjYkJdi5<4XL4KT0MDQJ((o@>c(2`>d5EZS5G!D zA0Zp(s8`EUg+Dr<*}s6NQ$t%)YG^NXzH)(j+WnoYQ>~p;Pji3F{XJywSXQi$T|Wj! zvSNvx)G)X%)jrnLu<@>xQ1j@l@M~ciRkt3d*M4kPxHd_<5Kk&|ZXs{QmX9O{-Oi#r zp-EYE4|KVTevYHF&EDiJbOp1Xj(qFa7S$ByZiPfV^`=0 zN%!LWNo+wGo+?cZh3?%tCLffih8?|k#C9%zPi%YD9kGfNSB9sZ`?0&=J+WOU(wsYb zYwQ{3y=QKZ?IEky%?jJy`!CoxePuW(e1}mF(>z6Ouq3Rg2Lt#f_W4hJN56o%Hc z!q8SwDE$>me}(pMJ{CIj9t&N|9t+*{W9uH1haU^QxsNGx9t-{CfNP2D$Fw;f3quS1 z*IZrR$NesH8944o;EDb|MKoV8)Y(zRjZlC_~^@!C+jY;D+d;cH=Y;adG4 zb9Ky5tXS(m=wA<8xht>!dU&ecct_^f!}f|lidD>C8+H`04LgU|hFxdYhRSnm<%6|h z_sELao|?6xYSpJ>&rGcidzXDWwr^x@s9yQ?u)p|^Vh2vI)&D6BHRYd*rC+5 z;qbh*p>~1vSN7@Hb5);?)%Sin)=;q48uq^ozvp}X5&i`K4S$X&@D%EB1V`~#_>%?N zhd7GA#^2+6_(!C?SFn+Ts?}G7JaRSGARlEYM=v(K`t`8!rLTt*^kHoJ$~VGhoTi_} zD|i*Ry-&G_dvGrv!bh;VMJi_-O13dE#d5#dA1@Z{a)m5succ4;}ak{ihhm zKcVyC^`Yxi>kVqo3dN^cCiH*Ft@^m-=S$%3S}!$kYEN_pFKfze)PPNntO!kGY!te%B9>2lcIMx{R?U zqO&NH!eQpx)B3-oN#QxNe!?8ZdG+(O^|53Vb2IxE*IUUp?%TVJue!g3>}2lBHNKj6 zaejMde!Rf=Y@xAFa_0O4*9To6A`iS@eBjsk4St8$G3)M7bS38EL-;T@Vl%d3C-&k1 zzJnP5o!I5yEtR=f`Tp9*g zTU>7?+qiG|d8`y=b!aqRF9~JjP91W^!mxOAbdDrTc8c}<4`x5zWnfylX zWZs%3KQ53T$!$6EYl{4uCclzTGMBI~b$t`LnYn0*{OUdOC^r7OG-e}K>8^Y}9M;s9Flef$)Yn6*fKjBD`@+=%z!4&04Dz#rn1_%gnV zZ(tjCVmA)p5S~LLS}}wl;V1Yf{1TJ+HGYTJ5!Y6@8t=edybE{WeYhJRLXq;KxXbvT z@?#@e!d!aVy8BVFj9Fh)>@0;1FJ2ZlZb=R$Ke#cJ?z+L* zCdTpg1GhZy4B~K2c%t5Ub4}@CYq%|xcis`6>bX5^8<_9k6kiMD2mO1ihu`|234RZL z&AE7~)`;mNZupWlwd|_A)40ap{_0fd55-)@6#a+5O zD^r^Ij5f|g^!dK|m(&@L#KM#2VD>L^?Rsu6H^w0ztCsubby4Y$-$2svt=Lc;ixJ#R-upAOYbuW`!V`5+1I(}B)gBf2hW&m*?-2kHZ}~n z9=+3f`U}RDx46#jeseAdM%eYZCyJAW&Ru^+`hC^)MXqIW6ETmD!Rxuv!FwiyKAY~n zCU2E~f5tt1e-d}8{D|I#oAEpO8@)68F9`4C%j{F$c<;2!!dKXB_?r8hxZC9nMgxoE zFB>EIICtk}g#*LP_ZXl0Of2j>IVve(^Za+;hHJbxzXreLTCHoJ zV4s*ZqetIyd^U3hvvyTvj`HK=DelF^hVA4YWADkqe4RCv(n!^Ob6B5|Ctor49@&eJ zjvo2tHfzWpRYwcgvHQ6bwysy>^0+;lAMx(tjr=dZT5r1OuK6MM4|~2aN8fXBpES2n zdMnbls`noJ+Mts;ei%BC<1b3rBmB{MtUAkm6o>M}zA6Y;S>au+>! 
z4tsI8;ZyEcZf!_*&x_X1Mb8(9Gv%B9Vbv{RV zarz8B`gR+dtV>vOd)V-bw$HbP`M2WgF7s~uA6MR$a5H$XdCF@pzUvm(_U8rdl@Ok3 zSDv}P&3CU_>>CN^ffL?wYb@-z-d>;|$3y%*+F2*r;~4`Wd-%Wik$9*Q<_*fmEZ6Ts z5q4obdfqMgxO~g}d}Q~tSeSjAe9r&_sI>&?X&M&9}$t zV>gBU%RG0L`8Va>xH$g?`DT6avUGL-O*83XXn3yonHwH|%k?M^MQLRtyC~jg(oeQ` z=1){_ZHwG~MVu%@_ly7NTSW2y`t|DCIiA;+8Qyb)I-hut>}2jLF#cU=eGS>e+{-@N1K&>$FekF5clSr@@cvF5dZph7xnW%p6^tvVh4Akx|YLG*p_Nxb}# zSomY|Pf@Ww7CsxquoHddr^9Y?FAm_# z{Q4>m(KpC1&ykIo#qLU6jhFA$hI!$8v1{q$pZM$8JIEU`7w1o;ksf$aYPDQIc;>Mxpt)g3H3@HJ61gx&DKreey)J{!ESW z2bAL}C?bpA>plL8+~MDzJN@T#m-@TXr0~u#dS~oxHzX zT|fW2P?cqj0(K)sRNfKrrTGd7{ORvtw^`fzDo`Vf5(?Px;>jw1RuMkkJ;8x7is^~#_| z@`!e0^BQ9$%&nQmNKivRs9Zbb`m^RaYAdue**!d&$!9Qlu5bL*cX%u`TB%km#52> z9|!am>%Cu}btKPQ2RLjE-4CrHa&E$Z>i5hR^+T;POP+n&*za@XFY)`v!mh#}d*6T5 zPrFwB-mMSwW#-$N50H)Q=H45M{>*(l*v&FN^ua~0Gxwp<{U?R_hpzn@me3dBckX+4 zej89Uou+YPPM1({*%i5o-t)Q#b(!fV2|u^Ym6_7zK^|&3IP?W9>j1Aa`l7+_qunC*7CEfTRp1_mXns59MPhlIj zryKvn4(+9#8OHz8oO!Fvep;Emo7|%ee{=m)!DYt(tbfYZZ_CkdBM%Nc=lrzwPs-Lq z3RF1)diOaG`zpQsrv!`eS5l#PjtXN(@zu8GbV?bObR&KT`xj?NhE zDb)TU`xYx>3$2e{Wxb$xh|W+QTF&nbX-xWPls=E3Sz2vEYlD3b?asX@Fi*Gg%5do1 zmEqZC(&pHe;qV1%b5XQL;quU{q1;oE=4>nT>SUtVJiGfoC~gPI_A_arZ6wY5_q1@7 z>~g)6{W0cl@&WFCA1l#=zrt=jhaaN&Y?=ZfP5t4F5|p9@o9*M@wBUMqD=qBd=U(y| zR4+{n`^W?NY5EOmp~7B{?XK@6ca)ivBzIM%`6ohJc$)d=(#N|!XD7ag7jcy|dN)3T ze_;;f+b|a%44SdWk6OZXZ#VF#W;175_l=s`2Sg@3?HnDrjzG3Fr`doU5{()nlc zdHi?$3%B>Me;B`y`B;I~D8X9n!?*DrG~o=!(TAJmrE4%nzk~cO86y|r_waFi8lS_X zD8*JBMk9JKf}dalv%J%Fn2Y(i4~ww^-%=+0JeeOU=45g0WuAkNLy{{h!Cs7Wr~6)% z!R>AM5Plyk@ilD5GYI^+8aHDB?#G{9Zu}IQB}K2#Qf8&{@7`;}2gn8N@4`K})wOf# zpdXnt`nfvi+vI;$PuQbKgMjzSXzl<}CZ$aR7ge>)rpa>|dh41JB}@xB;h7i%Hy!@1Ozyir~H{ z`CCq3LB{bAyNB^HEW(X=jo-iF<}Uix_yzN6<_bLGp1aw-4gZt{QrjDL*03oqf-Yr4RI7A+1u4P~6`g3GG_YK+hk-NW4$LGwOZ8CpF{nk2SWpn&>PX@S~FVc zt!P{4yaF7hcc9a?E_Bn6p=Xiv5zt5P$AD`I4AO^CbV2yBVQEg-h>{gKp%k0wo3Uk` z?~fv|4ciy_{+Ox%mtE#RJNig@>VI-K^PU3rzk2r>axe2f_SLTMCl4?m z>{9$+v5+6`#%GldNZMSS7yKh;Q<+Z(X~@ 
z_gLn8kO!G-QhYDp`Vcu#Ek0|-r~8M<9_HRs@m(&y$<9&VgMGK_$H+G3_7lF>ub8py9Zg)=wc3>yIr40HBEN3eEHP$JgvXqzW$r5~$evtfo zd=t+r+g?yNeM4RJ_b>!q^gjF^K8vrT0zXo}8CEY!<@TM(#z*kS_;VCv7ycEN!4+MP z_uvCqg5Sp{@d#F<68j^wcl^v%#)$Ay(b1D?j1a9?@Z(DTLUw#Wb|5DtqBfQmwbu;b$G58i?+3=f<`*GYa;$F@*9dMLz zF5sP16ZdLUlzh|qeNufz9kb`F^G{z@ zm)=+RS5B$tYaht3lL{=ERB*xX71CeSyYh^}EC0&)_}|p8Kj}llZzBADp37y#r}hEb z@00{#7NSyYW zydizJlD_9*F=5{aJtiMWe-HX&Mjj>aHxpJ0X}pbi z^}`Ri{ywg`i))|7z6U?HQ}ln%(EnL}qY@_R|9~BmAbIje`oA|Sm2ua!U1tB2#r`LL z69KZ&XGdGg+09%Z&-O1P&7MUsv_LDg?Pd%e+MxqFH!b4($23z@-g@G@sfHlm;qpaCL=3>r4<~MwiH65(gILDfe?er&*RZx1KH63SJ%W;}D z9Z35O_U1z}*C(-OJbiT@1zW#gWOOvRjnOy~MEWS5UMJB&y5AiEjA z@`V@hPG>C-=61}cuikrls;}O6p1qcs{i&?wK?X6qSFvUU>c9gIr~wx^!F!JJFldDq zFl&kG!K@`Rzm4??WvzrVa}DLFh4N8G8JtBKd66=8ALVEm_kY0k__+?d`;j5a(k#kQ z7Q6*d;y#P(FTwp0uGvXhyq&QA1xqMrom`VnS^l0W>$s&;Ki`9o;a|9QaQ!D-_Yvjw z4gCKJUcvrVt{H}xxnDWg-o`y#@F9L*B5|orKd* z_#bfn3c|S>zyFQDF7B0K{Lo*HJA19-*}t0riF+ykkc2)NQnu3nhqNQ#< zr_Y^*%*LE^g#P~t`v1s$>QTWt`u`W0|3DUFF2TMO_cCNT{wune|KR#6WHsiREzEzE zGXFte+y%9}d4GTh>Y#ozeROEtGh0pgZQjKEM;FgB%rix{9i+@0p{$&s%pf~4cb%gw zU7$=M{g?yT2XPM}dohQ*82{n={_th`9(l48UL{}4;YITEA8-lJ<~H~i`ISxHJ%QbE zs3p%n=DqP>*l!^Z_dqRqJWjqJC!A!S%ToOP(L7h=S9vb6n${2#&{!2fp%x0En%!{6U{jz7e{8Sdix*Kz+8_dgP*9Y2S-$0F>Xhf7S@g!wu= zO_-nO+IjE+*F1y&$8i5M@+k6E{1g+Gqm>k2%=DGi5B>oPAYC7#_lMBWnrU zg>+(eAZxa9h9Gfh0WWcCh9>ll&=6R{*(avVE-gM4X?wk*d2v+g!Ni=54f5-&tRw8xG{y6m?nT$DQ^g5+d57Q1` zr}RD4|6SC7+&Sz

PfLe;M^3O6F7lq3ksEA1cmM|DkH~ zI=zhFnz_tDBF*pZeTaO5ym8N={?E8Xw~-%D!V}Oqlk-8LnXtUjg1!~n_A>ql?dUt8 z6SpqtM&ASeHH`nkf6ga7wOnED)psTHpWG*$c@1Y4@GY#oMtu;-HD}iItwaV(*(<(< zJ>$>=-Qa^Z=zw#tD_W*5@A z@fuZf&V2<`J6P}4z?{7O8kImfltCeO1-KQ@pihO&ZJ-U$+<1QA8rtw*qwm0)w1c-O zJ5I{l!F<%?tXbR4TD1+VS=+>1wwF0=>Pj+oDv5Vq+#DurAYq8LAdpO)o1h<#!`+nU z@4=Ih4#(j#%H+#%4drq*?1bl`1A5>x%HxM{BjxjM$f0a`{!BUN+J4@Z2PmUyrkui0 zp$cxGtlkS>r+nT3_kst`!UdR5nOz27A*==P0K5P-;D>tXg@40Cl*1pvPfU4)FH;`B z2|FoIPeMK{CT)K+lJfODq(dv1-*SlqbEa$pBnHScNJ5_sDY&H``I^$mzjR~y zf^5jiWsY$9*O(h*{t>?gsjLOF$Es-WS5zGHRh5wEW*t`9l{{12D$eoDE9p-VUUepO zb7#LM$8P54zy-CreE-06=4+}eL;d}5PJOQu5KVUIs;W2m>hT$6X-hy&iOS!lOo`DaRQFbUF z^WZO-=YPn%36|1tTMhTpr`Q6I!Y0@ZKcGMQ@fT?$!5R2B{2ex5#-12RhE?z**adsQ z1tZW01JDLNFdb~vk+==K|F-h{A#o$mACmU+{I~G_YheB5dg{`4=07%2*S7Kg!(Z0W zt;*)woK-ykwXFX{<{#ww^Bf9KFgJ`W#$0la=YN6oX%6JYGon|}-=3qA+=0Sey=X5#p8i)-|cE##dy(j#*;9+!NWjB9n>sjylL9) za;~|9v(Fi*a1bB3gRm3a%*}^Cr|+UqYG95YQc78WaEiHo;%(;mGcHWc@n>N+bNo4# zpJV?IYXk6`?`8bUM}BpYUxZT}VEoI@_*Wd`Ur>&|0>719Q&q_L7Xu|VyNJ&Y#=rJ5 z{?ii7)nb*w=3%J)0T-f+qCM8yWxF#Q4_+#=q7x{N4}IT#^nv%%_nko>m~#U1ArE~a=k65H_bkS*=>wM%MhRh+(`PQD z4_dj9J}~{}YV519cTCU+K0)92usQdizVD6)t=e4=T6KJb%~|3N7w7|@=WHI#jhM~Z zMCLa%S_#XXCF)!Dpw+hgLF*x&*>_+w`~ZGMIsSk;u=;YwRNz`z49g)F=EEcSORV^! z{(`@=Fax`*jh_^^t%J?*V|WSnLJcs4ZwmucH@4Go$CnsSgH5m%w!<^< zQ}`v6KqXB4@Bzz>Y=lVnt z>}6w5hkG{dklF0+cC&1o@pE$0GM<$%R zOo@zP*ujitBtr_MLK>t)24q4O{FlQrkxIUNfV;S{cCn%lN?D1>TgfDZVC z_XtCrwoBnvut6Gp2p>TdVXT35a1T5}otQ^>UxAyT4w_*Q{zX`qaZNTX!tV+0y#fEL zaN7Z+unGHTuy=zWuA*Kph8rQD_XN{}wxjSnxCFZ|!1MTf3BnN1T80F$otdRLWYX5@ zN`}Pc)0qdFt~9=>>0bKkO!B7gp3c0_bmo1gD`V4iW%?K^IKr60cJ=}8W6XeWbJ1Sb z-EE$wg8hsspfB6S{%5|^C9I(+Jg=g*!d!oD7VBVVsfB&itSR`TkGx{ol*?e?EH! 
zpcA_;=w8G87x>W!!2E`A2zt>+&IdeC`yXDT42)2QBEK8Hk$U2rsSx;A(Qa67_EFAM z&m8I%vTMsswO`A@2obrJeUgu2c>s+h(W~iZ-+F4GPwiuaJj6A9Bvq?#4X} znGG4!SeK2A?9)0$zm2wz*|(L1Odg_LkF;ZsM<(EIL&jnL=z)tmNcteK;u`o0e2n=z2` zGFQ?6TT8h^zE2*UfV>TS{~qW2hb-L0_YYaT@Bicbhg%hXs*yEe>H+&0oCi50EtBzY z`uy%n`v1r}%=H2K|InC9{~wx5>Hk9u`c`N=&-fR#qwj#uX`k}_|5V+KA@qQsetrOg z5Q5%@sWrxZgkgT8`^%Kc%b*aRf-K0L{&}sX?7a;Ia4%(R24!F!l=J+vAq@SPb17S& zNB<^%4xoPryAJpi{Q&GFycYNwcK?O>C*;+ggn0%t#|YY9#_k^MZi5``KEUnY`0vDx zpAxfug!yCmDR!44JG4i1b91ugA ze+V9U3;qfm-Du+oN83E|;>*xR`SDSX+9^*R$WF>&7qT0357JK=3?PG;L&#p-_j27Z z+)N(V6W_1|LJ#H)!)L|4Y z2+;0f&I0=cNaUND&b^<9t(4G4Dm~JnvJ)LDM_-Zq+5Q7$^+x*t0s8;oT-Y(S{v~&6jlO4n zyH*fh{VBfxv?UsmO&5qCd6Wz}!0|J-wnD!Dm81`uhS{90{CCQKHwCe$i|;?rCvqMp zenR-^#T?#4{_G=vkny=LD*awzE z?`ON&2ZeiP41NB?e2*b#f_BgezSpCCuMhIQhQcBCvD02Cr=3u8mhU-K;kWt-``%B@ zP$mAI(d>g?I72nd*cT6;HB;vv)vjgV{3^~pnq$tU<{Tux?+ws0gYyNkYo0knUTEJ( zdt@)?6Yi!x0^NkyLwKFrXQ+$tg7Z1|D2j8CAV4^MT-(oey@VBp#DDX9ql@@nlV=w4 zJY0a>0_Ng5P_z0M?5TT-GpoM+1x>tBW4-ZfJ7;g$t$*SF+@JW4mb|V1L6-lYw{;Qe zTX{m$PxDzRsIl5BKH;oG&apA(#2PF4dge;m(DOyi8MEm}bz1re;%h9;!_NFRQyzPw zirMEEV9#J8n}k044*jcNSc<~^E8xy}TZ@pHY{vL#3G@FuxW>;se<$zxcj&7VUJ+?( zUCX(24P)qS2+<^IhOV^_hua}CrrjGNkREowINakH6^o6VXw&eNO8d3v0y zmjMYevlY*|VR36_^WvMO)zUe5Dl zzP1AL*UwhohS{8vHJh~(j3;a%j++@z;QYM;XbQ|`Y@0rwZ#Lh$*=pVRsahIlvnFe{ zn!%MjTh2nx&@Y{>nz-4jv$M7Y>CT+3S_oku#804--(rCNi4^N2exIT1Ypiv^ZwFzu zduPk%n60)|pQ;Dh&3(GMW~&q8_RrEsxA5*?OCMk}>#HB94*M)cmZ zVKrxl)p|P2dWNuk3;Wep8CJ)}QPLS!mm{psaF}$4)xFWDz>ZP&0QwXf;=I>`K85M) z^wQ7hUr1kwc-SuVF&{F7vPsy2mO7Z4esi*pswk*#|@9N|H zw>&Geuuoab+m$_3W#ydiQ?9Rz{(qj8f415xIMSy={1;93vHma5D%qZAmF~;4$^w1l z0q4J6VEm(yGvmI*{QnuoKdxl_gY)5>d%hqShjZ3Of8n$7kGECFILrU5 zTPlPi^uzbyv5%4x9A7IrHaX;s@%o+&nm`$CQ1L{3C2H;GyZX0t`o;qJ9J#` zgU8wbd0cfDj?>mX&U(b-{07`{#z2p&89!e9wm|FNVWqp_W{x0?XU8SQ> zgHL(CwMBCt_KDvq3G9#v2@}6l{E^=&hWE(La2wnMTcCpL%XhuY``}&u81wT`yyaci zg1$@N?OhdYe3!o4yUN?~u6}`ADl|h&E`9k@`t))1^;4zBO8WYY1=M4ghr4I@Ovc11 z8*^vMwY|i0&f&~FsDV(JJ;V43@6WKRk^Q_o`{uGwm_}Id<1{XK|GtL(o0rng;1%{1 
zMgAFh0eDT=>}*>66*Box%y}DL}=RABxxz}^%=R@=b9%X)(^=*|uxDx$KoS*T}pyIFWW<6<;@z0>_3%eQr z3@UkTP$@HmjDH4Mvk|0h1lj)+ROY@Q;~zm~Zx1TR;a6^Skn%sKd}P7GpbGbQs|Z&t8|2zn=QA>XY{D7E5w@ZVyna+ zQt`rKtLRjSb36M~5DqCnGo(CZZXl$bb0K9f4=Jm#+{)b1!``?h(2CO?ZnGh8B$weNUhknoCwK#p@(-vNKMc<+M|ZUA=N(~P#yR2Tqw5O zXNs*_q-$eH&J7_s_JvflzRa?f)+#PK#fopJRl@vwaomZ`9+O1Sa ztG|E;C}TdR_dwT%DP zs%&em%D2?2BBoZAYipVNsa17!t!lP;BVs+!MXS81a`H5D5 z_ywa{6*`q-^=`6T;oY^21JfB+qp#CB{wL zs6lB+hLvp`6=Tc9K5M31*)!bCeHnYV^5(nQ|K?`?uTJ;lr)c7^idVTA>u}TVZBQAq ze5ir-FNgI2?o|zTs~WqSjc(SHyX7PtSBzV=Nca3S%d^$3y2@0mevO;-r&^8NuPM;L z_@CYK62F#BEv$b_wc7S2TE6vee*f6bn1@@P`%|s1!>Lv`vWIy2x79KK=N7LdTWHd) zUI_1RP#^M9>}<WXnC*!&(`S>Y_cW_oZ14jx?(g*|g82W}dq@ z>`{xmQLR1?_1`1k*+%x(dDL;nqt5dlb)EI7oBQ;f^2qP-DB$%l{_jz!%ftHDT7?Te z%sIWKeq?c=PDNdHD$K0o*^!3%=~mukGi{nW<(x{ks+ZTPYE6?k{>N6~f2tYFm?0=;t-lsr z*Xb4X9>R6w_ZR3n-o@rTNL-L9(3)DOmhIz=kMgahJTx`bsnK0WdzSXJz_ol!d=+dyR57XIQ!G zGpsyher1gc4m)Yv*D(Io%(!ihN_N+%baS)HkmZ|d*#BFjO6;pP)-e86!~9Q;90zOU zoT!oONDb>hopPV6k!SvIRfnuUT%(4VW;G(4W;8Pv_*;1ko7J+zX0>A1HnWEDj~cbF zuVMXjjXIHCTeGe1`8BkEGA#ez8U=6<`oI~*e zhZ4FR0D z_c&BA*TMP64i(LGux87lk~o)2k!7ckFt_SZ#WaU1C%v3cz}f$`ltk}VWCrv_-ne-sb)u(<@IG*Eyz~l)yDJo9dxq(sfFK?bMpRk z@~v#4{p)0HsZ;(@rvm7M6HeCuI_dj36%ISqN4)yo#QDlr<+)wTU6yU-6uLM=B-<*Q ztXHAWrGirlR(YvQW!RT`T`I|qx2h*yv=Otd%C*^61v24utK!$X6nCPP@41UMlS@hV z1k1j+Rf$LFGn{a-c7!r?+NBiklf66J%Ib0{ld_kAeFygKGmgrKY~wlhtZ7v@_FZ#Y z`QE!|v$z!8lVAl>Tjh^F%9=$NC^KhW@?huY`(2A{ znsBM{h>N*1m+FUHlo=QAPM5r-uQ}|JpE?paMg2JIWlgs84fuS$1kSY=x>tnytMR>cmlDxqp_i>e`KAMck9=KbQO@8DIz znids8(fk&kRf|&g_gblI0#@4QE0qr0_(m7-UdvkURrWIKH}9>uJy$BSzHjf9N<80d z*=Gc-r1L>5d1k=ch5r)#zlf|r(j~Xu$M|D9lHp2QAM$m^Cf`I}hWr9;@?zvykqqzI zzJW~o5p(7?#*Xi2fg5%s@IClGJPtXGvrH1!l}i{Kjb{G#2Kqj)u(weBHt_X~voEI~ zwVF$>V;uP=`Y^GqlfeE*H}Y=Bt{J=kK`vdzI1~PY`2RcZFXHcV+`bK0v(D}&PY#IHZ51aAuyRLzP*CN;Duc^~gFq=T6ZWscfrwl}knTRu#@nunO>(ACq9^ooi*> z$7bbtTa|sGRapmHl{wnVxKJx+Zni4To^7S>rM@3IuH;eP2eduy7hFn2Cd}u3aqhU{ zkT&W>KMM@{8osQsN|V!>TAvWbOwjZw{fGpCebVyv}Q&NB&? 
z<3fT}gS5@`TXAdWDxNXZgbQIS@m#&^SN2&+(Q}o2mNp8LqN$L^IBNRgu$2+xw=&oE zSy{fh%7z@qUUPk6E05=xe_}4*KgNBT>@3>sw~D=Ct7KcBRhk>N%69mza%6>LI`h1j ztIF4;>eR~_(_`Fs-{o>5U4?yCEz-T?a(OD}GS)X&_2Id+f9G;OKjXmLn$%23-0SVL zTHJkBYw29IWd>d`mL0P%asagbEhkPqSwlp)1plG z<*a>U9R&6{8v|BuSFe?~vERxM^jZa*16E-}uT_-FIJkE%-@itc=FU~wj_E3AlC6U4 zDtGl;RhX;e=BkFga8#21!pj-|x?HuX)8*dU$oVH`th3*$5BFOQahI#{l;3K?Z!>kdqujQZ7punkqD|qB`h3Mz>?!H`M z=o_7`{%O6IZCa4u|LC>ik8$?H7GxDg$g#a?!$Wp&!2T58+xsh-OPWKUcs2qJkI}}$GVSs zwEyR+ig2ra4U9{0{!3-AtY#TT?%nGkPduEn>i-)dh@`{!V<)p=%~y4Ivw-CO6W2mFW4dQ#GVI%tJX^;*3w zs0%|M>r?t?@*Jjxt?ILVR>icCRY`d$+srymCg4k=n^nqWd=YCViVyc$1&%9J2zfCf zD}Qa+%9%;Jsjpf4uTVB*?BRFOXsf4n(N5kHwo-O8Z$vpw+7YsnAHPD0>zieV$T?S~ zl)KG!j5}UI-EUSu^foXrLtP2d9|>_EKY1D;AG^;4tsdgh*%h|BQm%w~amGk6W$R3#)^Hg&_ zWI3kwS@Dj=dYJd<4h>kd%3REt#}XCeHxv6j z?2_>FF7CNqi`9+3_`+hwz!$3z{Z&xDcZq(3EX`f4z=y1t+P_?X#ay}Ux5Q^L=WuaO z6yFC&)M8zVtczK!TE`N78GU2w64vQ54;EOWhEq#e1F%@F-X&_r-V5#6eHVXire;tgoy!+9&$n2?b4zIVFVXk$6PQb!)-Tu3(f4jyt|X*wPPF0*7tuagqy)zz z%73)%kc2)tu!#PDH0|GL72!91S~UHrXxjhL%CawF|7WyvwnZy9yhwTY$v?kH1@1*E zJh(_jLyH*yTBMT0izxq#IR7tN<-4O*ML16QJNf0}yWEZ5apfX8mo4JmyoleMWR9mH zS|0RuYZlS}jb{EYS|8)TDK45Z(r9_1WyT`4x}&N8(ekZbq;~vt>|LbJX!d_@TEsf2 zMe5nJNdC=>DF2HX|Ba@77p((?{}vnri?)C*aHHZNew6*MkT`Lp?2v>$8B)%({}s}( z`zzP{4}1!rld?EjBZ)sYxgkH#oZG8dl|03G|Wwif`X#W@S{x75bUqt)Ai1vRG?f)X$|3$R_ zi)jBB(f%)@-WAdQFVYa_t~P91s>X#&SyQr9&ESO=_X5Tjm+GtB!_l=s&eR23guZs> zQhft?Gt`wX(6^9*zyjTc3~gSjXOPdq_u&W7wtgvNbqkoEU#f1*o%RL%ZrTEMpIV@v zBMX$0yOegxQl&YTDjhN)6SATg(7#?Pn|G<=RxMEcO!m}HTcBSOhCOvDjjKeE>PKtrSz|ss$$;_s*F3TsvS2lzH$TOb4yv1vp{*=BY*1x)~7Du_h}ZWX!`=@ z;ubLBxt96wwTj=kP6?2>b)D>xggzNk_O4Sr_e_GHLl&fg6aGz^%!yjZ{+D&CzvoKz zVfPvwfOlaU?Odmr!+VNzzJ9_!&u1kMW8QTy(H3EExce8gH0{rnC76TXxo{=Sgea(^ z-Zw!s`sFYSe;bjTU@KgXc^h&&+(}!r6~BAY7eX$4757TyHOK~}8@v#P{m_Ab+pgOb zxA!*1J8q-?bN&a|y(3EEyoO{qZFJ6aNTqF*Rye}=2lL*{OJ^dp&W}+4nfHF2_Rn_K zrEOy#d=K;9%rO^omPq8hcIKK(@LxKEdGKk>gVWcqKvu?`qW{l4IBk@gnY4en#<~65 za?y6GT}Aun%n^AWA5qGc1Z 
z4=29%g)@|dKDqR5=7eV`bvx-N4(ar)0{~8NK`pq!19eaj4bTYx`Tg!0to@%u z+jrZq)yBPi&_0v>@6b7)w(`U&b+iAV2S5IEr&#mF{&xt$|9k&?wR@eapt5kCDxiGJ z8!C%ir_$7Q^!L}%hWV$8yz6MwtW&|{2dsZTt-LPE{HArv+3>y&Q;sI!1WZB?WoilK zErgs-nfwCGh7bAf-GKcEtLU?A`$Taw*HPft>D##V<99P|i}4fMzD_|1>{-Y7-a7Sc zdtcp-k9eoAQ|In=>e#VP?c3JLhir5IjXC^vYDxV_-pPNe+5Qo~5w(u-|9`4s*9ZJ| z%sR%9*2zPCc5hzC*wQ+=)~}QE;P2(wvW~si>vTW&e+UMM!%vW}z%L*X3gC6@hhYWr ztH9g_oiGURz?<+}_#J!%|Ag7}LB9k`;8EVMw;*rl`jyCQ;X7~-{1EQO{wd_6@Bkd* zUe6+5f?vQ3FpgaT@_zjO8fnGwenGy6{c$)7jkG6P;B&bB6}J%ie8)F=2VlMdz5(00 z{tx)Q9sSMl1lUqp2MY0}tO0;T^ma&M&MX;HP7EqFlXazb=FTcvXNt_koQ2GuHmn@x z(Q+XV@}U3cAQ3k^BtbIxPZK`~o+p0Li#`l} z^NAm{Z5@`c;SlXS(nmTwkzHpv-vZe)!TVv)F*Teef1qjYu$po6k_Rp1Q7f{BYaBzo zUk>tqnK7(du5p8Bl>7x7<@0&U#ZJn?z3?Ku46nkkARS)9Jsl8UFd_b8vWhy31Qubyoh`f z`82!;zk*A?$vu(%`0qdtBA3A-^rOfz7>Cb)lk~v}^zXp$;RCpgaj1E44P5n2?hj|N z|1a`=g|Wx{FU+5Sc>lSuTZ~;RvIAydcPYG%ekt~sp??qkAK*I7U&4G5{f)+rI{$gN z3cim0jmQ<)--^5g9)#7Huf=US+=|{d=RL)l`F_?#CLj}8Cu!e9{ohCZr(UMSyvO** zdrDhR{f7+5gsctJ|5GE%+4`Pxhp7JtssGbnWBi-?4@D<_t^e%*+;&1`XQ}^CahUoK zk#p*gQ2&=x|5s7}*D_C!ti|l!!n%3lW7f^r@8ZmoJ*=xIK26AG%wFtUaBoGn;orB4 z^`Bhdf$YTGb(s1AJ>aKa1R!|sghIR{dZ{1bBh;@G)GuTe;~~{h0hLe&lJC5o?>;iOkaoZc+5x9%7a%i_&`#J-JE7}cr5vW+07;XyBOsBsjUD2n zhG@sKP=R(&T+FX&hY%0wg)jsgcn^So_lN?>?o84_J?=zy*?G@-c@JQ2+w`t{Zt6K{ zY=vg(xOYGA0n*e&+UlcOFNpNewyKNz>~|HBPRuT3&3xKjbKi~B|5(bxPE$6JFC!D+ z+vwLrHTt`db;yVR%s4G&rXT$P+>L%1xsdPvefW6*9>dR%;6v;$VVw0o{9JW2bpWo0 zZ_szT2DuoP!i}(n>sBD|hWlVOd=H+4JKzy`44!}=;P)wb0=`Gszrrp7(%}{CvXMKG z590O+Jcs^8cnp0jZaol!VHkiz@H&jcQFsrY!0!jh58+AlpCI=U#w^C3FN3GCyBb-} z`yw9uWY~$_^Kduz8JMe)E(pOe6k+!#-cKya(KzNqa2(EH_W`mT_ZnD1p4|p_gB$aG z$d@4jlHpg755>@ky9@au_iaM9VtyI@AJF@FCZ8ZXvFn3ZvHuk;UPHPuC!ile+UE29 zhxldGe@MLWPuZzUNz|oe>Qc&;>?4Wd`;Sbg?q%$u{_msyBeOB*%p6cI_?7bKXMf7 zF@}o+)Lp;V-k2vo?^~m~+mtuJ!_JaFO|#3$QP& z{6s}8`Y*PBq7pB^Q_{unfP~rCgsi}=vVq@+q8uKi{5^(DrY!D6z6`%IW=O~GE!=z1 z4?s5Nd>BT56#HWIze7&KAE6xc=Q%TB7R-aI;UnycI%l}?`xy`fqQP%K^ZOal2U?&7 zZ^S_YB!V51AQ@5&Z`dNL@sn~e9yJZpA>R;j&5wkTA8{?fwjg3#5V0+Y*w$fN4;@-y 
z%m=y41Yt~M7$$O2GYt`47P_p6E*o8TM3;juC!#Zcb0^wR^R(a(@ufy7(?Tue|28es zVl7d$zM-Z3P}a3E%it!XM0Bx6x56kZjk3xp9I$51j;c55@?0{!rCp-(g;5m36UVz>3Q+N?xf<5pvco|-Sz3_9GOteMR_=uVi zQSA{mDWWDv)YOQY9#JzQYF0$ej;J{iH8-N>p}y}oVZQG-p}p@n*S;U9G-}XX`+l&% zs3Dh8d*dRix%QO3DWZCfn$~I5Oyhbg)A&D?lWEjke*5g&Q{{Yuu02&zZB(xZ^^e7F zqgJFuRCC8aR@57{rqQTvCbSRYO!yz928>#6?*2h-pHbZ=ZXdYKegBkf{QoJr!>H{J zqjs1aIGt=__~$m0t`Fl&jGA6+)JhYd59`c*KWsDc{7ZbZQ414|T2lL$n$I+6$|IU8 zlUILnnXCTNW@7l4fQj3g1QYU^43j@+3JHIizNzb21G-!{=mztDgKpGKy6Jk|bc1gC zeO%%S-Sk0xKrxEBUNJW)<|h7)ASaAkk%AiWfDhvy6LE-%IK)I8VigmMLu|w$HsTN) zaaf_46*#PjIIM_RtyIiPtX4*>Rz|E=DTWE{m{k$0RS~P36>~FIH%F{)j##Z$%xbJw zN32#ytZq@vEm++WvAQK(uv!zbS`)D%iHbon2}2OsNTQ8OWKg+3Dz`-CE~o^I zN?4I#A_+1<#VYoCkS1C4<_}H5j8%c8PYQ;tp8heXkBgukE^{w8{nyR6WP z>qUOAxQTz`v?7N8h$dDma9ybtt3*3-#cI(?TCqke2nNeYU`E9<;@t!piRntMyk0yn z23}UiXl1NcuFy&>S83(VTDe*)Z_&y%S{Vr-lA@L7(nzXSY1Q@OxiVb83iq2M>Me>S zbd~WHdD1uQ=Ih0yjwJhLV;y*NX=q$(UP=b>!Kv z(VFYU!;fUs8e<)K{@>EKzO`1jvQlEL)|&sdTBqA|+d|!To4%vf1gsgOSkLx z-MT|}+^IXBzd?6kxASS;sXOo0oj=f>Ki8c@Vd&Ea-KD$MKdQTR_wBm-o}cUP2e#?% zy}C#DJfM3X);&)P%Z=J-{^4G2+@g&SYvZH3SNG|@+jSr6Cf%?5@74Ve-KqPZ(fzOW zXp=VGu1$An)1BHx0GkMP)AzOMVQqRuo5mXSU48dnefJrC_j!F!59on=^uPm~;9))R zeLe8d264e=eP7>yQ4i|DJN4iuJ&5LKdPom#xL*$u=0lsbMGx!Yjr>PIiKP7zJ#vR0 z`GFq!u^!c<8~NX;t@?o;(+~By{!35jN7|-sH|)AW+irXkUevY&efqI}yzQrYQcvFa zJnYev+n&{v7@yLUyZDcV6?#ff-L9v$>8WS+)Gq#G0o$MJDKc}rwm+=xKYd#Nt^YRv zJK$Lvfx1gi>uG}giGD)bezHwFw8Q+vGurW-cKl2`UePn!sh#V!ll1Mx+s=oz^AYVd zQQ`URMBT-IEUeJZm$Z|dd{)msqi1*N+2{1^^H1y9SM)6Tzf1VS)2{W}^{{q5qFq1M zuBWwY$FthSKhDP8cz7P;i`qrbeWu3hIX$;t&)ucx9@cY@=(%U~+%7%$qMq0D59}V^jfoCYr6yP5@{N+X&_Dm@ft|bK%xfh8c5PW zvIbH#kg9<+4Ww%zLj#!_$kITz268o!r-6J86lkDO14SAr)j*jB$~91-fl3Y3XuzQX zrv_XasMUa519cjx*Fb{?8a2?Qfo2VOHPE7gRt>aiz^8$B4RmOrQv+QZ=+;0$13?Xh zG|;Poum<`yXwzVv2IDoDput29+BKM@!DJ1lXfRcSX&Ow|V1@=WHJGKrYz^jVFkgcO z8Z6Xckp_!3SfasF4VG!JT!R%FtkhtY2CFq#qd|uTof>p$uvUX^4SF zgH0N2)}U8|EgEdqV4DVg8tl|ymj=5v*rUOK27?+5X|Pv=VX=sRC_zJs8nSCBNkhpR zO3_fdhB7pisi7XvnXj 
zfY^yJ6w*+ycu7&Mev_r&wZ&`6?2>>5eZNU}y!G?J>3 zG>xQdBts*a8p+Z~wnlO^lBjKqG}3Dbh%>MoKhNs*y5{lxw6)Bh?ylYQ&?F zI*rt8q(LK%8fnre8rNqaLyTf3!iPjT&vzs8=il8g0|4PowP`?a*kaM!PiX*JwbaAsvX*fp{HA z(1BDPNYjCI9mvpuOdZJ4fm|KP(}6-AsMLXK9SG<^?{lJ)Ll1`z5=y?tk~Eg0u~dzv zX)HrySsKgMSdPYWHI}C_JQQfGP-8_JE7n+v#!59-rm=F3RcNeIV^tcf)>w_k92#?J ztX5-gjd?Uyr?GnR(i>~kSd+$@HRjb=i^f_t)}}F^#@aR3p|MVlb!n_yV?7%4Yb>C# zpvFQP>*aqv+yxKABk(jl3%lS&jfFMVr?Gw=vguHq4#n$Gf(|9>kX?t8bSPPeQgkR) zhthN?U57GsC{u^Bbf`jyD)pvKZ^r4(1ihK8H#78Rrrylbo27a)pf|%hoTkGWI-H}! zxjLM$!^JvWp~ICrTqSno9PZJ0oW|odo}=+xjpu1RU*km@FV=XO#>+Kcq47$MS82Rj z<4%pcG+wK5x5hmhuhV#=#+x+WtZ}c#TQuIP@ivY7G~TZ94vlweyi4PLjR$nZ{dT;L zbRKaVrO7B>qjc*?AIi}j{@?B~ir*-`C`T)FG?#y4$bGwCNAvhMhJ5~wEa2bB!nXrP zL04py5~Gy9-DwnbWkxCIKO#*y75p1RrH)n^Sz{E3QJgyJGO~6`!LP^Y>Wos)zmW}} zDM#ZX`lIoPh~DTL`8V$Eq~#rl-f`+3m)@z>J8r$>;s2dFqcj`EYm^qF_>9tyGGRAL zl2MY4l46upqoko6Hx9>*!*SzqJjW=7GC|f&B)n5^!l|1`z4Z%5&s?jV7Lr=6;P4#|eAnGK2{PVJ@7oN3KAa;7yboO_Um^Y_V(N-(1&Z zLTfUiHJQ+wjMpaPwaIvGGMD44*@TT^?2ItB6Nw9q63H_BN0NmS@i}#SOjAD1RYpf3 zku;)2jOLD{)`V(=@iC=GM=Dy(ty|1hk!*-$LL>`X^-dfAjGvg;``+o8NSR1P6R|j+ zVWK3Hqz?W~7$_zbgbBW5N*{?yhq({Rlzk-d4ijdFi5g1e>ZwFwnzAIZ9TEQ~!W58B z;|V2VG+}p|)JzHE!`!0N1lk#~oq9s(r%Z&_X+rBXmv`}RLO}UUL~Kn6T_%Js@;hLz zLt`$Ahy{^HK&f4TTjE3-nu!#mnE08;<9WtocHR7&$akBYp%@1vjFY)1 zin*@a+`2oWi}*0^RL>rBRgbYo`Aq+rDdNNU?J+t(|HdJ}zquVs#5fgM%u}XFe{*fn_(l0&fnQ@XK8)WWx(OR~7bPpAOynAo9Z{zAIYw`W;SdoePZM$c z6FiF7C@m3%drZWArkHT!xC08hc>ayg_=!Yv7lBWdDJv5IC=t(}DG|?8ku@*ANVzf(M#kN4?qn)~QH)O`2m>+Y9Jyg6-6&K3Oxec+;+@th zjR|Y&DpRA4e#(ofZc~XSoe<$CPkRi3YqiO=(-Fgb$leo%A?PdI-bgoD!`OQJ1chbs0KYUv#n_$NJ(a zQ86W&rbPRc@JB?m<76{Fo1GEiP0>j&VR#9nr9dZJ(*ScTW*=@HHb@4e$E^dmE@YRH z_~|Kw3aErCoeX52^tbC|u<2y5bV@jMGL(5Th-EL=gl#ALx)AXv`+5)=QzCat6itcp zDd9UA=Atly#3TBMN?%Uo^1i%?m%fTA;g}MhDbX|~+W&J+`_!c!|FP&W7LzuQCKD>e zs5itxJS0FO*dYm$Aq7$)4bmY4G9e4HAqR3H4+@|V%D{wO2~|)74se1C>YxFdpc%Z- z0izdo8$u^wD{iMZj?74@ZIQhAcpJ(Xj zUVfgTpZocFhJNnp=NbBWhJMoQ=l*`;>?hrRo}-_8`nji{y!Vp^KhM=)4#dGv9Q@>s 
zpEUV-CVukB&olA!O#D0(Kk@VPO#D0(e?9O_{N$mZXW}Ox{p6#cXX59X_{mQ{`9j^9 z^pl@{^3zX#`Y8wg9`Hi|f-uDX^an_Pfb<7Qe}MD{NPmFz2S|T_ z^an_Pfb<7Qe}MD{NPmFzb4=G{fb<7Qe}MD{NIyr)Oa@4Qkn{&he~|PCNq>;^2T6aB z^an|Qkn{&he~|PCNq>;^2T6aB^an|Qkn{&he~|PCNq>;^2T6aB^an|Qko5CHn+%f9 zAn6Q}&LHUwlFlIM43f?u=?s$2An6Q}&LHUwlFlIM43f?u=?s$2AnD|FI2k0JLDCr{ zok7wWB%MLh86=%S(itS3LDCr{ok7wWB%LAB86uq_(itM1A<`Klogva0BAp@986uq_ z(itM1A<`Klogva0BAp@986uq_(itM1A<`KlouU6r)qP;Mk!4Yv#T?HfZluW zZPHgI-2{3PRib_84x%@dS$}u0m2Zg#48#U#fQb7cIUhJ5I3GA4+|Lj0=Lh%mgZufx z{rup5e&B!Lf8c-Mf8c-Mf8c-Mf8c-Mf8c-Mf8c-Mf8c-Mf8c-Mf8c-Mf8c-Mf8c-M zf8c-Mf8>1ReB^xOeB^xOeB^xOeB^xOeB|_J_0j$M=ze{4zdpKOAKkA{yidGOyidGOyidGOoKKujoKKujoKKuj zd{2B&d{2B&d{2C`hW|XdAD`t*o>`t*o>`t*o>`vVXU{y(?z3m6XZP7N*R$`*XSQdy zXSQdyXSQdyXSQdyXSQdyXSQdyXSQdyXSQdyXSQdyXZPJR-?RJfnemy?1C;+fb3VJ@ zo>`w+Ul?B)Ul?EbUie=4Uf5o^URYjOURYl2|BL;9vHvgj|Hb~l*w+{P`eI*S?B|R9 ze6gP|_VdMlzSz$f`}txYU+m9|{duuJFZSof{=C?q7yI(^*!k}tlh%wiYhC&8pMU;; zZ~gtw?SKEc_1XSEzuy`->il=em-{pSUw`KRJ7hCM%hrl@-uds(F>BnK==^th!1`|e zv_`B^%hwJ2y5U`G-#YQ%KmYvy-unBU^ZyRp=y3m^-*1h4>-=}bCP!>?#3n~Jt!?Yf za>rd;_y0Ps1$iJ+AcmCV`!}9fAU*G*<4O_O+J+*v&*VlJ_ea|*~TbAwicC2H| zKK5L$=W?@H^DeW{>-=}$cJ_xl|2BnKuqLf3 zYucLaum9_p7X{_1EY>Z@u-_{`c1T|33RZGV{;J-scwn_txs^KW{l{{lB+X z{`qWw>+{w^=hruT^Ua=ovlrj&!8d#G&F8)?TFcg|wQg-%+t!|SU>#Y1T9?+fbz|LG z_tw9yzpRH}-+;e8T2Gx{1H-=t+y(=#!N5&_-sabUEf2V@25fm?``6&quffM(gKn_? 
zJ14&e=l=QZufgkIgUdep_h($@#9KSoZh!q6?Da1(V1N4WxUAcLz@855_t&q1BOmmZ z{T?{%uU~GHfs5Z)=AQAss~vEC_SESAAi4d;R7yjf8pFL{VVdm4bhBwr+;((zt{QC zTQq;)Snk+?%l>5zJ^UKDvicUc^56gbzL&6EU-$RN`&(V}){S-BU;Qiny#v4B`F*ea zzUn^Ecgle8l!5!-x5Php-M@UGe_g(H1|ItB_wCldEB}6L*js&N|9k1#dg-wC*Z=mi zf42_~ScBGg>xVUD{p_#b*Vy0fe}seX`oWRkw{D*>*!?v)?w_7DYPp4ezd!!#pPPJe z!bc{4-<17rPxjYuko)(En1Yz@xz*WVYO_2%p^-{!s(`}@=1 zfcN|N*WYisl?VNt{{9j8-}hO6hy6(G{Jz#cWbXH8`WKz=uV41mO*(M!_f{5s%#ZQl zVt=~>FTcO4zx73*{`>avV>r0<`_}8Hp(??{D?*xBC0C+F!q~qrZ=9{q_4+^WF0M1HbRqzQvEk;Cla4eYbDVzu)=2 zwe?^9TfKjad&{@*#ouq)72h&^{p;E2ufMeLw=(S`i(N^H;Ug+{arY)4y_}rf7Pzn zZvS(CdZ+*K{#UxaXKnEBx7=Xvs{V)jyL0;Y4*I8m`}j%wd&~WsqmOpW^tb3OKdys* zCvXB}>&$I&?#}$vx74K{lpFp#-*%p#{PLXR zm&d}tJf?W{_wMzNMeE<4*Kb3e*Ma%Yt6TNeafw$C1YU;^I!$NMw$yo@ zn6vyl+3UPcopfGjt+_R8%kuWZf@OP){-?sr&z;wmpPkozUvseEc|AI}?mDl>SDn|> z>(1-Nc<1%Hzs{_GbzW~>zvuR^p4*!tv_`D1<+?n7fAyUF)pP7u&#zxSpMLfH`PFmg zSI?JUJzswH9QoDr<5$m(Up;>{d)RsP9P!n2!&lD@O(9yd7L(_4ub!v9dhYe=dDg4v zPOqNdSZ~wb^mOFaQ;Js)rC&Ydef9A6)x+6W4B-G8@8?CLF=dG>xb7Y*Eei`haWm` zBSV(!8M(4-#{;Xk?uxZzU0XMux1O)<`FPL1^lZE5`g^wB`?qB~qimzLV@&#O)OC#V zkN(TD{Zan0Z`OCq<;HAhY|3&wjJe#H{T;JEV}Dw%Z_IYR((T)ruN!w8jNf(MChY&j zu+_B~CpIkmH{mv%a63#~SeKTspR^y7E<4FI>AslcowP5L_I1kkrhIP7ZSKeIZF<{s zT{Es@cCzy}=kjwi7RQ|J&)esD_IbDYyzR`pu6frn@A3<7*9F^Ku#H96xx}<&yGw4b z6}Q2P+i=BouDah=y}xRoR~grPowto~%jGuh^A%YJR!&bIq@+jZ?M zTE2G2b?=;8wz1>#JMODp*T3t!cik?#_IcOW`<3vvYoB&~Z|xqupJD8}dG}EE(D&@- zo|}HpMfPm|z~&EJ)uG)z^d(0=ezf0tJ9antP4aeXS5BWhZ)a|vGYl`P_jYbq&)t>h z-oJ3yT-e5iuf4E)7q)rf=Du+E{b|?!bd&yRn}7P)pFVcEX?g$Bx6q}#@6y*?x(Tk_ z6b?MU-MFc5ZS&Uk-6G%m{Ox1s?GE|wug=@O{k-3@Y~$Y7-`k&iUw`jzytkizE4=v) z@aD+VnMK&4HLV2TE{c_vq){ zi;`xja!@6q4ihieaQQs)xG=WtaIze^0|?oHDj%H-n(b`Fn4|^6{R1={;J{o%d1uH2TzeAG4h? 
zmmPEcV-KD8@j=V}j30I0C+z3M56iwxGEHuG-lzDcd~WLAvYqK=%k|B;?5zFu{~vhw zQ};eMZ=G8I>b(1jd!JvhmMq(v|I>0?E!g&g>+%!#zOZBMb>0`p->>Y(qTOBesl~JY z@9p;`3O|SM%jC;bmc3c_{>qSLYb*V?UDK*t$H?jXs>}NseP4A8t-94$>DJt`Yc{)P z&(_?cYp!P99<00E#D-ny|IbAP{gJx>3;-~0HzeQ|93-LdR<$8_HvkA45k z4ef8~yW^7YjzzvZ?)dJQ;=AK>?~cd4`_XxKtnJ+~rgz6u-d|3v=g!9;E7rR8@6N}! zu61JlY28~-oezIlKRg-x@VM$@(A&5`S7j&;amM< z_|Q7GY-{+|y0advm(GWy5FaBhH)2~OF6&5yGNAL}NW+I8g%3v-J{(c_=y~6dz{jZV z_`~!u_E+a){JZ7r$6fb$&)T!DIv*4E-yffkNgwyE|1sqEeXtHJANPasvHH!j@BTo2tnse7oIg?@ z>n^k5@*B4A8~$U2Iv+bOx8v;{`?h;*xow>A?0^4%_)GJ#$GT^~yxfY$ z!g3vZ_I=O3?7PnWG0QghCoKE2|1Zne9@wtGN*@QV>(Fg^FM9p=VRw{*!CUB`*Z;B zbJ*qlTzn31bUqz%`*g(Z(}TWGN7+7o?|*h}yE|pAS(lb=be}q(y*+Ec^Ev9{qpoN4 zuJbu&U&m%F?~l8_ahIF0e;%NHdVuyhv1%PyuEPVf&q<%3^7*NS&gYElp7F66+n=$I zv-6$Lx!um^!kFbY@nGt6(e@W@caeL^$CoB8+g!3u-`k%{C!J5f+&`DSy|Uf;TphRg z*L-}9W!*N`ZEItz^SS9Z@JssB1FX+2`?lq_-Ll_11J;CPpLW(Qm)&t&?D)D}*SF{E z_UzZ5+t`u9&waPgzI{EgZwJ2q(DsgP>)366Y(IT3exCT;iOZbW_Y>PX@wKP6b84T? z-2NUpeV)6G{=DyeUfQP9?Vp!(mdjqbeXs1p71I^RwXeIjk50vZIu-wU?e@O$uiw?5 zxAyDS_lDoipLcG@JGZ0X&7b#f?|UD29_rJ>nNR0>KAq3_bPnUw`G`;FAwE4^`E*SE z)3NkV$H70HoBwp|`|FP#>tCHOzkk0RE&Td+Wj%Jj1}3a2Yqs;{0Lz!70AHThehof# zzMKR9`o7Zn@_YEplbbK+pT9go`EqXf>*tU)YWez~F6Uv#*Rbt-dH1go+wCn_i=D60 z@13tP*E#X;&e!Bz=WFi1^ELm6)w9kl?=P-e=bf)5AM^e2rLFv0vaKcCS#~|k_I=rQ zmtA(*$CqtuWzgETd~W5?^7$2?U-kYf+uE>Yzx;v!@(2EF&1KiPH(ZC`dtV!ymh1Ps z?`z|w^X2>KYx9TY^PBd2(`7dwI$s{JfBF0Vwc~3%Q2*Moot+EIzU}(BN8ew*Z@%{3 zF8c@8Y3J);z;YWNJa@j1+-4pFe4W_liLZA){OfejvW-*wX>#N1Y}xYurQ79lq4VW$ z_t&-C|JL^I+^+Y3b-tXJ{&N2G%fs6*=P$qhvMuK+zy7k1&KrI?C-~)@;FrJ2U(UCE z`F|*Ud1(9P9NU+3XkY$De|dQO^=Mz68~gIT^5wkLm-A3x&N+QKfAi)1&6jgAU(UgN zIS12L4?1n_qSH?AJMC<`qmZ+{Sp(Lf_1*em4Ou^}VQa+dT0Lvj8nec&32V}tvijSa zv1Y9~Yu;M07Of>~*;=tytu<@i+OX`?pIg?pwPWpCd)B^XU;jL`j;v$r#5%RktaFRw z&p)k8>&m*eZme7D&SLZ|rv3SE>o4oUdbFObXX~ZY+&}G-|C0Zb|C0Zb-_Lrx{!9K#{!4y; z>)Iv1XISl$-?OadSysE`_ZYBU@?Y{_@?Y{_@?Y|M7S=BLFZn$iYnS}~X0}Uy|C^U~ z$$!c3_f)&&zvTD#wE17)v`hX={!9KV{ww|~{ww|~{ww|~{wscuLE9Do75^3g6~AYG 
z?TY`3|BC;L-{0wW#ecw!cl>w!cl>w!cl>w! zcl>w!cl>w!cl>w!cl>w!j$5}o{yY9V{yY9V{yY9V{yY9Vevi@H9seEw9seEw9seEw z9seEw9seEw9seEw9seEw9seEw9lxVZ&Eu-(Z)>~bzvI8-zvI8-cOW6jaAcF%v$?LiJ^www=Q-`3|DOMz|DNCT{^s#UyXU{>cjT=(^49M89k*|e+c(dR znq&9POkOjS*UaQKGkMKSUh|x(IiBAf&u?b(nq&ISOkOjS*UaQKGkMKSUNe)|{>5)5 zubIhfX7ZYuyyp0SbKIes$!q`OH}Ui*vR`JLu5aC5k~nbm7%^_p3|_7}hBNX@yh zW>&8`2h_~!HHUdd(r?W>&8`Z`RD}HOIJ`$JNcOUNfuL%<46> zdd)ecW>&A6)oUJeH?w-ptX?y#*BldTX7!pw&CPSO=J;51zNvX!-aIaEX7!p`z2@&A6)oW(;npwSORdd;j} zGppA;cW4j%5Bv}Oo<}s#BiaMM8NTMES$pI+!`B=(Z=PQ?GknbqUo*qk9{J7iHHXui z8NOzQuX+B_JpX8B_?i=L%?w}jT%*v#-X&q>-NzgfO!mam!RYtCObvwY1gUwh<# zr`kI-(W~Q&1>1&=lH8Xw9 zOkaEAf8u}QH`~`dr)p;Vn%TZ)wy!<$Kk+-Q&>TBzo^Lg$7n&KrX2!2Q@tg5$PyFWm z+7rK7zxKp$-mg9JoB3-_R5qt1n&$1d9ozxjdo!tWSP53lCvm}Z<5&mV);tr4qh^{i2A%o?{Q ztVwIinzm-FS!>Rkw-&5LYsp%+R;*QP&04oMtW9gn+O~GAU2D(Uw+^gB>&QB`POMYw z%sRI&tUs+w>&m*eZme7D&bqh$W&PXw%X+XLttac*;{V3~jsF|}H-4{TlW+Wv>E;{% zH~w$@j5F(DYo{$ffn zrUc{EWK0UixtM(8H!T>$4*ACK_+Xr$iDT4pj5-7SPGe?(e}I30e}I30e}I30-*N1i zD~w~=8Q>q_AK-VKF$4ULHD-W+fZs944Dg#t$pHTVzjHw`iOFc<@aF&vly{sI00{sDd?b{XJzd_CsoVmK~_<6{0T=HFtP zDrT!<TMU{Db_4+G40JgZxhG#`&ulV~hE<7+lLB{~*8P zsu|=rq!x2&G07FvXBp%-RTkr78RU0bIL?2?fJsbs#pqTHY{fCz7_W-asW@#MlTk6l z6O&Of1r<|JF_03oPch>ZvrjS56a!5$$dy5U(@Qb46vuXBW+`TtGRQy3Kge%(DQ1^q zb}43;VsW6eCO9yMOPK4v!koZcYa+}bXCz+MOPJF zRdiKxOgXx$eCO9zMO&5c{NMS%^LwqnXszP7bM#ixTSae`@BEspI5(K@{FI@BH8S4dz6L6&+S|SkYm{yi>mOf9Ka@Ziyka`u>9cv!T*E*2mcTLAN)oGq7#cwEIP3mG>So^=)_{!D8>e2*eH(E z$G}l^Vli|S=TM^)i%u*$vFOC26N^qPIXax16CR0H)NF|{vrM${vrM${vrM${vrOK{6G1B^8e)j$^Vnzxsd4O zqLYh3t^DNw$?yC~4Bp4N>o`vm=da^jNq+L{=<<{QC;v}=O@v|`G4{o_KLhnPGkdZ5ifwUNgh|!~Dbi z!~9}}4D$;XVvaFp0%G_r<^p0aAjAB_{KjH3%s|!`D&K1YG;*9W*@H>we|9kq3@Q?7D zpNLW;UH&e=Vk3%;bosmd%8eK>OqbvJ=5+bH{9XPof0y4m=ydtJ{9S(Y80qqN`MdmG z{w}{E#dP_duTGc0%irbi@^|?SFs94jsy@A3Ee4N<1YZ^SA+{vLmi zzsKL>@A3Eed;C5A9={Q+^!R)H&X=ag-{bG`_xOAKJ$^%(>G3OIqJW74CJGp@Qfcu! 
zf1e(IkH5#SjEOQP%9to)(&O*(_xOAKJ^mhlkKdqYdi*{99>1Z_82U_)-)u~J{6@sm z;~(WWC!A6KQGVxkV>mR1Lt}(2qx_@%qx_@%qx_@%qx_@%qx_@%qx_@%qx^Oe~f>O-*{aN$Hsg`%vZ#aY>eAwjDL)OjNklq49;eZ-|%d_E^)^A z$N0zi$N0zi&1Phbe~f>Oe~jPQUJTo2jDL)OjDL)OjDL*Z5N^i!$M_B7W{lt5Q^xqm z_{aGD@62NkD#rgZ#&1|RWBdkoGtNKGZ*Vt817lVyQ>Z_XwY{ASN$C^{4T6Z{kW2Bc#YGlrxy!EYWX6Z{kW6Z{6JGr>Q>Kf!NS zClmbUbuz(km^u^u6Z{kW6Z{kW<}@yD;B){3FO!80iPw`LjPw`LjPw`LjPw|@v%oM*ls7&!s@lWwj@lWwj@lWwj@f(}X z6#o?e6#o?e6#o?e6#o?e6uQ~Xo>Q~XAFGsQo}KgB=AKgB=AZ?rd4{L}o?{L}o?{L}pA z9y859%|Fd=1~Sw9)BJ{YGtEEEKg~bQKg~bQKg~bQKg~bQKh1A;GSmFVh%?PU%|Fd= zm_O6})BMx?2KzJ3Kg~bQKg~bQKg~bQKg~bQKg~bQKg~bQKg~bQKg~bQKg~bQKg~bQ zKh3Yv$uz%CC)4}_f=u&I^Uv_l@Xzqi@Edc^4F3%O4F3%O48Qr$%<#|f&+yOi&+yOi z&+yOi&+yOi&+yOi&+yOi&+yOi&+wZe%?$qx{|x^O{|x^O{|x^O{|vu5)6DSC@Xzqi z@Xzqi@Xzp@$;%A?48PgD%!4yrFthyTWHZY@%RkFM z%RkFM%Wr-*v;4FCv;4FC=4ms_Kg&PMZ?-nG{ImR;rOfir^3U?m@|(NOEWg>?n7xhh z^vv?l@|(lWEdMP3EWerD%<|9jo6XHE{~Z4u{~Z4u{~Z4u{~Z4u{~Z4u{~Z4u{~Z4u z{~Z4u{~Z4u{~Z4u{~W&<~w^Uw2}@5?;@JpVlZJpVlZJioc-%=6Fl&-2goO9C>_ zFAK;#|2+RZzc45B{PXoz`wvRTgU?c0{;U40{;U4 z0{;U40{;U40{;U40{;U40{;U40{;U40{;U40{;U40{;U40{;U40{;U40>7E?NF=hr zzrer1zrer1zrer1zrer1zrer1zrer1zrer1zrer1zrer1zrer1zrer1zrer1zrer1 zzrer9zsSGHzsSGHzsPT{EPgH{EPgH{EPgH{EPhN*t5uQ{x_PAEb=e%FY+() zFY+()FY+()n+uJ(_bl?8eUAnti~NiHi~NiHi~NiHi~NiHi~NiHi~NiHi~MFwv&g^5 zzsSGHzsSGHuVKt0|04e)|04e)|04e)|04e)zgguh@h|b4!Os%^68{qa68{qa62CYl zOZ;Y8v&3(nHB0e{|f&K{|f&K{|dkG zC@cIc{44w`{44w`{44w`{44w`{44w`{44w`{44w`{44w`{44w`{44w`{44w`{44w` z{44zCo@4GgEBq_`EBu;|n1jv={|f&K{|f&KzZN7b{Nkvr@UQT%@asde!f$puEBvCV ztnjb!ukf$%ukf$%ukZ`1q9@5J|0@3~|0@3~|0=)ODq^dw@~`r*@~`rn-_I)lD*r0~ zD*r0~D*r0Kh%y4Ltn#n&ukwqvvdX{8uV0CNC9C`buB`H#^Uf;&D*r0KS>Jc{N}i`#=pkD#=pkD#=pkD z#=pkD#;>o)8owaU|Nq3|U*linU+34VXPtkYf1O`67d=MS`Pcc^`Pcb1Mp@@y=U?Yv z=U?Yv=U?Yv=U?Yv=U?X+-esMCoqwHwoqwHwoqwHwoqwHwoqwHwoqwHwoqwHQ&y;ok zb^dk!b^dk!b^dk!b^dk!b^dk!b^dk!b^dk!b^dk!b$)G0*7?`@*ZJ4^*ZDX2H~2UB zH~2UBH~2UBH~2UBH~2UBbq3ks-{2PnXM=x(e}jL6e}jL6e}jL6e}jL6e}jL6U!#x> 
z{tf;O{tbT3LNqVYY-NLggMWj6gMWj6gMWj6gMWj6gMWj6gMWj6gMWj6gJ1U$k#aWp zH~2UBH~2UBH~2UBH~2UBH~2UBH~DoE+2r5k-{jX!WRriBf0KWcf0JKFkxhOPbT;`n z`8WAD`8WAD`8WAD`8WB+meE~glYf(clYf(clYf(6(P5w>(P5w>(P5w=O z-9|R~H~BaDH~BaDy}m*=`32b#WM`9qlYf(clYf(clYf(clVAIhP5w>(P5w=OjZwDv zxA?dCxA?dCxA?dCxA?dCxA?dC^+?&`*Vbi=e~W*MUq6~H{w@A3{w@A3{w@A3exZ1_ z__z4C_%$fm;@381i+_t>hs_;>hs_;>hs_;>hs_;>hs_;>g-;Me!%fd7F1fd7F1fd7F1fM4UB1O5a41O5a41O5a41AdKZ z4)_oF5BLxG5BLxG5BLxG5BLxG5BLxG5BLxG5BLxGHL5w_Kj1&$Kj1&$Kj1&$Kj1&$ zKj1&$Kj1&$*CpnF|A7C1|A7C1|A7C1|A7C1|A7C1|A1e|m;-)IY!3Oowm}a05BU%I z5BU%I5BU%I5BU%I5BU%I5BU%I5BU%I5Ba?|LJs*4`49OI`49OI`49OI`SsyBLxklKjc5;Kjc5;KjioNNjc;{ zBmN`)BmN`)BmN`)BmN`)BYwSoj`)xGkNA)HkN7qCIpRO!KjJ^)KjJ^;KjuH?KjuH? zKjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH? zKjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH?KjuH? zKjuH?KjuH?KjuH?KjuH-KjA;&KjA;&KjA;&KjA;&KjA;&KjA;&KjA;&KjA;&KjA;& zKjA;&KjA;&KjA;&KjA;&KjA;&KjA;&KjA;&KjA;&KjA;&KjA;&KjA;&KjA;&KjA;& zKjA;&KjA;&KjA;&KjA;&KjA;&KjGK3$7?O*g#U#9g#UzJ=bjV(Q+~aBPWeyyy`D)< z`A_*z`Msi0PWiq5cTV}e26#^Sy$*Oz`A_*z`8Civ3+GaWBKjlB= zKjlB=Kjrrt5b@e6Ipsg)Kjqg_=am1H->Vtrl>e0fl>e0fl>d}pZ=F;AQ-0lbyf#@* z`A_*z`8Cct<=67(l;3NXN1Gk5TOO~FlvDmQ{xkkF{xkkF{xkkF{xkkF{xkkFelNiv zFTtNP{xkkF{xkkFel2y*_`L>9&iK#x&-lHDK+gE};5p+z<3Hm+<3Hm+<3Ho~x-mK9 zKjS~+_o_`f<3Hm+<3Hm+<3Hm+<3Hoqm*(Atj-)qq1jQ@=P zjQ@=PjQ@<^>l@^p|D6Av|D4||Mdh6Tod2BvoZo98MbtkN!XU|LFgt|BwDZ`v2(vqyLZo zKl=aZ|D*qp{y+Nv=>MbtkN)5PQ`h3x|408H{eSfT(f>#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz

#PAN_yy|Iz#PAN_yy|Iz#PAN_yy z|Iz#PAN_yy|Iz#PAN_yy|Iz#P zAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz

#PAN_yy|Iz#PAN_yy|Iz#PAN_yy z|Iz#PAN_yy|Iz#PAN_yy|Iz#P zAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz

#PAN_yy|Iz#PAN_yy|Iz#PAN_yy z|Iz#PAN_yy|Iz#PAN_yy|Iz#P zAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz

#PAN_yy|Iz#PAN_yy|Iz#PAN_yy z|Iz#PAN_yy|Iz#PAN_yy|Iz#P zAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz#PAN_yy|Iz

#PAN_yy|Iz#PAN_yy|Iz#PAN_yy z|Iz#PAN_yy|IzHkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L z{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0 z`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2 z>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9 zrT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Z zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>Hkar zU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Je zf9d~A|6lt5(*KwKzx4m5|1bT2>HkarUjzL5|I+`L{=fA9rT?!1e*J&x|4aX0`v21Z zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT24f5;% zOaEW`|I+`L{=WwK_5Y>+Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9 zrT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Z zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>Hkar zU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Je zf9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%> z|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j z|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A z|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g z{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1 z^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5 z(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8I zOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&* zFa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwK zzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW` z|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y% z|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5 z|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L z{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0 z`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2 z>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9 
z)#dNgUH&eAm%q#3@A3Eed;C5A z9)FL&$KT`c@%Q+9{5}32e~-V%-{bG`_xOAKJ^mhlkH5#?+Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8I zOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&* zFa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwK zzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW` z|I+`L{=dff$N0zi$N0zi$N0zi$N0zi$N2UCrT?!n{xSYB{xN?2f9d~A|6lt5(*KwK zzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW` z|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y% z|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5 z|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L z{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0 z`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2 z>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9 zrT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Z zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>Hkar zU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Je zf9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%> z|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j z|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A z|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g z{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1 z^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8IOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5 z(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&*Fa3Y%|4aX0`v21Zm;S%>|E2#g{eS8I zOaEW`|I+`L{=fA9rT;Jef9d~A|6lt5(*KwKzx4m5|1bT2>HkarU;6*j|Cj#1^#7&* zFa3Y%|4aX0`v21Zm;S%>|MmY=aW}hlEC>}w`!RT45ItzB27nL@KqS|Rb8}q9&t(Kc zJP`V}RFA+)!-Zzxice&nct?qSTvgxF|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D z|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ z^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ z|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I* z>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq 
z|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq z)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ z|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJ zr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c z|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUc zPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>? z|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm? zpZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v) z{y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6( zKmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp z{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7n zfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH z`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D z|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ z^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ z|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I* z>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq z|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq z)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ z|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJ zr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c z|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUc zPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>? z|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm? 
zpZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v) z{y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6( zKmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp z{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7n zfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH z`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D z|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ z^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ z|LOnJ|EK>?|DXOp{eSxZ^#AGq)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I* z>HpLJr~gm?pZ-7nfBOIQ|LOnJ|M!#qKiRMUPye6(KmC9D|MdUq|I`1c|4;v){y+VH z`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#8rt|7O4bKmC9D|MdUq z|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBOIQ|LOnJ|EK>?|DXOp{eSxZ^#AGq z)BmUcPye6(KmC9D|MdUq|I`1c|4;v){y+VH`v3I*>HpLJr~gm?pZ-7nfBJuKzyG#` z0SE&S1|SST7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu0}uuv3_uuwFaTiy!T^K; z2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu z0}uuv3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx z5C$L&Kp2290AT>a0E7Vu0}uuv3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S z1|SST7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu0}uuv3_uuwFaTiy!T^K;2m=rX zAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu0}uuv z3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx5C$L& zKp2290AT>a0E7Vu0}uuv3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST z7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu0}uuv3_uuwFaTiy!T^K;2m=rXAPhhl zfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu0}uuv3_uuw zFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx5C$L&Kp229 z0AT>a0E7Vu0}uuv3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST7=SPU zVF1DagaHTx5C$L&Kp2290AT>a0E7Vu0}uuv3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I z0Kx!-0SE&S1|SST7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu0}uuv3_uuwFaTiy z!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx5C$L&Kp2290AT>a z0E7Vu0}uuv3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1Da 
zgaHTx5C))^{V)4p_P^}M0E7YPW&g|mm;EpMU-rN3f7$=C|7HKn{+InP`(O6I?0?z+ zvj1iO%YF<%7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu0}uuv3_uuwFaTiy!T^K; z2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu z0}uuv3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx z5C)(R`#Zq;3_uuwFaTiy`mq1Q{tx>ZsU z5BoptzwE#4zwE#4zwE#4zwE#4zwE#4zwE#4zwE#4zwE#4zwE#4zwE#4zwE#4zwE#4 zzwE#4zwE#4zwE#4zwE#4zwE#4zwE#4zwE#4zwE#4zwE#4#{h%@=(7K^|FZwG9|I5u zAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaPQb|F-|O|F-|O|F-|O|F-|O|F$0k z5C)*z{@ecB{@Z>GKp2290AT>a0E7Vu0}uuv3_uuwFaTiy!T^K;2m=rXpxgf2{@ecB z{@ecB{@ecB{@ecB{@ecB{@ecB{@ecB{@ecB{@ecB{@ecB{@ecB{@ecB{@ecB{@ecB z{@ecBehfeufG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu z0}uuv3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx z5C$L&Kp2290AT>a0E7Vu0}uuv3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S z1|SST7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu0}uuv3_uuwFaTiy!T^K;2m=rX zAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu0}uuv z3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST7=SPUVF1DagaHTx5C$L& zKp2290AT>a0E7Vu0}uuv3_uuwFaTiy!T^K;2m=rXAPhhlfG_}I0Kx!-0SE&S1|SST z7=SPUVF1DagaHTx5C$L&Kp2290AT>a0E7Vu0}uuv41l-)R9i;_hz1Z1AR0h4fM@{G z0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLaw zq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V z0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0Q$v#8bCCFXaLawq5(t$hz1Z1 zAR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ( z8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2 zKs1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4Immo zG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c2GEaxpEQ7I0MP)V0Yn3c1`rJ(8bCCF zXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks118 z0MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT z(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G z0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLaw zq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V 
z0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?W zL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz z1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$ zhz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c z1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh z5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC? z4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1 zAR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ( z8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2 zKs1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4Immo zG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4 zfM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCF zXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks118 z0MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT z(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G z0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLaw zq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V z0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?W zL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz z1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$ zhz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c z1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh z5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC? 
z4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1 zAR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2Ks1180MP)V0Yn3c1`rJ( z8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4ImmoG=OLT(Ey?WL<5Kh5Dg$2 zKs1180MP)V0Yn3c1`rJ(8bCCFXaLawq5(t$hz1Z1AR0h4fM@{G0HOgz1BeC?4Immo zG=OLT(Ey?WL<5Kh5Dj27fYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=F zfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfP zU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR z7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|n zMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y z(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifp zG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C z4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IaJvHdiF(EvsR7!6=FfYAU(0~ifp zG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e7Y zpV&_W7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifp zG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C z4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU( z0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy z07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=F zfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfP zU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR z7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|n zMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y z(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifp zG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C z4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU( z0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy z07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=F zfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfP zU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR 
z7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|n zMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y z(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifp zG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C z4PZ2Y(EvsR7!6=FfYAU(0~ifpG=R|nMgtfPU^IZy07e5C4PZ2Y(EvsR7!6=FfYAU( z0~ifpG=R|nMgtfPU^IZy0QNikX#k@Ej0P|oz-R!Y0gMJP8o+1(qXCQtFdD#U0HXnn z1~3}HXaJ)Dj0P|oz-R!Y0gMJP8o+1(qXCQtFdD#U0HXnn2CzS~p9U}*z-R!Y0gMJP z8o+1(qXCQtFdD#U0HXnn1~3}HXaJ)Dj0P|oz-R!Y0gMJP8o+1(qXCQtFdD#U0HXnn z1~3}HXaJ)Dj0P|oz-R!Y0gMJP8o+1(qXCQtFdD#U0HXnn1~3}HXaJ)Dj0Uj7{=@#m z{=xcaMz=wQtiR<~~^{<2H`sCH*^U1fR=aT~e z`6Lg1K8f6(Pim0ov!4T=&%XFQpZ(8&KC2O*&yt$wvzu_A&+Za*c>m4wlWR5(fA8Qd z*z=PwKhIB!c8A|Nh<2WzR07YN>-W!_lcCR>vrf;OQ#B4gmp3Op9RA9|kCo4xufflo z4EuQ#dp&RJq37)vf7Nep*nQsI>GZt0b>(?;|Ht#i3G3&JGm+01r)HinUMoCb{O5bV zFae$~?!H7KV4wdKKWtHdMHG=2e$@=HrnbzmsslDgj&s)#C zA1I!8uOOdyu!Mt8?fvq6?>|1ZceiOj@9wF6-rZF6yt|9!`Q|e2^Ub-Q=bLZ7&o|Ba z^G%Wad~+ZA^UV!j&o_4+JwJQ7=io2wXJ3e(pG6kW&u+JUzPon#e0O2z`R@L6& literal 0 HcmV?d00001 diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_gpt.py b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_gpt.py new file mode 100644 index 0000000000..b9583fa832 --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_gpt.py @@ -0,0 +1,4293 @@ +import base64, collections, copy, fcntl, glob, io, lzma, math, os +from pathlib import Path +import random, re, subprocess, sys, time, uuid, numpy as np, sentencepiece as spm, torch, torch.distributed as dist, torch.nn.functional as F +from torch import Tensor, nn +from flash_attn_interface import ( + flash_attn_func as flash_attn_3_func, + flash_attn_varlen_func, +) +from concurrent.futures import ThreadPoolExecutor +import triton +import triton.language as tl +from triton.tools.tensor_descriptor import TensorDescriptor + + +# ===== Fused softcapped cross-entropy (Triton) — training-only path ===== +# 
Replaces the eager
+#     logits_softcap = softcap * tanh(logits / softcap)
+#     F.cross_entropy(logits_softcap.float(), targets, reduction="mean")
+# sequence with a single fused kernel that reads logits_proj once, applies
+# softcap in-register, and computes (LSE, loss) in one streaming pass. The
+# backward kernel mirrors the forward so there's no stored softcapped logits.
+# Numerically identical to the eager path up to fp32 accumulation differences.
+_FUSED_CE_LIBRARY = "pgsubmission1draft7fusedce"
+_FUSED_CE_BLOCK_SIZE = 1024
+_FUSED_CE_NUM_WARPS = 4
+
+
+@triton.jit
+def _softcapped_ce_fwd_kernel(
+    logits_ptr, losses_ptr, lse_ptr, targets_ptr,
+    stride_logits_n, stride_logits_v,
+    n_rows, n_cols, softcap,
+    block_size: tl.constexpr,
+):
+    row_idx = tl.program_id(0).to(tl.int64)
+    logits_row_ptr = logits_ptr + row_idx * stride_logits_n
+    max_val = -float("inf")
+    sum_exp = 0.0
+    A = 2.0 * softcap
+    inv_C = 2.0 / softcap
+    for off in range(0, n_cols, block_size):
+        cols = off + tl.arange(0, block_size)
+        mask = cols < n_cols
+        val = tl.load(
+            logits_row_ptr + cols * stride_logits_v,
+            mask=mask, other=-float("inf"),
+        ).to(tl.float32)
+        z = A * tl.sigmoid(val * inv_C)
+        z = tl.where(mask, z, -float("inf"))
+        curr_max = tl.max(z, axis=0)
+        new_max = tl.maximum(max_val, curr_max)
+        sum_exp = sum_exp * tl.exp(max_val - new_max) + tl.sum(tl.exp(z - new_max), axis=0)
+        max_val = new_max
+    lse = max_val + tl.log(sum_exp)
+    tl.store(lse_ptr + row_idx, lse)
+    target = tl.load(targets_ptr + row_idx).to(tl.int32)
+    target_val = tl.load(logits_row_ptr + target * stride_logits_v).to(tl.float32)
+    target_z = A * tl.sigmoid(target_val * inv_C)
+    tl.store(losses_ptr + row_idx, lse - target_z)
+
+
+@triton.jit
+def _softcapped_ce_bwd_kernel(
+    grad_logits_ptr, grad_losses_ptr, lse_ptr, logits_ptr, targets_ptr,
+    stride_logits_n, stride_logits_v,
+    stride_grad_n, stride_grad_v,
+    n_rows, n_cols, softcap,
+    block_size: tl.constexpr,
+):
+    row_idx = tl.program_id(0).to(tl.int64)
+    logits_row_ptr = logits_ptr + row_idx * stride_logits_n
+    grad_row_ptr = grad_logits_ptr + row_idx * stride_grad_n
+    lse = tl.load(lse_ptr + row_idx)
+    grad_loss = tl.load(grad_losses_ptr + row_idx).to(tl.float32)
+    target = tl.load(targets_ptr + row_idx).to(tl.int32)
+    A = 2.0 * softcap
+    inv_C = 2.0 / softcap
+    dz_dx_scale = A * inv_C
+    for off in range(0, n_cols, block_size):
+        cols = off + tl.arange(0, block_size)
+        mask = cols < n_cols
+        val = tl.load(
+            logits_row_ptr + cols * stride_logits_v,
+            mask=mask, other=0.0,
+        ).to(tl.float32)
+        sigmoid_u = tl.sigmoid(val * inv_C)
+        z = A * sigmoid_u
+        probs = tl.exp(z - lse)
+        grad_z = grad_loss * (probs - tl.where(cols == target, 1.0, 0.0))
+        grad_x = grad_z * (dz_dx_scale * sigmoid_u * (1.0 - sigmoid_u))
+        tl.store(grad_row_ptr + cols * stride_grad_v, grad_x, mask=mask)
+
+
+def _validate_softcapped_ce_inputs(
+    logits: Tensor, targets: Tensor, softcap: float,
+) -> tuple[Tensor, Tensor]:
+    if logits.ndim != 2:
+        raise ValueError(f"Expected logits.ndim=2, got {logits.ndim}")
+    if targets.ndim != 1:
+        raise ValueError(f"Expected targets.ndim=1, got {targets.ndim}")
+    if logits.shape[0] != targets.shape[0]:
+        raise ValueError(
+            f"Expected matching rows, got logits={tuple(logits.shape)} targets={tuple(targets.shape)}"
+        )
+    if not logits.is_cuda or not targets.is_cuda:
+        raise ValueError("softcapped_cross_entropy requires CUDA tensors")
+    if softcap <= 0.0:
+        raise ValueError(f"softcap must be positive, got {softcap}")
+    if logits.dtype not in (torch.float16, torch.bfloat16, torch.float32):
+        raise ValueError(f"Unsupported logits dtype: {logits.dtype}")
+    logits = logits.contiguous()
+    targets = targets.contiguous()
+    if targets.dtype != torch.int64:
+        targets = targets.to(dtype=torch.int64)
+    return logits, targets
+
+
+@torch.library.custom_op(f"{_FUSED_CE_LIBRARY}::softcapped_ce", mutates_args=())
+def softcapped_ce_op(logits: Tensor, targets: Tensor, softcap: float) -> tuple[Tensor, Tensor]:
+    logits, targets = 
_validate_softcapped_ce_inputs(logits, targets, float(softcap)) + n_rows, n_cols = logits.shape + losses = torch.empty((n_rows,), device=logits.device, dtype=torch.float32) + lse = torch.empty((n_rows,), device=logits.device, dtype=torch.float32) + _softcapped_ce_fwd_kernel[(n_rows,)]( + logits, losses, lse, targets, + logits.stride(0), logits.stride(1), + n_rows, n_cols, float(softcap), + block_size=_FUSED_CE_BLOCK_SIZE, num_warps=_FUSED_CE_NUM_WARPS, + ) + return losses, lse + + +@softcapped_ce_op.register_fake +def _(logits: Tensor, targets: Tensor, softcap: float): + if logits.ndim != 2 or targets.ndim != 1: + raise ValueError("softcapped_ce fake impl expects 2D logits and 1D targets") + if logits.shape[0] != targets.shape[0]: + raise ValueError( + f"Expected matching rows, got logits={tuple(logits.shape)} targets={tuple(targets.shape)}" + ) + n_rows = logits.shape[0] + return ( + logits.new_empty((n_rows,), dtype=torch.float32), + logits.new_empty((n_rows,), dtype=torch.float32), + ) + + +@torch.library.custom_op(f"{_FUSED_CE_LIBRARY}::softcapped_ce_backward", mutates_args=()) +def softcapped_ce_backward_op( + logits: Tensor, targets: Tensor, lse: Tensor, grad_losses: Tensor, softcap: float, +) -> Tensor: + logits, targets = _validate_softcapped_ce_inputs(logits, targets, float(softcap)) + lse = lse.contiguous() + grad_losses = grad_losses.contiguous().to(dtype=torch.float32) + if lse.ndim != 1 or grad_losses.ndim != 1: + raise ValueError("Expected 1D lse and grad_losses") + if lse.shape[0] != logits.shape[0] or grad_losses.shape[0] != logits.shape[0]: + raise ValueError( + f"Expected row-aligned lse/grad_losses, got logits={tuple(logits.shape)} " + f"lse={tuple(lse.shape)} grad_losses={tuple(grad_losses.shape)}" + ) + grad_logits = torch.empty_like(logits) + n_rows, n_cols = logits.shape + _softcapped_ce_bwd_kernel[(n_rows,)]( + grad_logits, grad_losses, lse, logits, targets, + logits.stride(0), logits.stride(1), + grad_logits.stride(0), 
grad_logits.stride(1), + n_rows, n_cols, float(softcap), + block_size=_FUSED_CE_BLOCK_SIZE, num_warps=_FUSED_CE_NUM_WARPS, + ) + return grad_logits + + +@softcapped_ce_backward_op.register_fake +def _(logits: Tensor, targets: Tensor, lse: Tensor, grad_losses: Tensor, softcap: float): + if logits.ndim != 2 or targets.ndim != 1 or lse.ndim != 1 or grad_losses.ndim != 1: + raise ValueError("softcapped_ce_backward fake impl expects 2D logits and 1D row tensors") + if ( + logits.shape[0] != targets.shape[0] + or logits.shape[0] != lse.shape[0] + or logits.shape[0] != grad_losses.shape[0] + ): + raise ValueError("softcapped_ce_backward fake impl expects row-aligned tensors") + return logits.new_empty(logits.shape) + + +def _softcapped_ce_setup_context( + ctx: torch.autograd.function.FunctionCtx, inputs, output, +) -> None: + logits, targets, softcap = inputs + _losses, lse = output + ctx.save_for_backward(logits, targets, lse) + ctx.softcap = float(softcap) + + +def _softcapped_ce_backward( + ctx: torch.autograd.function.FunctionCtx, grad_losses: Tensor, grad_lse: "Tensor | None", +): + del grad_lse + logits, targets, lse = ctx.saved_tensors + grad_logits = torch.ops.pgsubmission1draft7fusedce.softcapped_ce_backward( + logits, targets, lse, grad_losses, ctx.softcap + ) + return grad_logits, None, None + + +softcapped_ce_op.register_autograd( + _softcapped_ce_backward, setup_context=_softcapped_ce_setup_context, +) + + +def softcapped_cross_entropy( + logits: Tensor, targets: Tensor, softcap: float, reduction: str = "mean", +) -> Tensor: + losses, _lse = torch.ops.pgsubmission1draft7fusedce.softcapped_ce( + logits, targets, float(softcap) + ) + if reduction == "none": + return losses + if reduction == "sum": + return losses.sum() + if reduction == "mean": + return losses.mean() + raise ValueError(f"Unsupported reduction={reduction!r}") + + +class Hyperparameters: + data_dir = os.environ.get("DATA_DIR", "./data/") + seed = int(os.environ.get("SEED", 1337)) + run_id = 
os.environ.get("RUN_ID", str(uuid.uuid4())) + iterations = int(os.environ.get("ITERATIONS", 20000)) + warmdown_frac = float(os.environ.get("WARMDOWN_FRAC", 0.75)) + warmup_steps = int(os.environ.get("WARMUP_STEPS", 20)) + train_batch_tokens = int(os.environ.get("TRAIN_BATCH_TOKENS", 786432)) + # Fused softcapped CE (Triton). Training-only — forward_logits eval path still uses + # eager softcap+F.cross_entropy. Default ON since validated as at-worst neutral. + fused_ce_enabled = bool(int(os.environ.get("FUSED_CE_ENABLED", "1"))) + train_seq_len = int(os.environ.get("TRAIN_SEQ_LEN", 2048)) + train_log_every = int(os.environ.get("TRAIN_LOG_EVERY", 500)) + max_wallclock_seconds = float(os.environ.get("MAX_WALLCLOCK_SECONDS", 6e2)) + val_batch_tokens = int(os.environ.get("VAL_BATCH_TOKENS", 524288)) + eval_seq_len = int(os.environ.get("EVAL_SEQ_LEN", 2048)) + val_loss_every = int(os.environ.get("VAL_LOSS_EVERY", 4000)) + vocab_size = int(os.environ.get("VOCAB_SIZE", 8192)) + num_layers = int(os.environ.get("NUM_LAYERS", 11)) + xsa_last_n = int(os.environ.get("XSA_LAST_N", 11)) + model_dim = int(os.environ.get("MODEL_DIM", 512)) + num_kv_heads = int(os.environ.get("NUM_KV_HEADS", 4)) + num_heads = int(os.environ.get("NUM_HEADS", 8)) + mlp_mult = float(os.environ.get("MLP_MULT", 4.0)) + skip_gates_enabled = bool(int(os.environ.get("SKIP_GATES_ENABLED", "1"))) + tie_embeddings = bool(int(os.environ.get("TIE_EMBEDDINGS", "1"))) + logit_softcap = float(os.environ.get("LOGIT_SOFTCAP", 3e1)) + rope_base = float(os.environ.get("ROPE_BASE", 1e4)) + rope_dims = int(os.environ.get("ROPE_DIMS", 16)) + rope_train_seq_len = int(os.environ.get("ROPE_TRAIN_SEQ_LEN", 2048)) + rope_yarn = bool(int(os.environ.get("ROPE_YARN", "0"))) + ln_scale = bool(int(os.environ.get("LN_SCALE", "1"))) + qk_gain_init = float(os.environ.get("QK_GAIN_INIT", 5.0)) + num_loops = int(os.environ.get("NUM_LOOPS", 2)) + loop_start = int(os.environ.get("LOOP_START", 3)) + loop_end = 
int(os.environ.get("LOOP_END", 5)) + enable_looping_at = float(os.environ.get("ENABLE_LOOPING_AT", 0.35)) + parallel_start_layer = int(os.environ.get("PARALLEL_START_LAYER", 8)) + parallel_final_lane = os.environ.get("PARALLEL_FINAL_LANE", "mean") + min_lr = float(os.environ.get("MIN_LR", 0.0)) + embed_lr = float(os.environ.get("EMBED_LR", 0.6)) + tied_embed_lr = float(os.environ.get("TIED_EMBED_LR", 0.03)) + tied_embed_init_std = float(os.environ.get("TIED_EMBED_INIT_STD", 0.005)) + matrix_lr = float(os.environ.get("MATRIX_LR", 0.026)) + scalar_lr = float(os.environ.get("SCALAR_LR", 0.02)) + muon_momentum = float(os.environ.get("MUON_MOMENTUM", 0.97)) + muon_backend_steps = int(os.environ.get("MUON_BACKEND_STEPS", 5)) + muon_momentum_warmup_start = float( + os.environ.get("MUON_MOMENTUM_WARMUP_START", 0.92) + ) + muon_momentum_warmup_steps = int(os.environ.get("MUON_MOMENTUM_WARMUP_STEPS", 1500)) + muon_row_normalize = bool(int(os.environ.get("MUON_ROW_NORMALIZE", "1"))) + beta1 = float(os.environ.get("BETA1", 0.9)) + beta2 = float(os.environ.get("BETA2", 0.95)) + adam_eps = float(os.environ.get("ADAM_EPS", 1e-08)) + grad_clip_norm = float(os.environ.get("GRAD_CLIP_NORM", 0.3)) + eval_stride = int(os.environ.get("EVAL_STRIDE", 64)) + adam_wd = float(os.environ.get("ADAM_WD", 0.02)) + muon_wd = float(os.environ.get("MUON_WD", 0.095)) + embed_wd = float(os.environ.get("EMBED_WD", 0.085)) + ema_decay = float(os.environ.get("EMA_DECAY", 0.9965)) + ttt_enabled = bool(int(os.environ.get("TTT_ENABLED", "1"))) + ttt_lora_rank = int(os.environ.get("TTT_LORA_RANK", 96)) + ttt_lora_lr = float(os.environ.get("TTT_LORA_LR", 0.0001)) + ttt_chunk_size = int(os.environ.get("TTT_CHUNK_SIZE", 48)) + ttt_eval_seq_len = int(os.environ.get("TTT_EVAL_SEQ_LEN", 2048)) + ttt_batch_size = int(os.environ.get("TTT_BATCH_SIZE", 64)) + ttt_grad_steps = int(os.environ.get("TTT_GRAD_STEPS", 1)) + # V19: PR #1886 (renqianluo) + sunnypatneedi research log 2026-04-28 found that + # the Triton 
fused-CE kernel's fp32-accumulation interacts with warm-start LoRA-A + # to destabilize seeds 314/1337 at TTT_WEIGHT_DECAY=1.0. Raising the default to + # 2.0 prevents seed collapse without measurably moving stable seeds. + ttt_weight_decay = float(os.environ.get("TTT_WEIGHT_DECAY", 2.0)) + ttt_beta1 = float(os.environ.get("TTT_BETA1", 0)) + ttt_beta2 = float(os.environ.get("TTT_BETA2", 0.999)) + ttt_k_lora = bool(int(os.environ.get("TTT_K_LORA", "1"))) + ttt_mlp_lora = bool(int(os.environ.get("TTT_MLP_LORA", "1"))) + ttt_o_lora = bool(int(os.environ.get("TTT_O_LORA", "1"))) + ttt_optimizer = os.environ.get("TTT_OPTIMIZER", "adam") + ttt_eval_batches = os.environ.get("TTT_EVAL_BATCHES", "") + val_doc_fraction = float(os.environ.get("VAL_DOC_FRACTION", 1.0)) + compressor = os.environ.get("COMPRESSOR", "brotli") + gptq_calibration_batches = int(os.environ.get("GPTQ_CALIBRATION_BATCHES", 16)) + gptq_reserve_seconds = float(os.environ.get("GPTQ_RESERVE_SECONDS", 4.0)) + phased_ttt_prefix_docs = int(os.environ.get("PHASED_TTT_PREFIX_DOCS", 2000)) + phased_ttt_num_phases = int(os.environ.get("PHASED_TTT_NUM_PHASES", 1)) + global_ttt_lr = float(os.environ.get("GLOBAL_TTT_LR", 0.001)) + global_ttt_momentum = float(os.environ.get("GLOBAL_TTT_MOMENTUM", 0.9)) + global_ttt_epochs = int(os.environ.get("GLOBAL_TTT_EPOCHS", 1)) + global_ttt_chunk_tokens = int(os.environ.get("GLOBAL_TTT_CHUNK_TOKENS", 32768)) + global_ttt_batch_seqs = int(os.environ.get("GLOBAL_TTT_BATCH_SEQS", 32)) + global_ttt_warmup_start_lr = float(os.environ.get("GLOBAL_TTT_WARMUP_START_LR", 0.0)) + global_ttt_warmup_chunks = int(os.environ.get("GLOBAL_TTT_WARMUP_CHUNKS", 0)) + global_ttt_grad_clip = float(os.environ.get("GLOBAL_TTT_GRAD_CLIP", 1.0)) + global_ttt_respect_doc_boundaries = bool(int(os.environ.get("GLOBAL_TTT_RESPECT_DOC_BOUNDARIES", "1"))) + matrix_bits = int(os.environ.get("MATRIX_BITS", 6)) + embed_bits = int(os.environ.get("EMBED_BITS", 8)) + matrix_clip_sigmas = 
float(os.environ.get("MATRIX_CLIP_SIGMAS", 12.85)) + embed_clip_sigmas = float(os.environ.get("EMBED_CLIP_SIGMAS", 2e1)) + mlp_clip_sigmas = float(os.environ.get("MLP_CLIP_SIGMAS", 10.0)) + attn_clip_sigmas = float(os.environ.get("ATTN_CLIP_SIGMAS", 13.0)) + # AttnOutGate (per-head multiplicative output gate, PR #1667 MarioPaerle). + # Zero-init weight: 2*sigmoid(0)=1 -> transparent at start. Source defaults to + # block input x ('proj'); 'q' uses raw Q projection output. + attn_out_gate_enabled = bool(int(os.environ.get("ATTN_OUT_GATE_ENABLED", "0"))) + attn_out_gate_src = os.environ.get("ATTN_OUT_GATE_SRC", "proj") + # SmearGate (input-dependent forward-1 token smear, modded-nanogpt @classiclarryd + # via PR #1667). x_t <- x_t + lam * sigmoid(W*x_t[:gate_window]) * x_{t-1}. + # lam=0 + W=0 -> transparent at init. + smear_gate_enabled = bool(int(os.environ.get("SMEAR_GATE_ENABLED", "0"))) + # Window: first GATE_WINDOW dims of the source feed the gate projection. + gate_window = int(os.environ.get("GATE_WINDOW", 12)) + # Gated Attention (Qwen, NeurIPS 2025 Best Paper, arXiv:2505.06708; + # qiuzh20/gated_attention). Per-head sigmoid gate on SDPA output, BEFORE + # out_proj. Gate input = full block input x (paper's headwise G1 variant + # driven from hidden_states). W_g shape (num_heads, dim), plain sigmoid. + # Near-zero init gives g~0.5 at step 0 (half attention output); per-block + # attn_scale (init 1.0) compensates during training. Name contains + # "attn_gate" so CONTROL_TENSOR_NAME_PATTERNS routes it to scalar AdamW. + gated_attn_enabled = bool(int(os.environ.get("GATED_ATTN_ENABLED", "0"))) + gated_attn_init_std = float(os.environ.get("GATED_ATTN_INIT_STD", 0.01)) + # Dedicated int8-per-row quantization for `attn_gate_w` tensors. These are + # small ((num_heads, dim) = (8, 512) = 4096 params) and bypass GPTQ via the + # numel<=65536 passthrough branch -> stored as fp16 (8 KB/layer, ~65 KB total + # compressed). 
int8-per-row cuts the raw tensor in half with negligible BPB + # impact: scales per head (8 values), symmetric quant over [-127, 127]. + # No Hessian needed (gate weights not in collect_hessians()). + gated_attn_quant_gate = bool(int(os.environ.get("GATED_ATTN_QUANT_GATE", "0"))) + # Sparse Attention Gate (modded-nanogpt-style). Keeps dense SDPA and only + # swaps the output-gate input to the first GATE_WINDOW residual dims. + # W_g: (num_heads, gate_window) = (8, 12) = 96 params/layer (~1K total), + # vs dense GatedAttn's (8, 512) = 4K/layer (~44K diff). Name "attn_gate_w" + # is shared so quant routing and int8 gate passthrough Just Work. Gate + # passthrough int8 still applies via GATED_ATTN_QUANT_GATE=1. + # Mutually exclusive with ATTN_OUT_GATE_ENABLED and GATED_ATTN_ENABLED. + sparse_attn_gate_enabled = bool(int(os.environ.get("SPARSE_ATTN_GATE_ENABLED", "0"))) + sparse_attn_gate_init_std = float(os.environ.get("SPARSE_ATTN_GATE_INIT_STD", 0.0)) + sparse_attn_gate_scale = float(os.environ.get("SPARSE_ATTN_GATE_SCALE", 1.0)) + # LQER asymmetric rank-k correction on top-K quant-error tensors (PR #1530 v2 port). + # Computes SVD of E = W_fp - W_quant, packs top-r A,B as INT2/INT4 (asym) or INTk (sym). 
+ lqer_enabled = bool(int(os.environ.get("LQER_ENABLED", "1"))) + lqer_rank = int(os.environ.get("LQER_RANK", 4)) + lqer_top_k = int(os.environ.get("LQER_TOP_K", 3)) + lqer_factor_bits = int(os.environ.get("LQER_FACTOR_BITS", 4)) + lqer_asym_enabled = bool(int(os.environ.get("LQER_ASYM_ENABLED", "1"))) + lqer_asym_group = int(os.environ.get("LQER_ASYM_GROUP", "64")) + lqer_scope = os.environ.get("LQER_SCOPE", "all") + lqer_gain_select = bool(int(os.environ.get("LQER_GAIN_SELECT", "0"))) + awq_lite_enabled = bool(int(os.environ.get("AWQ_LITE_ENABLED", "0"))) + awq_lite_bits = int(os.environ.get("AWQ_LITE_BITS", "8")) + awq_lite_group_top_k = int(os.environ.get("AWQ_LITE_GROUP_TOP_K", "1")) + awq_lite_group_size = int(os.environ.get("AWQ_LITE_GROUP_SIZE", "64")) + # PR #1145 online n-gram tilt (AnirudhRahul, valerio-endorsed). Causal, + # normalized, prefix-only experts; closed-form multiplicative-boost-with-renorm + # applied to per-token NLL. See online_ngram_tilt.py for math + compliance. + ngram_tilt_enabled = bool(int(os.environ.get("NGRAM_TILT_ENABLED", "0"))) + token_order = int(os.environ.get("TOKEN_ORDER", "16")) + token_threshold = float(os.environ.get("TOKEN_THRESHOLD", "0.800")) + token_boost = float(os.environ.get("TOKEN_BOOST", "2.625")) + within_tau = float(os.environ.get("WITHIN_TAU", "0.450")) + within_boost = float(os.environ.get("WITHIN_BOOST", "0.750")) + word_order = int(os.environ.get("WORD_ORDER", "4")) + word_normalize = os.environ.get("WORD_NORMALIZE", "strip_punct_lower") + word_tau = float(os.environ.get("WORD_TAU", "0.650")) + word_boost = float(os.environ.get("WORD_BOOST", "0.750")) + agree_add_boost = float(os.environ.get("AGREE_ADD_BOOST", "0.500")) + # === v5 Stage 1 optimizations (env-gated) === + # 1A: Move ngram hint precompute OUTSIDE eval timer (single causal pass over val tokens). + # Compliance: still inside validate(), single-pass causal, val tokens only. + # Save: ~168s (measured in v2 fulltilt) — enough alone to fit cap. 
+ ngram_hint_precompute_outside = bool(int(os.environ.get("NGRAM_HINT_PRECOMPUTE_OUTSIDE", "1"))) + # 2C: Temperature scaling on logits before softcap. Σ P=1 preserved. + # Default 1.0 = no-op. Tune on train holdout, apply at eval. + temperature_scale = float(os.environ.get("TEMPERATURE_SCALE", "1.0")) + distributed = "RANK" in os.environ and "WORLD_SIZE" in os.environ + rank = int(os.environ.get("RANK", "0")) + world_size = int(os.environ.get("WORLD_SIZE", "1")) + local_rank = int(os.environ.get("LOCAL_RANK", "0")) + is_main_process = rank == 0 + grad_accum_steps = 8 // world_size + # CaseOps integration: optional override of dataset root + tokenizer path. + # When CASEOPS_ENABLED=1, the wrapper loads a per-token byte sidecar + # (fineweb_val_bytes_*.bin, identical shard layout to val_*.bin) and uses + # it as the canonical raw-byte budget for BPB accounting. The sidecar + # REPLACES the build_sentencepiece_luts byte-counting path entirely. + caseops_enabled = bool(int(os.environ.get("CASEOPS_ENABLED", "0"))) + _default_caseops_data = os.path.join( + data_dir, + "datasets", + "fineweb10B_sp8192_caseops", + "datasets", + "datasets", + "fineweb10B_sp8192_lossless_caps_caseops_v1_reserved", + ) + _default_caseops_tok = os.path.join( + data_dir, + "datasets", + "fineweb10B_sp8192_caseops", + "datasets", + "tokenizers", + "fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model", + ) + if caseops_enabled: + datasets_dir = os.environ.get("DATA_PATH", _default_caseops_data) + tokenizer_path = os.environ.get("TOKENIZER_PATH", _default_caseops_tok) + else: + datasets_dir = os.environ.get( + "DATA_PATH", + os.path.join(data_dir, "datasets", f"fineweb10B_sp{vocab_size}"), + ) + tokenizer_path = os.environ.get( + "TOKENIZER_PATH", + os.path.join(data_dir, "tokenizers", f"fineweb_{vocab_size}_bpe.model"), + ) + train_files = os.path.join(datasets_dir, "fineweb_train_*.bin") + val_files = os.path.join(datasets_dir, "fineweb_val_*.bin") + val_bytes_files = 
os.path.join(datasets_dir, "fineweb_val_bytes_*.bin") + artifact_dir = os.environ.get("ARTIFACT_DIR", "") + logfile = ( + os.path.join(artifact_dir, f"{run_id}.txt") + if artifact_dir + else f"logs/{run_id}.txt" + ) + model_path = ( + os.path.join(artifact_dir, "final_model.pt") + if artifact_dir + else "final_model.pt" + ) + quantized_model_path = ( + os.path.join(artifact_dir, "final_model.int6.ptz") + if artifact_dir + else "final_model.int6.ptz" + ) + + +_logger_hparams = None + + +def set_logging_hparams(h): + global _logger_hparams + _logger_hparams = h + + +def log(msg, console=True): + if _logger_hparams is None: + print(msg) + return + if _logger_hparams.is_main_process: + if console: + print(msg) + if _logger_hparams.logfile is not None: + with open(_logger_hparams.logfile, "a", encoding="utf-8") as f: + print(msg, file=f) + + +class ValidationData: + def __init__(self, h, device): + self.sp = spm.SentencePieceProcessor(model_file=h.tokenizer_path) + if int(self.sp.vocab_size()) != h.vocab_size: + raise ValueError( + f"VOCAB_SIZE={h.vocab_size} does not match tokenizer vocab_size={int(self.sp.vocab_size())}" + ) + self.val_tokens = load_validation_tokens(h.val_files, h.eval_seq_len) + self.caseops_enabled = bool(getattr(h, "caseops_enabled", False)) + if self.caseops_enabled: + self.base_bytes_lut = None + self.has_leading_space_lut = None + self.is_boundary_token_lut = None + else: + ( + self.base_bytes_lut, + self.has_leading_space_lut, + self.is_boundary_token_lut, + ) = build_sentencepiece_luts(self.sp, h.vocab_size, device) + self.val_bytes = None + if self.caseops_enabled: + self.val_bytes = load_validation_byte_sidecar( + h.val_bytes_files, h.eval_seq_len, self.val_tokens.numel() + ) + + +def build_sentencepiece_luts(sp, vocab_size, device): + sp_vocab_size = int(sp.vocab_size()) + assert ( + sp.piece_to_id("▁") != sp.unk_id() + ), "Tokenizer must have '▁' (space) as its own token for correct BPB byte counting" + table_size = max(sp_vocab_size, 
vocab_size) + base_bytes_np = np.zeros((table_size,), dtype=np.int16) + has_leading_space_np = np.zeros((table_size,), dtype=np.bool_) + is_boundary_token_np = np.ones((table_size,), dtype=np.bool_) + for token_id in range(sp_vocab_size): + if sp.is_control(token_id) or sp.is_unknown(token_id) or sp.is_unused(token_id): + continue + is_boundary_token_np[token_id] = False + if sp.is_byte(token_id): + base_bytes_np[token_id] = 1 + continue + piece = sp.id_to_piece(token_id) + if piece.startswith("▁"): + has_leading_space_np[token_id] = True + piece = piece[1:] + base_bytes_np[token_id] = len(piece.encode("utf-8")) + return ( + torch.tensor(base_bytes_np, dtype=torch.int16, device=device), + torch.tensor(has_leading_space_np, dtype=torch.bool, device=device), + torch.tensor(is_boundary_token_np, dtype=torch.bool, device=device), + ) + + +def load_validation_tokens(pattern, seq_len): + # Filter out CaseOps byte sidecar shards which share the val_*.bin glob. + files = [ + Path(p) + for p in sorted(glob.glob(pattern)) + if "_bytes_" not in Path(p).name + ] + if not files: + raise FileNotFoundError(f"No files found for pattern: {pattern}") + tokens = torch.cat([load_data_shard(file) for file in files]).contiguous() + usable = (tokens.numel() - 1) // seq_len * seq_len + if usable <= 0: + raise ValueError(f"Validation split is too short for EVAL_SEQ_LEN={seq_len}") + return tokens[: usable + 1] + + +def load_validation_byte_sidecar(pattern, seq_len, expected_len): + """Load CaseOps per-token byte sidecar(s). Same shard layout as token shards + (256 int32 header + uint16 array). Each entry = canonical raw-text byte + budget for that token in the corresponding val shard. Returns a CPU + int32 tensor sliced to match expected_len (i.e. 
val_tokens length).""" + files = [Path(p) for p in sorted(glob.glob(pattern))] + if not files: + raise FileNotFoundError(f"No byte sidecar files for pattern: {pattern}") + shards = [load_data_shard(file) for file in files] + # load_data_shard returns uint16 — that's exactly what the sidecar stores. + bytes_full = torch.cat(shards).contiguous() + if bytes_full.numel() < expected_len: + raise ValueError( + f"Byte sidecar too short: {bytes_full.numel()} < val_tokens {expected_len}" + ) + return bytes_full[:expected_len].to(torch.int32) + + +def load_data_shard(file): + header_bytes = 256 * np.dtype(" 0: + pos = start + while pos < end: + seg_starts.append(pos) + pos += max_doc_len + else: + seg_starts.append(start) + boundaries = seg_starts + [total_len] + padded_len = get_next_multiple_of_n(len(boundaries), bucket_size) + cu = torch.full((padded_len,), total_len, dtype=torch.int32, device=device) + cu[: len(boundaries)] = torch.tensor(boundaries, dtype=torch.int32, device=device) + seg_ends = seg_starts[1:] + [total_len] + max_seqlen = max(end - start for start, end in zip(seg_starts, seg_ends)) + return cu, max_seqlen + +class DocumentPackingLoader: + _shard_pool = ThreadPoolExecutor(1) + + def __init__(self, h, device, cu_bucket_size=64): + self.rank = h.rank + self.world_size = h.world_size + self.device = device + self.cu_bucket_size = cu_bucket_size + self.max_seq_len = h.train_seq_len + all_files = [Path(p) for p in sorted(glob.glob(h.train_files))] + if not all_files: + raise FileNotFoundError(f"No files found for pattern: {h.train_files}") + self.files = all_files + self.file_iter = iter(self.files) + self._init_shard(load_data_shard(next(self.file_iter))) + self._next_shard = self._submit_next_shard() + self._batch_pool = ThreadPoolExecutor(1) + self._prefetch_queue = [] + + def _init_shard(self, tokens): + global BOS_ID + self.tokens = tokens + self.shard_size = tokens.numel() + if BOS_ID is None: + BOS_ID = 1 + self.bos_idx = ( + (tokens == 
BOS_ID).nonzero(as_tuple=True)[0].to(torch.int64).cpu().numpy() + ) + self.cursor = int(self.bos_idx[0]) + + def _submit_next_shard(self): + try: + path = next(self.file_iter) + return self._shard_pool.submit(load_data_shard, path) + except StopIteration: + return None + + def _advance_shard(self): + if self._next_shard is None: + self.file_iter = iter(self.files) + self._next_shard = self._shard_pool.submit( + load_data_shard, next(self.file_iter) + ) + self._init_shard(self._next_shard.result()) + self._next_shard = self._submit_next_shard() + + def _local_doc_starts(self, local_start, total_len): + lo = np.searchsorted(self.bos_idx, local_start, side="left") + hi = np.searchsorted(self.bos_idx, local_start + total_len, side="left") + return (self.bos_idx[lo:hi] - local_start).tolist() + + def _prepare_batch(self, num_tokens_local, max_seq_len): + per_rank_span = num_tokens_local + 1 + global_span = per_rank_span * self.world_size + while self.cursor + global_span > self.shard_size: + self._advance_shard() + local_start = self.cursor + self.rank * per_rank_span + buf = self.tokens[local_start : local_start + per_rank_span] + inputs = torch.empty(per_rank_span - 1, dtype=torch.int64, pin_memory=True) + targets = torch.empty(per_rank_span - 1, dtype=torch.int64, pin_memory=True) + inputs.copy_(buf[:-1]) + targets.copy_(buf[1:]) + starts = self._local_doc_starts(local_start, inputs.numel()) + cu_seqlens, max_seqlen = _build_cu_seqlens( + starts, inputs.numel(), inputs.device, max_seq_len, self.cu_bucket_size + ) + cu_seqlens = cu_seqlens.pin_memory() + self.cursor += global_span + return inputs, targets, cu_seqlens, max_seqlen + + def next_batch(self, global_tokens, grad_accum_steps): + num_tokens_local = global_tokens // (self.world_size * grad_accum_steps) + while len(self._prefetch_queue) < 2: + self._prefetch_queue.append( + self._batch_pool.submit(self._prepare_batch, num_tokens_local, self.max_seq_len)) + inputs, targets, cu_seqlens, max_seqlen = 
self._prefetch_queue.pop(0).result() + self._prefetch_queue.append( + self._batch_pool.submit(self._prepare_batch, num_tokens_local, self.max_seq_len)) + return ( + inputs[None].to(self.device, non_blocking=True), + targets[None].to(self.device, non_blocking=True), + cu_seqlens.to(self.device, non_blocking=True), + max_seqlen, + ) + + +class ShuffledSequenceLoader: + def __init__(self, h, device): + self.world_size = h.world_size + self.seq_len = h.train_seq_len + self.device = device + all_files = [Path(p) for p in sorted(glob.glob(h.train_files))] + if not all_files: + raise FileNotFoundError(f"No files found for pattern: {h.train_files}") + self.files = all_files[h.rank :: h.world_size] + self.rng = np.random.Generator(np.random.PCG64(h.rank)) + self.num_tokens = [_read_num_tokens(f) for f in self.files] + self.start_inds = [[] for _ in self.files] + for si in range(len(self.files)): + self._reset_shard(si) + + def _reset_shard(self, si): + max_phase = min( + self.seq_len - 1, max(0, self.num_tokens[si] - self.seq_len - 1) + ) + phase = int(self.rng.integers(max_phase + 1)) if max_phase > 0 else 0 + num_sequences = (self.num_tokens[si] - 1 - phase) // self.seq_len + sequence_order = self.rng.permutation(num_sequences) + self.start_inds[si] = (phase + sequence_order * self.seq_len).tolist() + + def next_batch(self, global_tokens, grad_accum_steps): + device_tokens = global_tokens // (self.world_size * grad_accum_steps) + device_batch_size = device_tokens // self.seq_len + remaining = np.array([len(s) for s in self.start_inds], dtype=np.float64) + x = torch.empty((device_batch_size, self.seq_len), dtype=torch.int64) + y = torch.empty((device_batch_size, self.seq_len), dtype=torch.int64) + for bi in range(device_batch_size): + total = remaining.sum() + if total <= 0: + for si in range(len(self.files)): + self._reset_shard(si) + remaining = np.array( + [len(s) for s in self.start_inds], dtype=np.float64 + ) + total = remaining.sum() + probs = remaining / total + si 
= int(self.rng.choice(len(self.files), p=probs)) + start_ind = self.start_inds[si].pop() + remaining[si] -= 1 + mm = _get_shard_memmap(self.files[si]) + window = torch.as_tensor( + np.array(mm[start_ind : start_ind + self.seq_len + 1], dtype=np.int64) + ) + x[bi] = window[:-1] + y[bi] = window[1:] + return x.to(self.device, non_blocking=True), y.to( + self.device, non_blocking=True + ) + + +class RMSNorm(nn.Module): + def __init__(self, eps=None): + super().__init__() + self.eps = eps + + def forward(self, x): + return F.rms_norm(x, (x.size(-1),), eps=self.eps) + + +class CastedLinear(nn.Linear): + def forward(self, x): + w = self.weight.to(x.dtype) + bias = self.bias.to(x.dtype) if self.bias is not None else None + return F.linear(x, w, bias) + + +@triton.jit +def fused_log_softmax_dual_gather_kernel( + logits_ptr, + target_ids_ptr, + hint_ids_ptr, + log_p_y_out_ptr, + log_q_h_out_ptr, + BT, + V, + BLOCK_V: tl.constexpr, +): + """Fused log_softmax + dual gather. Single pass over [BT, V] logits per row, + extracts log p(target_id) and log p(hint_id) via online logsumexp. + Replaces F.log_softmax (which materializes [BT, V] fp32) + 2 gather ops. 
+ """ + pid = tl.program_id(0) + if pid >= BT: + return + + target = tl.load(target_ids_ptr + pid) + hint = tl.load(hint_ids_ptr + pid) + row_offset = pid * V + + target_logit = tl.load(logits_ptr + row_offset + target).to(tl.float32) + hint_logit = tl.load(logits_ptr + row_offset + hint).to(tl.float32) + + NEG_INF = float("-inf") + max_val = NEG_INF + for v_start in tl.range(0, V, BLOCK_V): + v_offsets = v_start + tl.arange(0, BLOCK_V) + mask = v_offsets < V + chunk = tl.load( + logits_ptr + row_offset + v_offsets, mask=mask, other=NEG_INF + ).to(tl.float32) + block_max = tl.max(chunk, axis=0) + max_val = tl.maximum(max_val, block_max) + + sum_exp = tl.zeros((), dtype=tl.float32) + for v_start in tl.range(0, V, BLOCK_V): + v_offsets = v_start + tl.arange(0, BLOCK_V) + mask = v_offsets < V + chunk = tl.load( + logits_ptr + row_offset + v_offsets, mask=mask, other=0.0 + ).to(tl.float32) + chunk_centered = chunk - max_val + exp_chunk = tl.where(mask, tl.exp(chunk_centered), 0.0) + sum_exp += tl.sum(exp_chunk, axis=0) + + log_sum_exp = max_val + tl.log(sum_exp) + log_p_y = target_logit - log_sum_exp + log_p_h = hint_logit - log_sum_exp + + tl.store(log_p_y_out_ptr + pid, log_p_y) + tl.store(log_q_h_out_ptr + pid, log_p_h) + + +def fused_log_softmax_dual_gather(logits, target_ids, hint_ids): + """Triton wrapper — replaces F.log_softmax + 2 gather pattern. + Returns (log_p_y, log_q_h) where p = softmax(logits). 
+ """ + bsz, sl, V = logits.shape + BT = bsz * sl + logits_flat = logits.reshape(BT, V).contiguous() + target_flat = target_ids.reshape(BT).contiguous() + hint_flat = hint_ids.reshape(BT).contiguous() + + log_p_y_out = torch.empty(BT, dtype=torch.float32, device=logits.device) + log_q_h_out = torch.empty(BT, dtype=torch.float32, device=logits.device) + + BLOCK_V = 1024 + grid = (BT,) + fused_log_softmax_dual_gather_kernel[grid]( + logits_flat, + target_flat, + hint_flat, + log_p_y_out, + log_q_h_out, + BT, + V, + BLOCK_V=BLOCK_V, + num_warps=8, + ) + return log_p_y_out.reshape(bsz, sl), log_q_h_out.reshape(bsz, sl) + + +@triton.jit +def linear_leaky_relu_square_kernel( + a_desc, + b_desc, + c_desc, + aux_desc, + M, + N, + K, + BLOCK_SIZE_M: tl.constexpr, + BLOCK_SIZE_N: tl.constexpr, + BLOCK_SIZE_K: tl.constexpr, + NUM_SMS: tl.constexpr, + FORWARD: tl.constexpr, +): + dtype = tl.bfloat16 + start_pid = tl.program_id(axis=0) + num_pid_m = tl.cdiv(M, BLOCK_SIZE_M) + num_pid_n = tl.cdiv(N, BLOCK_SIZE_N) + k_tiles = tl.cdiv(K, BLOCK_SIZE_K) + num_tiles = num_pid_m * num_pid_n + tile_id_c = start_pid - NUM_SMS + for tile_id in tl.range(start_pid, num_tiles, NUM_SMS, flatten=True): + pid_m = tile_id // num_pid_n + pid_n = tile_id % num_pid_n + offs_am = pid_m * BLOCK_SIZE_M + offs_bn = pid_n * BLOCK_SIZE_N + accumulator = tl.zeros((BLOCK_SIZE_M, BLOCK_SIZE_N), dtype=tl.float32) + for ki in range(k_tiles): + offs_k = ki * BLOCK_SIZE_K + a = a_desc.load([offs_am, offs_k]) + b = b_desc.load([offs_bn, offs_k]) + accumulator = tl.dot(a, b.T, accumulator) + tile_id_c += NUM_SMS + offs_am_c = offs_am + offs_bn_c = offs_bn + acc = tl.reshape(accumulator, (BLOCK_SIZE_M, 2, BLOCK_SIZE_N // 2)) + acc = tl.permute(acc, (0, 2, 1)) + acc0, acc1 = tl.split(acc) + c0 = acc0.to(dtype) + c1 = acc1.to(dtype) + if not FORWARD: + pre0 = aux_desc.load([offs_am_c, offs_bn_c]) + pre1 = aux_desc.load([offs_am_c, offs_bn_c + BLOCK_SIZE_N // 2]) + c0 = c0 * tl.where(pre0 > 0, 2.0 * pre0, 0.18 * 
pre0) + c1 = c1 * tl.where(pre1 > 0, 2.0 * pre1, 0.18 * pre1) + c_desc.store([offs_am_c, offs_bn_c], c0) + c_desc.store([offs_am_c, offs_bn_c + BLOCK_SIZE_N // 2], c1) + if FORWARD: + aux0 = tl.where(c0 > 0, c0, 0.3 * c0) + aux1 = tl.where(c1 > 0, c1, 0.3 * c1) + aux_desc.store([offs_am_c, offs_bn_c], aux0 * aux0) + aux_desc.store([offs_am_c, offs_bn_c + BLOCK_SIZE_N // 2], aux1 * aux1) + + +def linear_leaky_relu_square(a, b, aux=None): + M, K = a.shape + N, K2 = b.shape + assert K == K2 + c = torch.empty((M, N), device=a.device, dtype=a.dtype) + forward = aux is None + if aux is None: + aux = torch.empty((M, N), device=a.device, dtype=a.dtype) + num_sms = torch.cuda.get_device_properties(a.device).multi_processor_count + BLOCK_SIZE_M, BLOCK_SIZE_N, BLOCK_SIZE_K = 256, 128, 64 + num_stages = 4 if forward else 3 + a_desc = TensorDescriptor.from_tensor(a, [BLOCK_SIZE_M, BLOCK_SIZE_K]) + b_desc = TensorDescriptor.from_tensor(b, [BLOCK_SIZE_N, BLOCK_SIZE_K]) + c_desc = TensorDescriptor.from_tensor(c, [BLOCK_SIZE_M, BLOCK_SIZE_N // 2]) + aux_desc = TensorDescriptor.from_tensor(aux, [BLOCK_SIZE_M, BLOCK_SIZE_N // 2]) + grid = lambda _meta: ( + min(num_sms, triton.cdiv(M, BLOCK_SIZE_M) * triton.cdiv(N, BLOCK_SIZE_N)), + ) + linear_leaky_relu_square_kernel[grid]( + a_desc, + b_desc, + c_desc, + aux_desc, + M, + N, + K, + BLOCK_SIZE_M=BLOCK_SIZE_M, + BLOCK_SIZE_N=BLOCK_SIZE_N, + BLOCK_SIZE_K=BLOCK_SIZE_K, + NUM_SMS=num_sms, + FORWARD=forward, + num_stages=num_stages, + num_warps=8, + ) + if forward: + return c, aux + return c + + +class FusedLinearLeakyReLUSquareFunction(torch.autograd.Function): + @staticmethod + def forward(ctx, x, w1, w2): + x_flat = x.reshape(-1, x.shape[-1]) + pre, post = linear_leaky_relu_square(x_flat, w1) + out = F.linear(post, w2) + ctx.save_for_backward(x, w1, w2, pre, post) + return out.view(*x.shape[:-1], out.shape[-1]) + + @staticmethod + def backward(ctx, grad_output): + x, w1, w2, pre, post = ctx.saved_tensors + x_flat = x.reshape(-1, 
x.shape[-1]) + grad_output_flat = grad_output.reshape(-1, grad_output.shape[-1]) + dw2 = grad_output_flat.T @ post + dpre = linear_leaky_relu_square(grad_output_flat, w2.T.contiguous(), aux=pre) + dw1 = dpre.T @ x_flat + dx = dpre @ w1 + return dx.view_as(x), dw1, dw2 + + +FusedLeakyReLUSquareMLP = FusedLinearLeakyReLUSquareFunction.apply + + +class Rotary(nn.Module): + def __init__(self, dim, base=1e4, train_seq_len=1024, rope_dims=0, yarn=True): + super().__init__() + self.dim = dim + self.base = base + self.train_seq_len = train_seq_len + self.yarn = yarn + self.rope_dims = rope_dims if rope_dims > 0 else dim + inv_freq = 1.0 / base ** ( + torch.arange(0, self.rope_dims, 2, dtype=torch.float32) / self.rope_dims + ) + self.register_buffer("inv_freq", inv_freq, persistent=False) + self._seq_len_cached = 0 + self._cos_cached = None + self._sin_cached = None + + def forward(self, seq_len, device, dtype): + if ( + self._cos_cached is None + or self._sin_cached is None + or self._seq_len_cached < seq_len + or self._cos_cached.device != device + ): + rd = self.rope_dims + if self.yarn and seq_len > self.train_seq_len: + scale = seq_len / self.train_seq_len + new_base = self.base * scale ** (rd / (rd - 2)) + inv_freq = 1.0 / new_base ** ( + torch.arange(0, rd, 2, dtype=torch.float32, device=device) / rd + ) + else: + inv_freq = self.inv_freq.float().to(device) + t = torch.arange(seq_len, device=device, dtype=torch.float32) + freqs = torch.outer(t, inv_freq) + self._cos_cached = freqs.cos()[None, :, None, :] + self._sin_cached = freqs.sin()[None, :, None, :] + self._seq_len_cached = seq_len + return self._cos_cached[:, :seq_len].to(dtype=dtype), self._sin_cached[:, :seq_len].to(dtype=dtype) + + +def apply_rotary_emb(x, cos, sin, rope_dims=0): + if rope_dims > 0 and rope_dims < x.size(-1): + x_rope, x_pass = x[..., :rope_dims], x[..., rope_dims:] + half = rope_dims // 2 + x1, x2 = x_rope[..., :half], x_rope[..., half:] + x_rope = torch.cat((x1 * cos + x2 * sin, x1 * -sin 
+ x2 * cos), dim=-1) + return torch.cat((x_rope, x_pass), dim=-1) + half = x.size(-1) // 2 + x1, x2 = x[..., :half], x[..., half:] + return torch.cat((x1 * cos + x2 * sin, x1 * -sin + x2 * cos), dim=-1) + + +class CausalSelfAttention(nn.Module): + def __init__( + self, dim, num_heads, num_kv_heads, rope_base, qk_gain_init, train_seq_len, yarn=True, + attn_out_gate=False, attn_out_gate_src="proj", gate_window=12, + gated_attn=False, gated_attn_init_std=0.01, + sparse_attn_gate=False, sparse_attn_gate_init_std=0.0, sparse_attn_gate_scale=1.0, + ): + super().__init__() + if dim % num_heads != 0: + raise ValueError("model_dim must be divisible by num_heads") + if num_heads % num_kv_heads != 0: + raise ValueError("num_heads must be divisible by num_kv_heads") + if int(attn_out_gate) + int(gated_attn) + int(sparse_attn_gate) > 1: + raise ValueError( + "attn_out_gate, gated_attn, and sparse_attn_gate are mutually exclusive" + ) + self.num_heads = num_heads + self.num_kv_heads = num_kv_heads + self.head_dim = dim // num_heads + if self.head_dim % 2 != 0: + raise ValueError("head_dim must be even for RoPE") + self.q_gain = nn.Parameter( + torch.full((num_heads,), qk_gain_init, dtype=torch.float32) + ) + self.rope_dims = 0 + self.rotary = Rotary(self.head_dim, base=rope_base, train_seq_len=train_seq_len, yarn=yarn) + self.use_xsa = False + # AttnOutGate (PR #1667 MarioPaerle): per-head multiplicative gate on attention + # output. CastedLinear so restore_fp32_params casts back to fp32 for GPTQ. + # _zero_init -> 2*sigmoid(0)=1 -> transparent at init. + self.attn_out_gate = attn_out_gate + self.attn_out_gate_src = attn_out_gate_src + self.gate_window = gate_window + if attn_out_gate: + self.attn_gate_proj = CastedLinear(gate_window, num_heads, bias=False) + self.attn_gate_proj._zero_init = True + # Gated Attention (arXiv:2505.06708, Qwen, NeurIPS 2025). Per-head sigmoid + # gate on SDPA output, BEFORE out_proj. Gate projection W_g: (num_heads, dim). 
+ # Name "attn_gate_w" contains "attn_gate" substring so it matches + # CONTROL_TENSOR_NAME_PATTERNS and routes to the scalar AdamW group. + # fp32 Parameter -> restore_fp32_params path covers it via the ndim<2 OR + # name-pattern check (name matches "attn_gate"). Cast to x.dtype on use. + self.gated_attn = gated_attn + if gated_attn: + W = torch.empty(num_heads, dim, dtype=torch.float32) + nn.init.normal_(W, mean=0.0, std=gated_attn_init_std) + self.attn_gate_w = nn.Parameter(W) + # Sparse attention head-output gate (modded-nanogpt style). Keeps dense SDPA + # and only narrows the gate input to the first gate_window residual dims. + # W_g: (num_heads, gate_window). y_{t,h} <- sigmoid(scale * W_g_h @ x_t[:gate_window]) * y_{t,h}. + # Shares attn_gate_w name with dense GatedAttn so the quant routing + # (CONTROL_TENSOR_NAME_PATTERNS / attn_gate_w int8 passthrough) is unchanged. + self.sparse_attn_gate = sparse_attn_gate + self.sparse_attn_gate_scale = sparse_attn_gate_scale + if sparse_attn_gate: + W = torch.empty(num_heads, gate_window, dtype=torch.float32) + if sparse_attn_gate_init_std > 0: + nn.init.normal_(W, mean=0.0, std=sparse_attn_gate_init_std) + else: + nn.init.zeros_(W) + self.attn_gate_w = nn.Parameter(W) + + def _xsa_efficient(self, y, v): + B, T, H, D = y.shape + Hkv = v.size(-2) + group = H // Hkv + y_g = y.reshape(B, T, Hkv, group, D) + vn = F.normalize(v, dim=-1).unsqueeze(-2) + proj = (y_g * vn).sum(dim=-1, keepdim=True) * vn + return (y_g - proj).reshape(B, T, H, D) + + def forward(self, x, q_w, k_w, v_w, out_w, cu_seqlens=None, max_seqlen=0): + bsz, seqlen, dim = x.shape + # q_raw kept around as a tap point for attn_out_gate_src='q' (post-projection, + # pre-reshape, pre-RoPE). 
+ q_raw = F.linear(x, q_w.to(x.dtype)) + q = q_raw.reshape(bsz, seqlen, self.num_heads, self.head_dim) + k = F.linear(x, k_w.to(x.dtype)).reshape(bsz, seqlen, self.num_kv_heads, self.head_dim) + v = F.linear(x, v_w.to(x.dtype)).reshape(bsz, seqlen, self.num_kv_heads, self.head_dim) + q = F.rms_norm(q, (q.size(-1),)) + k = F.rms_norm(k, (k.size(-1),)) + cos, sin = self.rotary(seqlen, x.device, q.dtype) + q = apply_rotary_emb(q, cos, sin, self.rope_dims) + k = apply_rotary_emb(k, cos, sin, self.rope_dims) + q = q * self.q_gain.to(dtype=q.dtype)[None, None, :, None] + if cu_seqlens is not None: + y = flash_attn_varlen_func( + q[0], + k[0], + v[0], + cu_seqlens_q=cu_seqlens, + cu_seqlens_k=cu_seqlens, + max_seqlen_q=max_seqlen, + max_seqlen_k=max_seqlen, + causal=True, + window_size=(-1, -1), + )[None] + else: + y = flash_attn_3_func(q, k, v, causal=True) + if self.use_xsa: + y = self._xsa_efficient(y, v) + # AttnOutGate inlined (PR #1667). Inline + .contiguous() barrier so torch.compile + # fullgraph=True is happy (this avoids the @torch.compiler.disable trap that + # crashed gates v3). Per-head gate on (B,T,H,D) tensor: g shape [B,T,H], broadcast + # over D via [..., None]. zero-init weight -> 2*sigmoid(0)=1 -> transparent. + if self.attn_out_gate: + gate_src = q_raw if self.attn_out_gate_src == "q" else x + gate_in = gate_src[..., : self.gate_window].contiguous() + g = 2.0 * torch.sigmoid(self.attn_gate_proj(gate_in)) + y = y * g[..., None] + # Gated Attention (arXiv:2505.06708 G1). Inline + .contiguous() barrier so + # torch.compile fullgraph=True is happy. Per-head gate on (B,T,H,D): g shape + # [B,T,H], broadcast over D via [..., None]. Paper: g = sigmoid(x @ W_g.T) + # where W_g: (H, dim). .to(x.dtype) on fp32 param before broadcast with bf16. + if self.gated_attn: + x_c = x.contiguous() + g = torch.sigmoid(F.linear(x_c, self.attn_gate_w.to(x.dtype))) + y = y * g[..., None] + # Sparse head-output gate: narrower (gate_window) input, same shape g as GatedAttn. 
+ if self.sparse_attn_gate: + gate_in = x[..., : self.gate_window].contiguous() + g = torch.sigmoid( + self.sparse_attn_gate_scale + * F.linear(gate_in, self.attn_gate_w.to(x.dtype)) + ) + y = y * g[..., None] + y = y.reshape(bsz, seqlen, dim) + self._last_proj_input = y.detach() if getattr(self, "_calib", False) else None + return F.linear(y, out_w.to(x.dtype)) + + +class MLP(nn.Module): + def __init__(self, dim, mlp_mult): + super().__init__() + self.use_fused = True + + def forward(self, x, up_w, down_w): + if self.training and self.use_fused: + return FusedLeakyReLUSquareMLP(x, up_w.to(x.dtype), down_w.to(x.dtype)) + hidden = F.leaky_relu(F.linear(x, up_w.to(x.dtype)), negative_slope=0.3).square() + self._last_down_input = hidden.detach() if getattr(self, "_calib", False) else None + return F.linear(hidden, down_w.to(x.dtype)) + + +class Block(nn.Module): + def __init__( + self, + dim, + num_heads, + num_kv_heads, + mlp_mult, + rope_base, + qk_gain_init, + train_seq_len, + layer_idx=0, + ln_scale=False, + yarn=True, + attn_out_gate=False, + attn_out_gate_src="proj", + gate_window=12, + gated_attn=False, + gated_attn_init_std=0.01, + sparse_attn_gate=False, + sparse_attn_gate_init_std=0.0, + sparse_attn_gate_scale=1.0, + ): + super().__init__() + self.attn_norm = RMSNorm() + self.mlp_norm = RMSNorm() + self.attn = CausalSelfAttention( + dim, num_heads, num_kv_heads, rope_base, qk_gain_init, train_seq_len, yarn=yarn, + attn_out_gate=attn_out_gate, attn_out_gate_src=attn_out_gate_src, gate_window=gate_window, + gated_attn=gated_attn, gated_attn_init_std=gated_attn_init_std, + sparse_attn_gate=sparse_attn_gate, + sparse_attn_gate_init_std=sparse_attn_gate_init_std, + sparse_attn_gate_scale=sparse_attn_gate_scale, + ) + self.mlp = MLP(dim, mlp_mult) + self.attn_scale = nn.Parameter(torch.ones(dim, dtype=torch.float32)) + self.mlp_scale = nn.Parameter(torch.ones(dim, dtype=torch.float32)) + self.resid_mix = nn.Parameter( + torch.stack((torch.ones(dim), 
torch.zeros(dim))).float() + ) + self.ln_scale_factor = 1.0 / math.sqrt(layer_idx + 1) if ln_scale else 1.0 + + def forward(self, x, x0, q_w, k_w, v_w, out_w, up_w, down_w, cu_seqlens=None, max_seqlen=0): + mix = self.resid_mix.to(dtype=x.dtype) + x_in = mix[0][None, None, :] * x + mix[1][None, None, :] * x0 + attn_out = self.attn( + self.attn_norm(x_in) * self.ln_scale_factor, + q_w, k_w, v_w, out_w, + cu_seqlens=cu_seqlens, + max_seqlen=max_seqlen, + ) + x_out = x_in + self.attn_scale.to(dtype=x_in.dtype)[None, None, :] * attn_out + x_out = x_out + self.mlp_scale.to(dtype=x_out.dtype)[ + None, None, : + ] * self.mlp(self.mlp_norm(x_out) * self.ln_scale_factor, up_w, down_w) + return x_out + +class GPT(nn.Module): + def __init__(self, h): + super().__init__() + if h.logit_softcap <= 0.0: + raise ValueError(f"logit_softcap must be positive, got {h.logit_softcap}") + self.tie_embeddings = h.tie_embeddings + self.tied_embed_init_std = h.tied_embed_init_std + self.logit_softcap = h.logit_softcap + self.fused_ce_enabled = bool(h.fused_ce_enabled) + self.tok_emb = nn.Embedding(h.vocab_size, h.model_dim) + self.num_layers = h.num_layers + head_dim = h.model_dim // h.num_heads + kv_dim = h.num_kv_heads * head_dim + hidden_dim = int(h.mlp_mult * h.model_dim) + self.qo_bank = nn.Parameter(torch.empty(2 * h.num_layers, h.model_dim, h.model_dim)) + self.kv_bank = nn.Parameter(torch.empty(2 * h.num_layers, kv_dim, h.model_dim)) + self.mlp_up_bank = nn.Parameter(torch.empty(h.num_layers, hidden_dim, h.model_dim)) + self.mlp_down_bank = nn.Parameter(torch.empty(h.num_layers, h.model_dim, hidden_dim)) + self.num_encoder_layers = h.num_layers // 2 + self.num_decoder_layers = h.num_layers - self.num_encoder_layers + self.blocks = nn.ModuleList( + [ + Block( + h.model_dim, + h.num_heads, + h.num_kv_heads, + h.mlp_mult, + h.rope_base, + h.qk_gain_init, + h.train_seq_len, + layer_idx=i, + ln_scale=h.ln_scale, + yarn=h.rope_yarn, + attn_out_gate=h.attn_out_gate_enabled, + 
attn_out_gate_src=h.attn_out_gate_src, + gate_window=h.gate_window, + gated_attn=h.gated_attn_enabled, + gated_attn_init_std=h.gated_attn_init_std, + sparse_attn_gate=h.sparse_attn_gate_enabled, + sparse_attn_gate_init_std=h.sparse_attn_gate_init_std, + sparse_attn_gate_scale=h.sparse_attn_gate_scale, + ) + for i in range(h.num_layers) + ] + ) + if h.rope_dims > 0: + head_dim = h.model_dim // h.num_heads + for block in self.blocks: + block.attn.rope_dims = h.rope_dims + block.attn.rotary = Rotary( + head_dim, + base=h.rope_base, + train_seq_len=h.train_seq_len, + rope_dims=h.rope_dims, + yarn=h.rope_yarn, + ) + self.final_norm = RMSNorm() + self.lm_head = ( + None + if h.tie_embeddings + else CastedLinear(h.model_dim, h.vocab_size, bias=False) + ) + if self.lm_head is not None: + self.lm_head._zero_init = True + if h.xsa_last_n > 0: + for i in range(max(0, h.num_layers - h.xsa_last_n), h.num_layers): + self.blocks[i].attn.use_xsa = True + self.looping_active = False + if h.num_loops > 0: + loop_seg = list(range(h.loop_start, h.loop_end + 1)) + all_indices = list(range(h.loop_start)) + for _ in range(h.num_loops + 1): + all_indices.extend(loop_seg) + all_indices.extend(range(h.loop_end + 1, h.num_layers)) + num_enc = len(all_indices) // 2 + self.encoder_indices = all_indices[:num_enc] + self.decoder_indices = all_indices[num_enc:] + else: + self.encoder_indices = list(range(self.num_encoder_layers)) + self.decoder_indices = list(range(self.num_encoder_layers, h.num_layers)) + self.num_skip_weights = min( + len(self.encoder_indices), len(self.decoder_indices) + ) + self.skip_weights = nn.Parameter( + torch.ones(self.num_skip_weights, h.model_dim, dtype=torch.float32) + ) + self.skip_gates = ( + nn.Parameter( + torch.zeros(self.num_skip_weights, h.model_dim, dtype=torch.float32) + ) + if h.skip_gates_enabled + else None + ) + self.parallel_start_layer = h.parallel_start_layer + self.parallel_final_lane = h.parallel_final_lane.lower() + self.parallel_post_lambdas = 
nn.Parameter( + torch.ones(h.num_layers, 2, 2, dtype=torch.float32) + ) + self.parallel_resid_lambdas = nn.Parameter( + torch.full((h.num_layers, 2), 1.1, dtype=torch.float32) + ) + # SmearGate (PR #1667 / modded-nanogpt @classiclarryd): + # x_t <- x_t + lam * sigmoid(W * x_t[:gate_window]) * x_{t-1}. + # Per-token forward-1 smear of the embedding lane. W zero-init + lam=0 -> + # transparent at init. Uses CastedLinear so restore_fp32_params handles dtype. + self.smear_gate_enabled = h.smear_gate_enabled + if self.smear_gate_enabled: + self.smear_window = h.gate_window + self.smear_gate = CastedLinear(self.smear_window, 1, bias=False) + self.smear_gate._zero_init = True + self.smear_lambda = nn.Parameter(torch.zeros(1, dtype=torch.float32)) + # V19: Asymmetric Logit Rescale (PR #1923 jorge-asenjo). + # Two learnable softcap scales applied on the EVAL path (forward_logits + + # forward_ttt). Init to logit_softcap so the layer is identity at step 0. + # Train path keeps the single fused softcap to preserve PR #1855 numerics. + self.asym_logit_enabled = bool(int(os.environ.get("ASYM_LOGIT_RESCALE", "0"))) + if self.asym_logit_enabled: + self.softcap_pos = nn.Parameter(torch.tensor(float(h.logit_softcap), dtype=torch.float32)) + self.softcap_neg = nn.Parameter(torch.tensor(float(h.logit_softcap), dtype=torch.float32)) + # v5 Stage 2C: temperature scaling on logits before softcap (eval-only TTT path). 
+ self.temperature_scale = float(getattr(h, "temperature_scale", 1.0)) + self._init_weights() + + def _init_weights(self): + if self.tie_embeddings: + nn.init.normal_(self.tok_emb.weight, mean=0.0, std=self.tied_embed_init_std) + n = self.num_layers + proj_scale = 1.0 / math.sqrt(2 * n) + for i in range(n): + nn.init.orthogonal_(self.qo_bank.data[i], gain=1.0) + nn.init.zeros_(self.qo_bank.data[n + i]) + self.qo_bank.data[n + i].mul_(proj_scale) + nn.init.orthogonal_(self.kv_bank.data[i], gain=1.0) + nn.init.orthogonal_(self.kv_bank.data[n + i], gain=1.0) + for i in range(n): + nn.init.orthogonal_(self.mlp_up_bank.data[i], gain=1.0) + nn.init.zeros_(self.mlp_down_bank.data[i]) + self.mlp_down_bank.data[i].mul_(proj_scale) + for name, module in self.named_modules(): + if isinstance(module, nn.Linear): + if getattr(module, "_zero_init", False): + nn.init.zeros_(module.weight) + elif ( + module.weight.ndim == 2 + and module.weight.shape[0] >= 64 + and module.weight.shape[1] >= 64 + ): + nn.init.orthogonal_(module.weight, gain=1.0) + + def _bank_weights(self, i): + n = self.num_layers + return ( + self.qo_bank[i], + self.kv_bank[i], + self.kv_bank[n + i], + self.qo_bank[n + i], + self.mlp_up_bank[i], + self.mlp_down_bank[i], + ) + + def _parallel_block( + self, block_idx, lane0, lane1, x0, + q_w, k_w, v_w, out_w, up_w, down_w, + cu_seqlens=None, max_seqlen=0, + ): + block = self.blocks[block_idx] + mix = block.resid_mix.to(dtype=lane0.dtype) + attn_read = mix[0][None, None, :] * lane0 + mix[1][None, None, :] * x0 + attn_out = block.attn( + block.attn_norm(attn_read) * block.ln_scale_factor, + q_w, k_w, v_w, out_w, + cu_seqlens=cu_seqlens, max_seqlen=max_seqlen, + ) + attn_out = block.attn_scale.to(dtype=attn_out.dtype)[None, None, :] * attn_out + mlp_read = lane1 + mlp_out = block.mlp_scale.to(dtype=lane1.dtype)[None, None, :] * block.mlp( + block.mlp_norm(mlp_read) * block.ln_scale_factor, up_w, down_w + ) + attn_resid = self.parallel_resid_lambdas[block_idx, 
0].to(dtype=lane0.dtype) + attn_post = self.parallel_post_lambdas[block_idx, 0].to(dtype=lane0.dtype) + mlp_resid = self.parallel_resid_lambdas[block_idx, 1].to(dtype=lane0.dtype) + mlp_post = self.parallel_post_lambdas[block_idx, 1].to(dtype=lane0.dtype) + lane0 = attn_resid * lane0 + attn_post[0] * attn_out + mlp_post[0] * mlp_out + lane1 = mlp_resid * lane1 + attn_post[1] * attn_out + mlp_post[1] * mlp_out + return lane0, lane1 + + def _final_parallel_hidden(self, lane0, lane1): + if self.parallel_final_lane == "mlp": + return lane1 + if self.parallel_final_lane == "attn": + return lane0 + return 0.5 * (lane0 + lane1) + + def _forward_hidden(self, input_ids, cu_seqlens=None, max_seqlen=0): + """Run the encoder/decoder stack to the final RMSNorm; returns pre-projection hidden. + Shared by eval (softcap+projection via forward_logits) and train (fused CE path).""" + x = self.tok_emb(input_ids) + # SmearGate (PR #1667). lam=0 + W=0 -> identity at init. + # Cross-doc leak fix: zero the prev-token smear at any position whose current token + # is BOS, so the BOS embedding starting doc N+1 in a packed stream is not + # contaminated by doc N's last token (audited issue on PR#1797 base). 
+ if self.smear_gate_enabled: + sl = self.smear_lambda.to(dtype=x.dtype) + gate_in = x[:, 1:, : self.smear_window].contiguous() + g = sl * torch.sigmoid(self.smear_gate(gate_in)) + not_bos = (input_ids[:, 1:] != BOS_ID).to(x.dtype).unsqueeze(-1) + x = torch.cat([x[:, :1], x[:, 1:] + g * x[:, :-1] * not_bos], dim=1) + x = F.rms_norm(x, (x.size(-1),)) + x0 = x + skips = [] + enc_iter = ( + self.encoder_indices + if self.looping_active + else range(self.num_encoder_layers) + ) + dec_iter = ( + self.decoder_indices + if self.looping_active + else range( + self.num_encoder_layers, + self.num_encoder_layers + self.num_decoder_layers, + ) + ) + for i in enc_iter: + q_w, k_w, v_w, out_w, up_w, down_w = self._bank_weights(i) + x = self.blocks[i](x, x0, q_w, k_w, v_w, out_w, up_w, down_w, cu_seqlens=cu_seqlens, max_seqlen=max_seqlen) + skips.append(x) + psl = self.parallel_start_layer + lane0 = None + lane1 = None + for skip_idx, i in enumerate(dec_iter): + q_w, k_w, v_w, out_w, up_w, down_w = self._bank_weights(i) + if i >= psl and psl > 0: + if lane0 is None: + lane0 = x + lane1 = x + if skip_idx < self.num_skip_weights and skips: + skip = skips.pop() + w = self.skip_weights[skip_idx].to(dtype=lane0.dtype)[None, None, :] + if self.skip_gates is not None: + g = torch.sigmoid(self.skip_gates[skip_idx].to(dtype=lane0.dtype))[None, None, :] + lane0 = torch.lerp(w * skip, lane0, g) + else: + lane0 = lane0 + w * skip + lane0, lane1 = self._parallel_block( + i, lane0, lane1, x0, q_w, k_w, v_w, out_w, up_w, down_w, + cu_seqlens=cu_seqlens, max_seqlen=max_seqlen, + ) + else: + if skip_idx < self.num_skip_weights and skips: + scaled_skip = ( + self.skip_weights[skip_idx].to(dtype=x.dtype)[None, None, :] + * skips.pop() + ) + if self.skip_gates is not None: + g = torch.sigmoid(self.skip_gates[skip_idx].to(dtype=x.dtype))[None, None, :] + x = torch.lerp(scaled_skip, x, g) + else: + x = x + scaled_skip + x = self.blocks[i](x, x0, q_w, k_w, v_w, out_w, up_w, down_w, 
cu_seqlens=cu_seqlens, max_seqlen=max_seqlen) + if lane0 is not None: + x = self._final_parallel_hidden(lane0, lane1) + x = self.final_norm(x) + return x + + def _project_logits(self, hidden): + if self.tie_embeddings: + return F.linear(hidden, self.tok_emb.weight) + return self.lm_head(hidden) + + def _apply_asym_softcap(self, logits): + # V19: Asymmetric softcap (PR #1923). Splits the logit_softcap scalar into + # learnable positive/negative branches. Score-first preserved: still a + # bounded, normalized post-projection nonlinearity feeding a standard + # softmax over the full vocab. + sp = self.softcap_pos.to(logits.dtype) + sn = self.softcap_neg.to(logits.dtype) + return torch.where(logits > 0, sp * torch.tanh(logits / sp), sn * torch.tanh(logits / sn)) + + def forward_logits(self, input_ids, cu_seqlens=None, max_seqlen=0): + hidden = self._forward_hidden(input_ids, cu_seqlens=cu_seqlens, max_seqlen=max_seqlen) + logits_proj = self._project_logits(hidden) + if self.asym_logit_enabled: + return self._apply_asym_softcap(logits_proj) + return self.logit_softcap * torch.tanh(logits_proj / self.logit_softcap) + + def forward(self, input_ids, target_ids, cu_seqlens=None, max_seqlen=0): + hidden = self._forward_hidden(input_ids, cu_seqlens=cu_seqlens, max_seqlen=max_seqlen) + logits_proj = self._project_logits(hidden) + flat_targets = target_ids.reshape(-1) + # Fused softcapped-CE kernel (training path only). Applies softcap inside the + # Triton kernel; takes pre-softcap logits_proj. Non-fused path matches stock + # PR-1736 numerics exactly (softcap in fp32, then F.cross_entropy on fp32). 
+ if self.fused_ce_enabled: + return softcapped_cross_entropy( + logits_proj.reshape(-1, logits_proj.size(-1)), + flat_targets, + self.logit_softcap, + reduction="mean", + ) + logits = self.logit_softcap * torch.tanh(logits_proj / self.logit_softcap) + return F.cross_entropy( + logits.reshape(-1, logits.size(-1)).float(), + flat_targets, + reduction="mean", + ) + + def forward_ttt(self, input_ids, target_ids, lora, hint_ids=None): + x = self.tok_emb(input_ids) + # SmearGate on the TTT path — same inline compute as forward_logits. + # Cross-doc leak fix: see _forward_hidden comment. + if self.smear_gate_enabled: + sl = self.smear_lambda.to(dtype=x.dtype) + gate_in = x[:, 1:, : self.smear_window].contiguous() + g = sl * torch.sigmoid(self.smear_gate(gate_in)) + not_bos = (input_ids[:, 1:] != BOS_ID).to(x.dtype).unsqueeze(-1) + x = torch.cat([x[:, :1], x[:, 1:] + g * x[:, :-1] * not_bos], dim=1) + x = F.rms_norm(x, (x.size(-1),)) + x0 = x + skips = [] + enc_iter = ( + self.encoder_indices + if self.looping_active + else list(range(self.num_encoder_layers)) + ) + dec_iter = ( + self.decoder_indices + if self.looping_active + else list( + range( + self.num_encoder_layers, + self.num_encoder_layers + self.num_decoder_layers, + ) + ) + ) + slot = 0 + for i in enc_iter: + q_w, k_w, v_w, out_w, up_w, down_w = self._bank_weights(i) + x = self._block_with_lora(self.blocks[i], x, x0, lora, slot, q_w, k_w, v_w, out_w, up_w, down_w) + slot += 1 + skips.append(x) + psl = self.parallel_start_layer + lane0 = None + lane1 = None + for skip_idx, i in enumerate(dec_iter): + q_w, k_w, v_w, out_w, up_w, down_w = self._bank_weights(i) + if i >= psl and psl > 0: + if lane0 is None: + lane0 = x + lane1 = x + if skip_idx < self.num_skip_weights and skips: + skip = skips.pop() + w = self.skip_weights[skip_idx].to(dtype=lane0.dtype)[None, None, :] + if self.skip_gates is not None: + g = torch.sigmoid(self.skip_gates[skip_idx].to(dtype=lane0.dtype))[None, None, :] + lane0 = torch.lerp(w * 
skip, lane0, g) + else: + lane0 = lane0 + w * skip + lane0, lane1 = self._parallel_block_with_lora( + i, lane0, lane1, x0, lora, slot, + q_w, k_w, v_w, out_w, up_w, down_w, + ) + else: + if skip_idx < self.num_skip_weights and skips: + scaled_skip = ( + self.skip_weights[skip_idx].to(dtype=x.dtype)[None, None, :] + * skips.pop() + ) + if self.skip_gates is not None: + g = torch.sigmoid(self.skip_gates[skip_idx].to(dtype=x.dtype))[None, None, :] + x = torch.lerp(scaled_skip, x, g) + else: + x = x + scaled_skip + x = self._block_with_lora(self.blocks[i], x, x0, lora, slot, q_w, k_w, v_w, out_w, up_w, down_w) + slot += 1 + if lane0 is not None: + x = self._final_parallel_hidden(lane0, lane1) + x = self.final_norm(x) + if self.tie_embeddings: + logits = F.linear(x, self.tok_emb.weight) + else: + logits = self.lm_head(x) + logits = logits + lora.lm_head_lora(x) + # v5 Stage 2C: temperature scaling. T=1.0 (default) -> no-op. + # Applied BEFORE softcap so cap acts on calibrated logits. + if getattr(self, "temperature_scale", 1.0) != 1.0: + logits = logits / self.temperature_scale + # V19: same asymmetric softcap on the TTT eval path. + if self.asym_logit_enabled: + logits = self._apply_asym_softcap(logits) + else: + logits = self.logit_softcap * torch.tanh(logits / self.logit_softcap) + bsz, sl, V = logits.shape + if hint_ids is None: + return F.cross_entropy( + logits.float().reshape(-1, V), target_ids.reshape(-1), reduction="none" + ).reshape(bsz, sl) + # PR #1145 tilt branch (v4): Triton fused kernel for eval scoring (no_grad). + # TTT learning path needs autograd, so fall back to vanilla F.log_softmax + # when logits require grad. Triton kernel is forward-only (no backward). 
+ if logits.requires_grad: + ls = F.log_softmax(logits.float(), dim=-1) + log_p_y = ls.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1) + log_q_h = ls.gather(-1, hint_ids.clamp(min=0).unsqueeze(-1)).squeeze(-1) + return -log_p_y, log_q_h + log_p_y, log_q_h = fused_log_softmax_dual_gather( + logits, target_ids, hint_ids.clamp(min=0) + ) + return -log_p_y, log_q_h + + def _block_with_lora(self, block, x, x0, lora, slot, q_w, k_w, v_w, out_w, up_w, down_w): + mix = block.resid_mix.to(dtype=x.dtype) + x_in = mix[0][None, None, :] * x + mix[1][None, None, :] * x0 + n = block.attn_norm(x_in) * block.ln_scale_factor + attn = block.attn + bsz, seqlen, dim = n.shape + # Keep raw Q for AttnOutGate src='q' (matches forward path semantics). + q_raw = F.linear(n, q_w.to(n.dtype)) + lora.q_loras[slot](n) + q = q_raw.reshape(bsz, seqlen, attn.num_heads, attn.head_dim) + k = F.linear(n, k_w.to(n.dtype)) + if lora.k_loras is not None: + k = k + lora.k_loras[slot](n) + k = k.reshape(bsz, seqlen, attn.num_kv_heads, attn.head_dim) + v = (F.linear(n, v_w.to(n.dtype)) + lora.v_loras[slot](n)).reshape( + bsz, seqlen, attn.num_kv_heads, attn.head_dim + ) + q = F.rms_norm(q, (q.size(-1),)) + k = F.rms_norm(k, (k.size(-1),)) + cos, sin = attn.rotary(seqlen, n.device, q.dtype) + q = apply_rotary_emb(q, cos, sin, attn.rope_dims) + k = apply_rotary_emb(k, cos, sin, attn.rope_dims) + q = q * attn.q_gain.to(dtype=q.dtype)[None, None, :, None] + y = flash_attn_3_func(q, k, v, causal=True) + if attn.use_xsa: + y = attn._xsa_efficient(y, v) + # AttnOutGate (TTT path) — inline + .contiguous() barrier, same as the eval path. + if attn.attn_out_gate: + gate_src = q_raw if attn.attn_out_gate_src == "q" else n + gate_in = gate_src[..., : attn.gate_window].contiguous() + g = 2.0 * torch.sigmoid(attn.attn_gate_proj(gate_in)) + y = y * g[..., None] + # Gated Attention (TTT path). Gate input is n (post-norm block input), same + # as eval path. .to(n.dtype) on fp32 param before bf16 broadcast. 
+ if attn.gated_attn: + n_c = n.contiguous() + g = torch.sigmoid(F.linear(n_c, attn.attn_gate_w.to(n.dtype))) + y = y * g[..., None] + # Sparse attention head-output gate (TTT path) — must match the eval path in + # forward() exactly, else training (which applied the gate) and TTT eval (which + # skipped it) produce mismatched representations and catastrophic BPB regression. + if attn.sparse_attn_gate: + gate_in = n[..., : attn.gate_window].contiguous() + g = torch.sigmoid( + attn.sparse_attn_gate_scale + * F.linear(gate_in, attn.attn_gate_w.to(n.dtype)) + ) + y = y * g[..., None] + y = y.reshape(bsz, seqlen, dim) + attn_out = F.linear(y, out_w.to(n.dtype)) + if lora.o_loras is not None: + attn_out = attn_out + lora.o_loras[slot](n) + x_out = x_in + block.attn_scale.to(dtype=x_in.dtype)[None, None, :] * attn_out + mlp_n = block.mlp_norm(x_out) * block.ln_scale_factor + mlp_out = block.mlp(mlp_n, up_w, down_w) + if lora.mlp_loras is not None: + mlp_out = mlp_out + lora.mlp_loras[slot](mlp_n) + x_out = x_out + block.mlp_scale.to(dtype=x_out.dtype)[None, None, :] * mlp_out + return x_out + + def _parallel_block_with_lora( + self, block_idx, lane0, lane1, x0, lora, slot, + q_w, k_w, v_w, out_w, up_w, down_w, + ): + block = self.blocks[block_idx] + mix = block.resid_mix.to(dtype=lane0.dtype) + attn_read = mix[0][None, None, :] * lane0 + mix[1][None, None, :] * x0 + n = block.attn_norm(attn_read) * block.ln_scale_factor + attn = block.attn + bsz, seqlen, dim = n.shape + q_raw = F.linear(n, q_w.to(n.dtype)) + lora.q_loras[slot](n) + q = q_raw.reshape(bsz, seqlen, attn.num_heads, attn.head_dim) + k = F.linear(n, k_w.to(n.dtype)) + if lora.k_loras is not None: + k = k + lora.k_loras[slot](n) + k = k.reshape(bsz, seqlen, attn.num_kv_heads, attn.head_dim) + v = (F.linear(n, v_w.to(n.dtype)) + lora.v_loras[slot](n)).reshape( + bsz, seqlen, attn.num_kv_heads, attn.head_dim + ) + q = F.rms_norm(q, (q.size(-1),)) + k = F.rms_norm(k, (k.size(-1),)) + cos, sin = attn.rotary(seqlen, 
n.device, q.dtype) + q = apply_rotary_emb(q, cos, sin, attn.rope_dims) + k = apply_rotary_emb(k, cos, sin, attn.rope_dims) + q = q * attn.q_gain.to(dtype=q.dtype)[None, None, :, None] + y = flash_attn_3_func(q, k, v, causal=True) + if attn.use_xsa: + y = attn._xsa_efficient(y, v) + # AttnOutGate (TTT parallel path) — inline + .contiguous() barrier. + if attn.attn_out_gate: + gate_src = q_raw if attn.attn_out_gate_src == "q" else n + gate_in = gate_src[..., : attn.gate_window].contiguous() + g = 2.0 * torch.sigmoid(attn.attn_gate_proj(gate_in)) + y = y * g[..., None] + # Gated Attention (TTT parallel path). Gate input is n (post-norm block input). + if attn.gated_attn: + n_c = n.contiguous() + g = torch.sigmoid(F.linear(n_c, attn.attn_gate_w.to(n.dtype))) + y = y * g[..., None] + # Sparse attention head-output gate (TTT parallel path) — must match the + # eval path in forward() to keep train/eval semantics in sync. + if attn.sparse_attn_gate: + gate_in = n[..., : attn.gate_window].contiguous() + g = torch.sigmoid( + attn.sparse_attn_gate_scale + * F.linear(gate_in, attn.attn_gate_w.to(n.dtype)) + ) + y = y * g[..., None] + y = y.reshape(bsz, seqlen, dim) + attn_out = F.linear(y, out_w.to(n.dtype)) + if lora.o_loras is not None: + attn_out = attn_out + lora.o_loras[slot](n) + attn_out = block.attn_scale.to(dtype=attn_out.dtype)[None, None, :] * attn_out + mlp_read = lane1 + mlp_n = block.mlp_norm(mlp_read) * block.ln_scale_factor + mlp_out = block.mlp(mlp_n, up_w, down_w) + if lora.mlp_loras is not None: + mlp_out = mlp_out + lora.mlp_loras[slot](mlp_n) + mlp_out = block.mlp_scale.to(dtype=lane1.dtype)[None, None, :] * mlp_out + attn_resid = self.parallel_resid_lambdas[block_idx, 0].to(dtype=lane0.dtype) + attn_post = self.parallel_post_lambdas[block_idx, 0].to(dtype=lane0.dtype) + mlp_resid = self.parallel_resid_lambdas[block_idx, 1].to(dtype=lane0.dtype) + mlp_post = self.parallel_post_lambdas[block_idx, 1].to(dtype=lane0.dtype) + lane0 = attn_resid * lane0 + 
attn_post[0] * attn_out + mlp_post[0] * mlp_out + lane1 = mlp_resid * lane1 + attn_post[1] * attn_out + mlp_post[1] * mlp_out + return lane0, lane1 + + +class BatchedLinearLoRA(nn.Module): + # PR-1767: rank-scaled output (alpha/rank), like standard LoRA. Decouples + # effective magnitude from rank so changing rank does not change LR scale. + _ALPHA = float(os.environ.get("TTT_LORA_ALPHA", "144")) + # PR-1767: optionally keep A warm across per-doc resets (only B is zeroed). + # Accumulates useful feature directions across documents within a TTT phase. + _WARM_START_A = bool(int(os.environ.get("TTT_WARM_START_A", "1"))) + + def __init__(self, bsz, in_features, out_features, rank): + super().__init__() + self._bound = 1.0 / math.sqrt(in_features) + self._scale = self._ALPHA / rank + self.A = nn.Parameter( + torch.empty(bsz, rank, in_features).uniform_(-self._bound, self._bound) + ) + self.B = nn.Parameter(torch.zeros(bsz, out_features, rank)) + + def reset(self): + with torch.no_grad(): + if not self._WARM_START_A: + self.A.uniform_(-self._bound, self._bound) + self.B.zero_() + + def forward(self, x): + return ((x @ self.A.transpose(1, 2)) @ self.B.transpose(1, 2)) * self._scale + + +class BatchedTTTLoRA(nn.Module): + def __init__(self, bsz, model, rank, k_lora=True, mlp_lora=True, o_lora=True): + super().__init__() + self.bsz = bsz + dim = model.qo_bank.shape[-1] + vocab = model.tok_emb.num_embeddings + if getattr(model, "looping_active", False): + num_slots = len(model.encoder_indices) + len(model.decoder_indices) + else: + num_slots = len(model.blocks) + kv_dim = model.blocks[0].attn.num_kv_heads * ( + dim // model.blocks[0].attn.num_heads + ) + embed_dim = model.tok_emb.embedding_dim + self.lm_head_lora = BatchedLinearLoRA(bsz, embed_dim, vocab, rank) + self.q_loras = nn.ModuleList( + [BatchedLinearLoRA(bsz, dim, dim, rank) for _ in range(num_slots)] + ) + self.v_loras = nn.ModuleList( + [BatchedLinearLoRA(bsz, dim, kv_dim, rank) for _ in range(num_slots)] + ) + 
self.k_loras = ( + nn.ModuleList( + [BatchedLinearLoRA(bsz, dim, kv_dim, rank) for _ in range(num_slots)] + ) + if k_lora + else None + ) + self.mlp_loras = ( + nn.ModuleList( + [BatchedLinearLoRA(bsz, dim, dim, rank) for _ in range(num_slots)] + ) + if mlp_lora + else None + ) + self.o_loras = ( + nn.ModuleList( + [BatchedLinearLoRA(bsz, dim, dim, rank) for _ in range(num_slots)] + ) + if o_lora + else None + ) + + def reset(self): + with torch.no_grad(): + self.lm_head_lora.reset() + for loras in [self.q_loras, self.v_loras, self.k_loras, + self.mlp_loras, self.o_loras]: + if loras is not None: + for lora in loras: + lora.reset() + + +# Polar Express per-iteration minimax Newton-Schulz coefficients (PR #1344). +# Replaces the fixed (3.4445, -4.775, 2.0315) coefficients of stock Muon. +# Applied at backend_steps=5. Requesting more than 5 steps does not repeat +# the final tuple: the slice guard below caps the run at these 5 iterations. +_PE_COEFFS = ( + (8.156554524902461, -22.48329292557795, 15.878769915207462), + (4.042929935166739, -2.808917465908714, 0.5000178451051316), + (3.8916678022926607, -2.772484153217685, 0.5060648178503393), + (3.285753657755655, -2.3681294933425376, 0.46449024233003106), + (2.3465413258596377, -1.7097828382687081, 0.42323551169305323), +) + + +@torch.compile +def zeropower_via_newtonschulz5(G, steps=10, eps=1e-07): + was_2d = G.ndim == 2 + if was_2d: + G = G.unsqueeze(0) + X = G.bfloat16() + transposed = X.size(-2) > X.size(-1) + if transposed: + X = X.mT + X = X / (X.norm(dim=(-2, -1), keepdim=True) + eps) + coeffs = _PE_COEFFS[:steps] if steps <= len(_PE_COEFFS) else _PE_COEFFS + for a, b, c in coeffs: + A = X @ X.mT + B = b * A + c * (A @ A) + X = a * X + B @ X + if transposed: + X = X.mT + if was_2d: + X = X.squeeze(0) + return X + + +class Muon(torch.optim.Optimizer): + def __init__( + self, + params, + lr, + momentum, + backend_steps, + nesterov=True, + weight_decay=0.0, + row_normalize=False, + ): + super().__init__( + params, +
dict( + lr=lr, + momentum=momentum, + backend_steps=backend_steps, + nesterov=nesterov, + weight_decay=weight_decay, + row_normalize=row_normalize, + ), + ) + self._built = False + + def _build(self): + self._distributed = dist.is_available() and dist.is_initialized() + self._world_size = dist.get_world_size() if self._distributed else 1 + self._rank = dist.get_rank() if self._distributed else 0 + ws = self._world_size + self._bank_meta = [] + for group in self.param_groups: + for p in group["params"]: + B = p.shape[0] + padded_B = ((B + ws - 1) // ws) * ws + shard_B = padded_B // ws + tail = p.shape[1:] + dev = p.device + self._bank_meta.append({ + "p": p, + "B": B, + "padded_grad": torch.zeros(padded_B, *tail, device=dev, dtype=torch.bfloat16), + "shard": torch.zeros(shard_B, *tail, device=dev, dtype=torch.bfloat16), + "shard_mom": torch.zeros(shard_B, *tail, device=dev, dtype=torch.bfloat16), + "full_update": torch.zeros(padded_B, *tail, device=dev, dtype=torch.bfloat16), + "scale": max(1, p.shape[-2] / p.shape[-1]) ** 0.5, + }) + self._bank_meta.sort(key=lambda m: -m["p"].numel()) + self._built = True + + def launch_reduce_scatters(self): + if not self._built: + self._build() + if not self._distributed: + return + self._rs_futures = [] + for m in self._bank_meta: + p = m["p"] + if p.grad is None: + self._rs_futures.append(None) + continue + pg = m["padded_grad"] + pg[: m["B"]].copy_(p.grad) + fut = dist.reduce_scatter_tensor( + m["shard"], pg, op=dist.ReduceOp.AVG, async_op=True + ) + self._rs_futures.append(fut) + + @torch.no_grad() + def step(self, closure=None): + loss = None + if closure is not None: + with torch.enable_grad(): + loss = closure() + if not self._built: + self._build() + for group in self.param_groups: + lr = group["lr"] + momentum = group["momentum"] + backend_steps = group["backend_steps"] + nesterov = group["nesterov"] + wd = group.get("weight_decay", 0.0) + row_normalize = group.get("row_normalize", False) + prev_ag_handle = None + prev_m 
= None + sharded = self._distributed and hasattr(self, "_rs_futures") + for idx, m in enumerate(self._bank_meta): + p = m["p"] + if p.grad is None: + continue + if prev_ag_handle is not None: + prev_ag_handle.wait() + pp = prev_m["p"] + upd = prev_m["full_update"][: prev_m["B"]] + if wd > 0.0: + pp.data.mul_(1.0 - lr * wd) + pp.add_(upd, alpha=-lr * prev_m["scale"]) + if sharded and self._rs_futures[idx] is not None: + self._rs_futures[idx].wait() + g = m["shard"] + buf = m["shard_mom"] + else: + g = p.grad.bfloat16() + state = self.state[p] + if "momentum_buffer" not in state: + state["momentum_buffer"] = torch.zeros_like(g) + buf = state["momentum_buffer"] + buf.mul_(momentum).add_(g) + if nesterov: + update = g.add(buf, alpha=momentum) + else: + update = buf + if row_normalize: + rn = update.float().norm(dim=-1, keepdim=True).clamp_min(1e-07) + update = update / rn.to(update.dtype) + update = zeropower_via_newtonschulz5(update, steps=backend_steps) + if sharded: + prev_ag_handle = dist.all_gather_into_tensor( + m["full_update"], update, async_op=True + ) + prev_m = m + else: + if wd > 0.0: + p.data.mul_(1.0 - lr * wd) + p.add_(update, alpha=-lr * m["scale"]) + if prev_ag_handle is not None: + prev_ag_handle.wait() + pp = prev_m["p"] + upd = prev_m["full_update"][: prev_m["B"]] + if wd > 0.0: + pp.data.mul_(1.0 - lr * wd) + pp.add_(upd, alpha=-lr * prev_m["scale"]) + if hasattr(self, "_rs_futures"): + del self._rs_futures + return loss + + +CONTROL_TENSOR_NAME_PATTERNS = tuple( + pattern + for pattern in os.environ.get( + "CONTROL_TENSOR_NAME_PATTERNS", + "attn_scale,attn_scales,mlp_scale,mlp_scales,resid_mix,resid_mixes,q_gain,skip_weight,skip_weights,skip_gates,parallel_post_lambdas,parallel_resid_lambdas,attn_gate_proj,attn_gate_w,smear_gate,smear_lambda", + ).split(",") + if pattern +) + + +PACKED_REPLICATED_GRAD_MAX_NUMEL = 1 << 15 + + +class Optimizers: + def __init__(self, h, base_model): + matrix_params = [ + base_model.qo_bank, + base_model.kv_bank, + 
base_model.mlp_up_bank, + base_model.mlp_down_bank, + ] + block_named_params = list(base_model.blocks.named_parameters()) + scalar_params = [ + p + for (name, p) in block_named_params + if p.ndim < 2 + or any(pattern in name for pattern in CONTROL_TENSOR_NAME_PATTERNS) + ] + if base_model.skip_weights.numel() > 0: + scalar_params.append(base_model.skip_weights) + if base_model.skip_gates is not None and base_model.skip_gates.numel() > 0: + scalar_params.append(base_model.skip_gates) + if base_model.parallel_post_lambdas is not None: + scalar_params.append(base_model.parallel_post_lambdas) + if base_model.parallel_resid_lambdas is not None: + scalar_params.append(base_model.parallel_resid_lambdas) + # SmearGate params live on GPT root (not in .blocks), so add them by hand. + # Both are tiny (gate_window scalars + 1 lambda). Optimized via scalar Adam. + if getattr(base_model, "smear_gate_enabled", False): + scalar_params.append(base_model.smear_gate.weight) + scalar_params.append(base_model.smear_lambda) + token_lr = h.tied_embed_lr if h.tie_embeddings else h.embed_lr + tok_params = [ + {"params": [base_model.tok_emb.weight], "lr": token_lr, "base_lr": token_lr} + ] + self.optimizer_tok = torch.optim.AdamW( + tok_params, + betas=(h.beta1, h.beta2), + eps=h.adam_eps, + weight_decay=h.embed_wd, + fused=True, + ) + self.optimizer_muon = Muon( + matrix_params, + lr=h.matrix_lr, + momentum=h.muon_momentum, + backend_steps=h.muon_backend_steps, + weight_decay=h.muon_wd, + row_normalize=h.muon_row_normalize, + ) + for group in self.optimizer_muon.param_groups: + group["base_lr"] = h.matrix_lr + self.optimizer_scalar = torch.optim.AdamW( + [{"params": scalar_params, "lr": h.scalar_lr, "base_lr": h.scalar_lr}], + betas=(h.beta1, h.beta2), + eps=h.adam_eps, + weight_decay=h.adam_wd, + fused=True, + ) + self.optimizers = [ + self.optimizer_tok, + self.optimizer_muon, + self.optimizer_scalar, + ] + self.replicated_params = list(tok_params[0]["params"]) + 
self.replicated_params.extend(scalar_params) + self.replicated_large_params = [] + self.replicated_packed_params = [] + for p in self.replicated_params: + if p.numel() <= PACKED_REPLICATED_GRAD_MAX_NUMEL: + self.replicated_packed_params.append(p) + else: + self.replicated_large_params.append(p) + self._aux_stream = torch.cuda.Stream() + + def __iter__(self): + return iter(self.optimizers) + + def zero_grad_all(self): + for opt in self.optimizers: + opt.zero_grad(set_to_none=True) + + def _all_reduce_packed_grads(self): + grads_by_key = collections.defaultdict(list) + for p in self.replicated_packed_params: + if p.grad is not None: + grads_by_key[(p.grad.device, p.grad.dtype)].append(p.grad) + for grads in grads_by_key.values(): + flat = torch.empty( + sum(g.numel() for g in grads), + device=grads[0].device, + dtype=grads[0].dtype, + ) + offset = 0 + for g in grads: + n = g.numel() + flat[offset : offset + n].copy_(g.contiguous().view(-1)) + offset += n + dist.all_reduce(flat, op=dist.ReduceOp.AVG) + offset = 0 + for g in grads: + n = g.numel() + g.copy_(flat[offset : offset + n].view_as(g)) + offset += n + + def step(self, distributed=False): + self.optimizer_muon.launch_reduce_scatters() + if distributed: + reduce_handles = [ + dist.all_reduce(p.grad, op=dist.ReduceOp.AVG, async_op=True) + for p in self.replicated_large_params + if p.grad is not None + ] + self._all_reduce_packed_grads() + for handle in reduce_handles: + handle.wait() + self._aux_stream.wait_stream(torch.cuda.current_stream()) + with torch.cuda.stream(self._aux_stream): + self.optimizer_tok.step() + self.optimizer_scalar.step() + self.optimizer_muon.step() + torch.cuda.current_stream().wait_stream(self._aux_stream) + self.zero_grad_all() + + +def restore_fp32_params(model): + for module in model.modules(): + if isinstance(module, CastedLinear): + module.float() + for name, param in model.named_parameters(): + if ( + param.ndim < 2 + or any(pattern in name for pattern in 
CONTROL_TENSOR_NAME_PATTERNS) + ) and param.dtype != torch.float32: + param.data = param.data.float() + if hasattr(model, "qo_bank") and model.qo_bank is not None: + model.qo_bank.data = model.qo_bank.data.float() + model.kv_bank.data = model.kv_bank.data.float() + model.mlp_up_bank.data = model.mlp_up_bank.data.float() + model.mlp_down_bank.data = model.mlp_down_bank.data.float() + + +def collect_hessians(model, train_loader, h, device, n_calibration_batches=64): + hessians = {} + act_sumsq = {} + act_counts = {} + hooks = [] + for i, block in enumerate(model.blocks): + block.attn._calib = True + block.mlp._calib = True + block.mlp.use_fused = False + + def make_attn_hook(layer_idx): + def hook_fn(module, inp, out): + x = inp[0].detach().float() + if x.ndim == 3: + x = x.reshape(-1, x.shape[-1]) + x_sq = x.square().sum(dim=0) + x_count = x.shape[0] + for suffix in ["c_q", "c_k", "c_v"]: + name = f"blocks.{layer_idx}.attn.{suffix}.weight" + if name not in hessians: + hessians[name] = torch.zeros( + x.shape[1], x.shape[1], dtype=torch.float32, device=device + ) + hessians[name].addmm_(x.T, x) + if name not in act_sumsq: + act_sumsq[name] = torch.zeros( + x.shape[1], dtype=torch.float32, device=device + ) + act_counts[name] = 0 + act_sumsq[name] += x_sq + act_counts[name] += x_count + y = module._last_proj_input + if y is not None: + y = y.float() + if y.ndim == 3: + y = y.reshape(-1, y.shape[-1]) + name = f"blocks.{layer_idx}.attn.proj.weight" + if name not in hessians: + hessians[name] = torch.zeros( + y.shape[1], y.shape[1], dtype=torch.float32, device=device + ) + hessians[name].addmm_(y.T, y) + if name not in act_sumsq: + act_sumsq[name] = torch.zeros( + y.shape[1], dtype=torch.float32, device=device + ) + act_counts[name] = 0 + act_sumsq[name] += y.square().sum(dim=0) + act_counts[name] += y.shape[0] + return hook_fn + + def make_mlp_hook(layer_idx): + def hook_fn(module, inp, out): + x = inp[0].detach().float() + if x.ndim == 3: + x = x.reshape(-1, 
x.shape[-1]) + name = f"blocks.{layer_idx}.mlp.fc.weight" + if name not in hessians: + hessians[name] = torch.zeros( + x.shape[1], x.shape[1], dtype=torch.float32, device=device + ) + hessians[name].addmm_(x.T, x) + if name not in act_sumsq: + act_sumsq[name] = torch.zeros( + x.shape[1], dtype=torch.float32, device=device + ) + act_counts[name] = 0 + act_sumsq[name] += x.square().sum(dim=0) + act_counts[name] += x.shape[0] + h_act = module._last_down_input + if h_act is not None: + h_act = h_act.float() + if h_act.ndim == 3: + h_act = h_act.reshape(-1, h_act.shape[-1]) + name = f"blocks.{layer_idx}.mlp.proj.weight" + if name not in hessians: + hessians[name] = torch.zeros( + h_act.shape[1], h_act.shape[1], dtype=torch.float32, device=device + ) + hessians[name].addmm_(h_act.T, h_act) + if name not in act_sumsq: + act_sumsq[name] = torch.zeros( + h_act.shape[1], dtype=torch.float32, device=device + ) + act_counts[name] = 0 + act_sumsq[name] += h_act.square().sum(dim=0) + act_counts[name] += h_act.shape[0] + return hook_fn + + for i, block in enumerate(model.blocks): + hooks.append(block.attn.register_forward_hook(make_attn_hook(i))) + hooks.append(block.mlp.register_forward_hook(make_mlp_hook(i))) + + # Hessian hooks for embedding factorization projection layers + def make_linear_input_hook(weight_name): + def hook_fn(module, inp, out): + x = inp[0].detach().float() + if x.ndim == 3: + x = x.reshape(-1, x.shape[-1]) + if weight_name not in hessians: + hessians[weight_name] = torch.zeros( + x.shape[1], x.shape[1], dtype=torch.float32, device=device + ) + hessians[weight_name].addmm_(x.T, x) + return hook_fn + + if model.tie_embeddings: + hook_module = model.final_norm + + def make_output_hook(name): + def hook_fn(module, inp, out): + x = out.detach().float() + if x.ndim == 3: + x = x.reshape(-1, x.shape[-1]) + if name not in hessians: + hessians[name] = torch.zeros( + x.shape[1], x.shape[1], dtype=torch.float32, device=device + ) + hessians[name].addmm_(x.T, x) + if 
name not in act_sumsq: + act_sumsq[name] = torch.zeros( + x.shape[1], dtype=torch.float32, device=device + ) + act_counts[name] = 0 + act_sumsq[name] += x.square().sum(dim=0) + act_counts[name] += x.shape[0] + return hook_fn + + hooks.append( + hook_module.register_forward_hook(make_output_hook("tok_emb.weight")) + ) + model.eval() + with torch.no_grad(): + for _ in range(n_calibration_batches): + x, _ = train_loader.next_batch(h.train_batch_tokens, h.grad_accum_steps) + model.forward_logits(x) + for hook in hooks: + hook.remove() + for i, block in enumerate(model.blocks): + block.attn._calib = False + block.mlp._calib = False + block.mlp.use_fused = True + for name in hessians: + hessians[name] = hessians[name].cpu() / n_calibration_batches + act_stats = {} + for name, sumsq in act_sumsq.items(): + count = max(act_counts.get(name, 0), 1) + act_stats[name] = (sumsq / count).sqrt().cpu() + return hessians, act_stats + + +def gptq_quantize_weight( + w, + H, + clip_sigmas=3.0, + clip_range=63, + block_size=128, + protect_groups=None, + group_size=None, + protect_clip_range=None, +): + W_orig = w.float().clone() + rows, cols = W_orig.shape + H = H.float().clone() + dead = torch.diag(H) == 0 + H[dead, dead] = 1 + damp = 0.01 * H.diag().mean() + H.diagonal().add_(damp) + perm = torch.argsort(H.diag(), descending=True) + invperm = torch.argsort(perm) + W_perm = W_orig[:, perm].clone() + W_perm[:, dead[perm]] = 0 + H = H[perm][:, perm] + Hinv = torch.cholesky_inverse(torch.linalg.cholesky(H)) + Hinv = torch.linalg.cholesky(Hinv, upper=True) + row_std = W_orig.std(dim=1) + s = (clip_sigmas * row_std / clip_range).clamp_min(1e-10).to(torch.float16) + sf = s.float() + protect_meta = None + protect_mask_perm = None + s_hi = None + sf_hi = None + if ( + protect_groups + and group_size is not None + and protect_clip_range is not None + and protect_clip_range > clip_range + ): + protect_mask = torch.zeros(cols, dtype=torch.bool) + starts = [] + for (start, end) in protect_groups: 
+ if start < 0 or end > cols or end <= start: + continue + protect_mask[start:end] = True + starts.append(start) + if starts: + protect_mask_perm = protect_mask[perm] + s_hi = (clip_sigmas * row_std / protect_clip_range).clamp_min(1e-10).to( + torch.float16 + ) + sf_hi = s_hi.float() + protect_meta = { + "starts": torch.tensor(starts, dtype=torch.int16), + "size": int(group_size), + "s_hi": s_hi, + } + Q = torch.zeros(rows, cols, dtype=torch.int8) + W_work = W_perm.clone() + for i1 in range(0, cols, block_size): + i2 = min(i1 + block_size, cols) + W_block = W_work[:, i1:i2].clone() + Hinv_block = Hinv[i1:i2, i1:i2] + Err = torch.zeros(rows, i2 - i1) + for j in range(i2 - i1): + w_col = W_block[:, j] + d = Hinv_block[j, j] + if protect_mask_perm is not None and bool(protect_mask_perm[i1 + j]): + q_col = torch.clamp( + torch.round(w_col / sf_hi), + -protect_clip_range, + protect_clip_range, + ) + w_recon = q_col.float() * sf_hi + else: + q_col = torch.clamp(torch.round(w_col / sf), -clip_range, clip_range) + w_recon = q_col.float() * sf + Q[:, i1 + j] = q_col.to(torch.int8) + err = (w_col - w_recon) / d + Err[:, j] = err + W_block[:, j:] -= err.unsqueeze(1) * Hinv_block[j, j:].unsqueeze(0) + if i2 < cols: + W_work[:, i2:] -= Err @ Hinv[i1:i2, i2:] + return Q[:, invperm], s, protect_meta + + +def _quantize_gate_int8_row(w): + # Symmetric int8-per-row quantization for small gate tensors. w shape + # (R, C) -> (R,) scales in fp16, int8 values in [-127, 127]. Single scale + # per row keeps accuracy high while halving storage vs fp16. 
+ W = w.float().contiguous() + row_max = W.abs().amax(dim=1).clamp_min(1e-10) + s = (row_max / 127.0).to(torch.float16) + sf = s.float().view(-1, 1) + q = torch.clamp(torch.round(W / sf), -127, 127).to(torch.int8) + return q, s + + +def _lqer_pack(A, B, bits): + rng = 2 ** (bits - 1) - 1 + sA = (A.abs().amax(dim=1).clamp_min(1e-10) / rng).to(torch.float16) + sB = (B.abs().amax(dim=1).clamp_min(1e-10) / rng).to(torch.float16) + qA = torch.clamp(torch.round(A / sA.float().view(-1, 1)), -rng, rng).to(torch.int8) + qB = torch.clamp(torch.round(B / sB.float().view(-1, 1)), -rng, rng).to(torch.int8) + return qA, sA, qB, sB + + +def _lqer_pack_asym(A, B, g=64): + # A: INT2 per-matrix scalar (signed [-2,1], scale = |A|max/1.5). + sA = (A.abs().amax().clamp_min(1e-10) / 1.5).to(torch.float16) + qA = torch.clamp(torch.round(A / sA.float()), -2, 1).to(torch.int8) + # B: INT4 groupwise g over flattened B (signed [-8,7], per-group scale). + Bf = B.reshape(-1, g) + Bmax = Bf.abs().amax(dim=-1, keepdim=True).clamp_min(1e-10) + sB = (Bmax / 7.5).to(torch.float16).reshape(-1) + qB = torch.clamp(torch.round(Bf / sB.float().reshape(-1, 1)), -8, 7).to( + torch.int8 + ).reshape(B.shape) + return qA, sA, qB, sB + + +def _lqer_fit_quantized(E, h): + U, S, Vh = torch.linalg.svd(E, full_matrices=False) + r = min(h.lqer_rank, S.numel()) + if r <= 0: + return None + A = (U[:, :r] * S[:r]).contiguous() + B = Vh[:r, :].contiguous() + asym_on = bool(getattr(h, "lqer_asym_enabled", False)) + asym_g = int(getattr(h, "lqer_asym_group", 64)) + if asym_on and B.numel() % asym_g == 0: + qA, sA, qB, sB = _lqer_pack_asym(A, B, asym_g) + A_hat = qA.float() * float(sA) + g_sz = qB.numel() // sB.numel() + B_hat = (qB.reshape(-1, g_sz).float() * sB.float().view(-1, 1)).reshape( + qB.shape + ) + return { + "kind": "asym", + "qA": qA, + "sA": sA, + "qB": qB, + "sB": sB, + "delta": A_hat @ B_hat, + } + qA, sA, qB, sB = _lqer_pack(A, B, h.lqer_factor_bits) + A_hat = qA.float() * sA.float().view(-1, 1) + B_hat 
= qB.float() * sB.float().view(-1, 1) + return { + "kind": "sym", + "qA": qA, + "sA": sA, + "qB": qB, + "sB": sB, + "delta": A_hat @ B_hat, + } + + +def _awq_lite_group_candidates(w, act_rms, group_size): + cols = w.shape[1] + n_groups = cols // group_size + if n_groups <= 0: + return [] + weight_score = w.float().abs().mean(dim=0) + saliency = act_rms.float() * weight_score + cands = [] + for gi in range(n_groups): + start = gi * group_size + end = start + group_size + score = float(saliency[start:end].sum()) + cands.append((score, start, end)) + return cands + + +def gptq_mixed_quantize(state_dict, hessians, act_stats, h): + result = {} + meta = {} + quant_gate = bool(getattr(h, "gated_attn_quant_gate", False)) + lqer_on = bool(getattr(h, "lqer_enabled", False)) + awq_on = bool(getattr(h, "awq_lite_enabled", False)) + lqer_cands = {} + awq_selected = collections.defaultdict(list) + if awq_on: + awq_cands = [] + for (name, tensor) in state_dict.items(): + t = tensor.detach().cpu().contiguous() + if t.is_floating_point() and t.numel() > 65536 and name in act_stats: + bits = h.embed_bits if "tok_emb" in name else h.matrix_bits + if bits < h.awq_lite_bits: + for score, start, end in _awq_lite_group_candidates( + t, act_stats[name], h.awq_lite_group_size + ): + awq_cands.append((score, name, start, end)) + awq_cands.sort(key=lambda x: -x[0]) + for (_score, name, start, end) in awq_cands[: h.awq_lite_group_top_k]: + awq_selected[name].append((start, end)) + for (name, tensor) in state_dict.items(): + t = tensor.detach().cpu().contiguous() + # Dedicated int8-per-row path for attn_gate_w (bypasses both GPTQ and + # fp16 passthrough). Applied BEFORE the numel<=65536 passthrough check + # so the gate tensor is routed here instead of to fp16. + if ( + quant_gate + and t.is_floating_point() + and t.ndim == 2 + and name.endswith(".attn_gate_w") + # Dense GatedAttn: (num_heads, dim) = (8, 512) = 4096. + # Sparse gate: (num_heads, gate_window) = (8, 12) = 96. 
+ # Both need int8-per-row routing; the 1024 lower bound in stock + # PR-1736 presumed dense-only. Widen to catch both. + and 32 <= t.numel() <= 8192 + ): + gq, gs = _quantize_gate_int8_row(t) + result[name + ".gq"] = gq + result[name + ".gs"] = gs + meta[name] = "gate_int8_row" + continue + if not t.is_floating_point() or t.numel() <= 65536: + result[name] = t.to(torch.float16) if t.is_floating_point() else t + meta[name] = "passthrough (float16)" + continue + if "tok_emb" in name: + cs = h.embed_clip_sigmas + elif ".mlp." in name: + cs = h.mlp_clip_sigmas + elif ".attn." in name: + cs = h.attn_clip_sigmas + else: + cs = h.matrix_clip_sigmas + bits = h.embed_bits if "tok_emb" in name else h.matrix_bits + clip_range = 2 ** (bits - 1) - 1 + q, s, protect_meta = gptq_quantize_weight( + t, + hessians[name], + clip_sigmas=cs, + clip_range=clip_range, + protect_groups=awq_selected.get(name), + group_size=h.awq_lite_group_size if name in awq_selected else None, + protect_clip_range=(2 ** (h.awq_lite_bits - 1) - 1) + if name in awq_selected + else None, + ) + result[name + ".q"] = q + result[name + ".scale"] = s + meta[name] = f"gptq (int{bits})" + W_q = q.float() * s.float().view(-1, 1) + if protect_meta is not None: + result[name + ".awqg_start"] = protect_meta["starts"] + result[name + ".awqg_s_hi"] = protect_meta["s_hi"] + result[name + ".awqg_size"] = torch.tensor( + protect_meta["size"], dtype=torch.int16 + ) + meta[name] = meta[name] + f"+awqgrpint{h.awq_lite_bits}" + gsz = protect_meta["size"] + for start in protect_meta["starts"].tolist(): + W_q[:, start : start + gsz] = ( + q[:, start : start + gsz].float() + * protect_meta["s_hi"].float().view(-1, 1) + ) + if lqer_on: + # LQER is fit on top of the fully realized GPTQ base, which already + # includes any higher-precision AWQ-protected groups. + scope = str(getattr(h, "lqer_scope", "all")).lower() + scope_ok = ( + scope == "all" + or (scope == "mlp" and ".mlp." in name) + or (scope == "attn" and ".attn." 
in name) + or (scope == "embed" and "tok_emb" in name) + ) + if scope_ok: + E = t.float() - W_q + err_norm = float(E.norm()) + if err_norm > 0: + lqer_cands[name] = (E, err_norm) + if lqer_on and lqer_cands: + if bool(getattr(h, "lqer_gain_select", False)): + scored = [] + for (name, (E, base_err)) in lqer_cands.items(): + fit = _lqer_fit_quantized(E, h) + if fit is None: + continue + new_err = float((E - fit["delta"]).norm()) + gain = base_err - new_err + if gain > 0: + scored.append((gain, name, fit)) + scored.sort(key=lambda x: -x[0]) + for (_gain, name, fit) in scored[: h.lqer_top_k]: + if fit["kind"] == "asym": + result[name + ".lqA_a"] = fit["qA"] + result[name + ".lqAs_a"] = fit["sA"] + result[name + ".lqB_a"] = fit["qB"] + result[name + ".lqBs_a"] = fit["sB"] + meta[name] = meta[name] + "+lqer_asym" + else: + result[name + ".lqA"] = fit["qA"] + result[name + ".lqAs"] = fit["sA"] + result[name + ".lqB"] = fit["qB"] + result[name + ".lqBs"] = fit["sB"] + meta[name] = meta[name] + "+lqer" + else: + top = sorted(lqer_cands.items(), key=lambda kv: -kv[1][1])[: h.lqer_top_k] + asym_on = bool(getattr(h, "lqer_asym_enabled", False)) + asym_g = int(getattr(h, "lqer_asym_group", 64)) + for (name, (E, _)) in top: + U, S, Vh = torch.linalg.svd(E, full_matrices=False) + r = min(h.lqer_rank, S.numel()) + A = (U[:, :r] * S[:r]).contiguous() + B = Vh[:r, :].contiguous() + if asym_on and B.numel() % asym_g == 0: + qA, sA, qB, sB = _lqer_pack_asym(A, B, asym_g) + result[name + ".lqA_a"] = qA + result[name + ".lqAs_a"] = sA + result[name + ".lqB_a"] = qB + result[name + ".lqBs_a"] = sB + meta[name] = meta[name] + "+lqer_asym" + else: + qA, sA, qB, sB = _lqer_pack(A, B, h.lqer_factor_bits) + result[name + ".lqA"] = qA + result[name + ".lqAs"] = sA + result[name + ".lqB"] = qB + result[name + ".lqBs"] = sB + meta[name] = meta[name] + "+lqer" + categories = collections.defaultdict(set) + for (name, cat) in meta.items(): + short = re.sub("\\.\\d+$", "", re.sub("blocks\\.\\d+", 
"blocks", name)) + categories[cat].add(short) + log("Quantized weights:") + for cat in sorted(categories): + log(f" {cat}: {', '.join(sorted(categories[cat]))}") + return result, meta + +def dequantize_mixed(result, meta, template_sd): + out = {} + for (name, orig) in template_sd.items(): + info = meta.get(name) + if info is None: + continue + orig_dtype = orig.dtype + if "passthrough" in info: + t = result[name] + if t.dtype == torch.float16 and orig_dtype in ( + torch.float32, + torch.bfloat16, + ): + t = t.to(orig_dtype) + out[name] = t + continue + if info == "gate_int8_row": + gq = result[name + ".gq"] + gs = result[name + ".gs"] + out[name] = (gq.float() * gs.float().view(-1, 1)).to(orig_dtype) + continue + q, s = result[name + ".q"], result[name + ".scale"] + if s.ndim > 0: + W = q.float() * s.float().view(q.shape[0], *[1] * (q.ndim - 1)) + else: + W = q.float() * float(s.item()) + if "awqgrpint" in info: + starts = result[name + ".awqg_start"].tolist() + s_hi = result[name + ".awqg_s_hi"].float() + gsz = int(result[name + ".awqg_size"].item()) + for start in starts: + W[:, start : start + gsz] = ( + q[:, start : start + gsz].float() * s_hi.view(-1, 1) + ) + if "lqer_asym" in info: + qA_t = result[name + ".lqA_a"] + sA_t = result[name + ".lqAs_a"] + qB_t = result[name + ".lqB_a"] + sB_t = result[name + ".lqBs_a"] + qA = qA_t.float() * float(sA_t) + g_sz = qB_t.numel() // sB_t.numel() + qB = (qB_t.reshape(-1, g_sz).float() * sB_t.float().view(-1, 1)).reshape( + qB_t.shape + ) + W = W + qA @ qB + elif "lqer" in info: + qA = result[name + ".lqA"].float() * result[name + ".lqAs"].float().view(-1, 1) + qB = result[name + ".lqB"].float() * result[name + ".lqBs"].float().view(-1, 1) + W = W + qA @ qB + out[name] = W.to(orig_dtype) + return out + + +_BSHF_MAGIC = b"BSHF" + + +# ── Per-group lrzip compression (ported from PR#1586 via PR#1667/1729) ──────── + +_GROUP_ORDER = [ + "_tok_emb.weight.q", + "attn.c_k.weight.q", "attn.c_q.weight.q", + "attn.c_v.weight.q", 
"attn.proj.weight.q", + "mlp.fc.weight.q", "mlp.proj.weight.q", +] +_SIMSORT_KEYS = {"_tok_emb.weight.q", "attn.c_q.weight.q", "mlp.fc.weight.q"} +_PACK_MAGIC = b"PGRP" + + +def _similarity_sort_l1(matrix): + import numpy as _np + n = matrix.shape[0] + used = _np.zeros(n, dtype=bool) + order = [0] + used[0] = True + cur = matrix[0].astype(_np.float32) + for _ in range(n - 1): + dists = _np.sum(_np.abs(matrix[~used].astype(_np.float32) - cur), axis=1) + unused = _np.where(~used)[0] + best = unused[_np.argmin(dists)] + order.append(best) + used[best] = True + cur = matrix[best].astype(_np.float32) + return _np.array(order, dtype=_np.uint16) + + +def _lrzip_compress(data, tmpdir, label): + inp = os.path.join(tmpdir, f"{label}.bin") + out = f"{inp}.lrz" + with open(inp, "wb") as f: + f.write(data) + subprocess.run(["lrzip", "-z", "-L", "9", "-o", out, inp], capture_output=True, check=True) + with open(out, "rb") as f: + result = f.read() + os.remove(inp); os.remove(out) + return result + + +def _lrzip_decompress(data, tmpdir, label): + inp = os.path.join(tmpdir, f"{label}.lrz") + out = os.path.join(tmpdir, f"{label}.bin") + with open(inp, "wb") as f: + f.write(data) + subprocess.run(["lrzip", "-d", "-f", "-o", out, inp], capture_output=True, check=True) + with open(out, "rb") as f: + result = f.read() + os.remove(inp); os.remove(out) + return result + + +def _pack_streams(streams): + import struct + n = len(streams) + hdr = _PACK_MAGIC + struct.pack("= 2 + docs.append((start, end - start)) + return docs + + +def _build_ttt_global_batches(doc_entries, h, ascending=False): + batch_size = h.ttt_batch_size + global_doc_entries = sorted(doc_entries, key=lambda x: x[1][1]) + global_batches = [ + global_doc_entries[i : i + batch_size] + for i in range(0, len(global_doc_entries), batch_size) + ] + indexed = list(enumerate(global_batches)) + if not ascending: + indexed.sort(key=lambda ib: -max(dl for _, (_, dl) in ib[1])) + return indexed + + +def _init_batch_counter(path): + 
with open(path, "wb") as f: + f.write((0).to_bytes(4, "little")) + + +def _claim_next_batch(counter_path, queue_len): + try: + with open(counter_path, "r+b") as f: + fcntl.flock(f, fcntl.LOCK_EX) + idx = int.from_bytes(f.read(4), "little") + f.seek(0) + f.write((idx + 1).to_bytes(4, "little")) + f.flush() + except FileNotFoundError: + return queue_len + return idx + + +def _compute_chunk_window(ci, pred_len, num_chunks, chunk_size, eval_seq_len): + chunk_end = pred_len if ci == num_chunks - 1 else (ci + 1) * chunk_size + win_start = max(0, chunk_end - eval_seq_len) + win_len = chunk_end - win_start + chunk_start = ci * chunk_size + chunk_offset = chunk_start - win_start + chunk_len = chunk_end - chunk_start + return win_start, win_len, chunk_offset, chunk_len + + +def _accumulate_bpb( + ptl, + x, + y, + chunk_offsets, + chunk_lens, + pos_idx, + base_bytes_lut, + has_leading_space_lut, + is_boundary_token_lut, + loss_sum, + byte_sum, + token_count, + y_bytes=None, +): + pos = pos_idx[: x.size(1)].unsqueeze(0) + mask = ( + (chunk_lens.unsqueeze(1) > 0) + & (pos >= chunk_offsets.unsqueeze(1)) + & (pos < (chunk_offsets + chunk_lens).unsqueeze(1)) + ) + mask_f64 = mask.to(torch.float64) + if y_bytes is not None: + tok_bytes = y_bytes.to(torch.float64) + else: + tok_bytes = base_bytes_lut[y].to(torch.float64) + tok_bytes += (has_leading_space_lut[y] & ~is_boundary_token_lut[x]).to( + torch.float64 + ) + loss_sum += (ptl.to(torch.float64) * mask_f64).sum() + byte_sum += (tok_bytes * mask_f64).sum() + token_count += chunk_lens.to(torch.float64).sum() + + +def _loss_bpb_from_sums(loss_sum, token_count, byte_sum): + val_loss = (loss_sum / token_count).item() + val_bpb = val_loss / math.log(2.0) * (token_count.item() / byte_sum.item()) + return val_loss, val_bpb + + +def _add_to_counter(path, delta): + try: + with open(path, "r+b") as f: + fcntl.flock(f, fcntl.LOCK_EX) + cur = int.from_bytes(f.read(8), "little", signed=True) + cur += int(delta) + f.seek(0) + 
f.write(int(cur).to_bytes(8, "little", signed=True)) + f.flush() + return cur + except FileNotFoundError: + return int(delta) + + +def _init_int64_counter(path): + with open(path, "wb") as f: + f.write((0).to_bytes(8, "little", signed=True)) + + +def _select_ttt_doc_entries(docs, h): + doc_entries = list(enumerate(docs)) + if h.val_doc_fraction < 1.0: + sample_n = max(1, int(round(len(docs) * h.val_doc_fraction))) + sampled_indices = sorted( + random.Random(h.seed).sample(range(len(docs)), sample_n) + ) + return [(i, docs[i]) for i in sampled_indices] + return doc_entries + + +def train_val_ttt_global_sgd_distributed(h, device, val_data, base_model, val_tokens, batch_seqs=None): + global BOS_ID + if BOS_ID is None: + BOS_ID = 1 + base_model.eval() + seq_len = h.eval_seq_len + total_tokens = val_tokens.numel() - 1 + ttt_chunk = h.global_ttt_chunk_tokens + batch_seqs = h.global_ttt_batch_seqs if batch_seqs is None else batch_seqs + num_chunks = (total_tokens + ttt_chunk - 1) // ttt_chunk + ttt_params = [p for p in base_model.parameters()] + for p in ttt_params: + p.requires_grad_(True) + optimizer = torch.optim.SGD( + ttt_params, lr=h.global_ttt_lr, momentum=h.global_ttt_momentum + ) + t_start = time.perf_counter() + for ci in range(num_chunks): + chunk_start = ci * ttt_chunk + chunk_end = min((ci + 1) * ttt_chunk, total_tokens) + is_last_chunk = ci == num_chunks - 1 + if is_last_chunk or h.global_ttt_epochs <= 0: + continue + base_model.train() + chunk_seqs = (chunk_end - chunk_start) // seq_len + if chunk_seqs <= 0: + continue + warmup_chunks = max(0, min(h.global_ttt_warmup_chunks, num_chunks - 1)) + if warmup_chunks > 0 and ci < warmup_chunks: + warmup_denom = max(warmup_chunks - 1, 1) + warmup_t = ci / warmup_denom + lr_now = ( + h.global_ttt_warmup_start_lr + + (h.global_ttt_lr - h.global_ttt_warmup_start_lr) * warmup_t + ) + else: + decay_steps = max(num_chunks - 1 - warmup_chunks, 1) + decay_ci = max(ci - warmup_chunks, 0) + lr_now = h.global_ttt_lr * 0.5 * ( 
+ 1.0 + math.cos(math.pi * decay_ci / decay_steps) + ) + for pg in optimizer.param_groups: + pg["lr"] = lr_now + my_seq_s = chunk_seqs * h.rank // h.world_size + my_seq_e = chunk_seqs * (h.rank + 1) // h.world_size + my_chunk_seqs = my_seq_e - my_seq_s + for _ in range(h.global_ttt_epochs): + for bs in range(0, my_chunk_seqs, batch_seqs): + be = min(bs + batch_seqs, my_chunk_seqs) + actual_bs = my_seq_s + bs + start_tok = chunk_start + actual_bs * seq_len + end_tok = chunk_start + (my_seq_s + be) * seq_len + 1 + if end_tok > val_tokens.numel(): + continue + local = val_tokens[start_tok:end_tok].to(device=device, dtype=torch.int64) + x_flat = local[:-1] + y_flat = local[1:] + optimizer.zero_grad(set_to_none=True) + with torch.enable_grad(): + with torch.autocast(device_type="cuda", dtype=torch.bfloat16): + if h.global_ttt_respect_doc_boundaries: + bos_pos = (x_flat == BOS_ID).nonzero(as_tuple=True)[0].tolist() + cu_seqlens, max_seqlen = _build_cu_seqlens( + bos_pos, x_flat.numel(), x_flat.device, h.eval_seq_len, 64 + ) + loss = base_model( + x_flat[None], + y_flat[None], + cu_seqlens=cu_seqlens, + max_seqlen=max_seqlen, + ) + else: + x = x_flat.reshape(-1, seq_len) + y = y_flat.reshape(-1, seq_len) + loss = base_model(x, y) + loss.backward() + if dist.is_available() and dist.is_initialized(): + for p in ttt_params: + if p.grad is not None: + dist.all_reduce(p.grad, op=dist.ReduceOp.SUM) + p.grad.mul_(1.0 / h.world_size) + if h.global_ttt_grad_clip > 0: + torch.nn.utils.clip_grad_norm_(ttt_params, h.global_ttt_grad_clip) + optimizer.step() + base_model.eval() + if h.rank == 0: + elapsed = time.perf_counter() - t_start + log( + f"tttg: c{ci+1}/{num_chunks} lr:{lr_now:.6f} t:{elapsed:.1f}s" + ) + for p in base_model.parameters(): + p.requires_grad_(True) + base_model.eval() + + +def _compute_ngram_hints_for_val(h, val_data, log0=print): + """Stage 1A: precompute ngram hints over full val token sequence. 
+ Returns (hint_global, gate_global, boost_global) tensors on CPU, or None if tilt disabled. + + Compliance: single L->R pass over val tokens; uses val data only; produces hint + aligned to target positions [t] for predicting all_tokens[t+1] from prefix [:t+1]. + Same compute as inline precompute, just relocated to run BEFORE eval timer. + """ + if not getattr(h, "ngram_tilt_enabled", False): + return None + from online_ngram_tilt import build_hints_for_targets + all_tokens = val_data.val_tokens + targets_np_all = all_tokens.cpu().numpy().astype("uint16", copy=False)[1:] + t_h0 = time.perf_counter() + hints_pkg = build_hints_for_targets( + target_token_ids_np=targets_np_all, + tokenizer_path=h.tokenizer_path, + vocab_size=h.vocab_size, + log0=log0, + token_order=h.token_order, + token_threshold=h.token_threshold, + token_boost=h.token_boost, + within_tau=h.within_tau, + within_boost=h.within_boost, + word_order=h.word_order, + word_normalize=h.word_normalize, + word_tau=h.word_tau, + word_boost=h.word_boost, + agree_add_boost=h.agree_add_boost, + ) + hint_global = torch.from_numpy(hints_pkg["hint_ids"].astype("int64")) + gate_global = torch.from_numpy(hints_pkg["gate_mask"]) + boost_global = torch.from_numpy(hints_pkg["boost"].astype("float32")) + log0( + f"ngram_tilt:precompute_outside_timer_done elapsed={time.perf_counter()-t_h0:.2f}s " + f"total_targets={hint_global.numel()}" + ) + return (hint_global, gate_global, boost_global) + + +def eval_val_ttt_phased(h, base_model, device, val_data, forward_ttt_train, precomputed_hints=None): + global BOS_ID + if BOS_ID is None: + BOS_ID = 1 + base_model.eval() + for p in base_model.parameters(): + p.requires_grad_(False) + all_tokens = val_data.val_tokens + all_tokens_idx = all_tokens.to(torch.int32) + # === PR #1145 n-gram tilt: precompute prefix-only hints over val targets === + # Hints are aligned to target positions: hint_global[i] is the hint for + # predicting token all_tokens[i+1] given prefix all_tokens[:i+1]. 
+ # Stored on CPU as int64; gathered per-chunk to GPU alongside y indices. + ngram_hint_global = None + ngram_gate_global = None + ngram_boost_global = None + if precomputed_hints is not None: + # v5 Stage 1A: hints were precomputed BEFORE eval timer started. + # This removes the precompute time (~168s for full tilt) from the measured eval time. + ngram_hint_global, ngram_gate_global, ngram_boost_global = precomputed_hints + log( + f"ngram_tilt:using_precomputed_hints " + f"total_targets={ngram_hint_global.numel()} (precompute time excluded from eval)" + ) + elif getattr(h, "ngram_tilt_enabled", False): + from online_ngram_tilt import build_hints_for_targets + targets_np_all = all_tokens.cpu().numpy().astype("uint16", copy=False)[1:] + t_h0 = time.perf_counter() + hints_pkg = build_hints_for_targets( + target_token_ids_np=targets_np_all, + tokenizer_path=h.tokenizer_path, + vocab_size=h.vocab_size, + log0=log, + token_order=h.token_order, + token_threshold=h.token_threshold, + token_boost=h.token_boost, + within_tau=h.within_tau, + within_boost=h.within_boost, + word_order=h.word_order, + word_normalize=h.word_normalize, + word_tau=h.word_tau, + word_boost=h.word_boost, + agree_add_boost=h.agree_add_boost, + ) + ngram_hint_global = torch.from_numpy(hints_pkg["hint_ids"].astype("int64")) + ngram_gate_global = torch.from_numpy(hints_pkg["gate_mask"]) + ngram_boost_global = torch.from_numpy(hints_pkg["boost"].astype("float32")) + log( + f"ngram_tilt:precompute_done elapsed={time.perf_counter()-t_h0:.2f}s " + f"total_targets={ngram_hint_global.numel()}" + ) + docs = _find_docs(all_tokens) + doc_entries = _select_ttt_doc_entries(docs, h) + prefix_doc_limit = max(0, min(len(doc_entries), int(h.phased_ttt_prefix_docs))) + num_phases = max(1, int(h.phased_ttt_num_phases)) + phase_boundaries = [] + for pi in range(num_phases): + boundary = prefix_doc_limit * (pi + 1) // num_phases + phase_boundaries.append(boundary) + current_phase = 0 + current_phase_boundary = phase_boundaries[0] + 
log( + "ttt_phased:" + f" total_docs:{len(doc_entries)} prefix_docs:{prefix_doc_limit} " + f"suffix_docs:{len(doc_entries) - prefix_doc_limit}" + f" num_phases:{num_phases} boundaries:{phase_boundaries}" + ) + chunk_size, eval_seq_len = h.ttt_chunk_size, h.ttt_eval_seq_len + eval_batch_set = None + if h.ttt_eval_batches: + eval_batch_set = set(int(x) for x in h.ttt_eval_batches.split(",") if x.strip()) + use_ascending = eval_batch_set is not None + global_batches_sorted = _build_ttt_global_batches( + doc_entries, h, ascending=use_ascending + ) + queue_len = len(global_batches_sorted) + counter_path = f"/tmp/ttt_counter_{h.run_id}" + prefix_counter_path = f"/tmp/ttt_prefix_counter_{h.run_id}" + pause_flag_path = f"/tmp/ttt_pause_flag_{h.run_id}" + if h.rank == 0: + _init_batch_counter(counter_path) + _init_int64_counter(prefix_counter_path) + try: + os.remove(pause_flag_path) + except FileNotFoundError: + pass + if dist.is_available() and dist.is_initialized(): + path_list = [counter_path, prefix_counter_path, pause_flag_path] + dist.broadcast_object_list(path_list, src=0) + counter_path, prefix_counter_path, pause_flag_path = path_list + dist.barrier() + loss_sum = torch.zeros((), device=device, dtype=torch.float64) + byte_sum = torch.zeros((), device=device, dtype=torch.float64) + token_count = torch.zeros((), device=device, dtype=torch.float64) + t_start = time.perf_counter() + reusable_lora = BatchedTTTLoRA( + h.ttt_batch_size, base_model, h.ttt_lora_rank, + k_lora=h.ttt_k_lora, mlp_lora=h.ttt_mlp_lora, o_lora=h.ttt_o_lora, + ).to(device) + + def _build_opt(lora): + if h.ttt_optimizer == "sgd": + return torch.optim.SGD( + lora.parameters(), lr=h.ttt_lora_lr, + momentum=h.ttt_beta1, weight_decay=h.ttt_weight_decay, + ) + return torch.optim.AdamW( + lora.parameters(), lr=h.ttt_lora_lr, + betas=(h.ttt_beta1, h.ttt_beta2), + eps=1e-10, weight_decay=h.ttt_weight_decay, fused=True, + ) + + reusable_opt = _build_opt(reusable_lora) + local_scored_docs = [] + 
global_ttt_done = prefix_doc_limit == 0 + try: + while True: + queue_idx = _claim_next_batch(counter_path, queue_len) + if queue_idx >= queue_len: + break + orig_batch_idx, batch_entries = global_batches_sorted[queue_idx] + batch = [doc for _, doc in batch_entries] + bsz = len(batch) + prev_loss = loss_sum.item() + prev_bytes = byte_sum.item() + prev_tokens = token_count.item() + if bsz == reusable_lora.bsz: + reusable_lora.reset() + for s in reusable_opt.state.values(): + for k, v in s.items(): + if isinstance(v, torch.Tensor): + v.zero_() + elif k == "step": + s[k] = 0 + cur_lora = reusable_lora + cur_opt = reusable_opt + else: + cur_lora = BatchedTTTLoRA( + bsz, base_model, h.ttt_lora_rank, + k_lora=h.ttt_k_lora, mlp_lora=h.ttt_mlp_lora, o_lora=h.ttt_o_lora, + ).to(device) + cur_opt = _build_opt(cur_lora) + pred_lens = [doc_len - 1 for _, doc_len in batch] + num_chunks = [(pl + chunk_size - 1) // chunk_size for pl in pred_lens] + max_nc = max(num_chunks) + num_chunks_t = torch.tensor(num_chunks, dtype=torch.int64, device=device) + for ci in range(max_nc): + active = [ci < nc for nc in num_chunks] + needs_train = any(ci < nc - 1 for nc in num_chunks) + tok_starts = torch.zeros(bsz, dtype=torch.int64) + tok_wls = torch.zeros(bsz, dtype=torch.int64) + chunk_offsets_cpu = torch.zeros(bsz, dtype=torch.int64) + chunk_lens_cpu = torch.zeros(bsz, dtype=torch.int64) + for b in range(bsz): + if not active[b]: + continue + doc_start, doc_len = batch[b] + win_start, win_len, chunk_offset, chunk_len = _compute_chunk_window( + ci, pred_lens[b], num_chunks[b], chunk_size, eval_seq_len + ) + tok_starts[b] = doc_start + win_start + tok_wls[b] = win_len + chunk_offsets_cpu[b] = chunk_offset + chunk_lens_cpu[b] = chunk_len + _, context_size, chunk_offset, _ = _compute_chunk_window( + ci, (ci + 1) * chunk_size, ci + 1, chunk_size, eval_seq_len + ) + col_idx = torch.arange(context_size + 1) + idx = tok_starts.unsqueeze(1) + col_idx.unsqueeze(0) + idx.clamp_(max=all_tokens.numel() - 
1) + gathered_gpu = all_tokens_idx[idx].to( + device=device, dtype=torch.int64, non_blocking=True + ) + valid = (col_idx[:context_size].unsqueeze(0) < tok_wls.unsqueeze(1)).to( + device, non_blocking=True + ) + chunk_offsets = chunk_offsets_cpu.to(device, non_blocking=True) + chunk_lens = chunk_lens_cpu.to(device, non_blocking=True) + x = torch.where(valid, gathered_gpu[:, :context_size], 0) + y = torch.where(valid, gathered_gpu[:, 1 : context_size + 1], 0) + ctx_pos = torch.arange(context_size, device=device, dtype=torch.int64) + # n-gram tilt path: gather hints aligned to y, pass into forward_ttt + hint_ids_gpu = None + gate_mask_gpu = None + boost_gpu = None + if ngram_hint_global is not None: + hint_idx_cpu = ( + tok_starts.unsqueeze(1) + col_idx[:context_size].unsqueeze(0) + ).clamp_(min=0, max=ngram_hint_global.numel() - 1) + hint_ids_gpu = ngram_hint_global[hint_idx_cpu].to( + device=device, dtype=torch.int64, non_blocking=True + ) + gate_mask_gpu = ngram_gate_global[hint_idx_cpu].to( + device=device, non_blocking=True + ) + boost_gpu = ngram_boost_global[hint_idx_cpu].to( + device=device, dtype=torch.float32, non_blocking=True + ) + hint_ids_gpu = torch.where(valid, hint_ids_gpu, torch.zeros_like(hint_ids_gpu)) + gate_mask_gpu = gate_mask_gpu & valid + with torch.autocast(device_type="cuda", dtype=torch.bfloat16): + if hint_ids_gpu is not None: + per_tok_loss, log_q_hint = forward_ttt_train( + x, y, lora=cur_lora, hint_ids=hint_ids_gpu + ) + else: + per_tok_loss = forward_ttt_train(x, y, lora=cur_lora) + log_q_hint = None + # CaseOps sidecar-driven byte budget. Mirror the index pattern + # used to build y from all_tokens: y[b, j] corresponds to the + # token at global position tok_starts[b] + 1 + j (when valid). 
+ y_bytes_arg = None + if val_data.caseops_enabled and val_data.val_bytes is not None: + y_idx = ( + tok_starts.unsqueeze(1) + + 1 + + col_idx[:context_size].unsqueeze(0) + ) + y_idx = y_idx.clamp_(max=val_data.val_bytes.numel() - 1) + y_bytes_arg = val_data.val_bytes[y_idx].to( + device=device, dtype=torch.int32, non_blocking=True + ) + # Mirror the `valid` masking used for y so out-of-range tokens + # contribute zero bytes (matches y=0 substitution above). + y_bytes_arg = torch.where( + valid, y_bytes_arg, torch.zeros_like(y_bytes_arg) + ) + # n-gram tilt application: use tilted ptl for BPB accumulation, + # but keep original per_tok_loss for TTT-LoRA backward (training + # objective is base NLL — tilt is a scoring-time overlay). + if hint_ids_gpu is not None and log_q_hint is not None: + from online_ngram_tilt import apply_tilt_to_ptl_torch_fast as apply_tilt_to_ptl_torch + tilted_loss = apply_tilt_to_ptl_torch( + ptl=per_tok_loss, + log_q_hint=log_q_hint, + target_ids=y, + hint_ids=hint_ids_gpu, + gate_mask=gate_mask_gpu, + boost=boost_gpu, + ) + else: + tilted_loss = per_tok_loss + with torch.no_grad(): + _accumulate_bpb( + tilted_loss, + x, + y, + chunk_offsets, + chunk_lens, + ctx_pos, + val_data.base_bytes_lut, + val_data.has_leading_space_lut, + val_data.is_boundary_token_lut, + loss_sum, + byte_sum, + token_count, + y_bytes=y_bytes_arg, + ) + if needs_train: + activate_chunk_mask = (num_chunks_t - 1 > ci).float() + for gi in range(h.ttt_grad_steps): + if gi > 0: + with torch.autocast(device_type="cuda", dtype=torch.bfloat16): + per_tok_loss = forward_ttt_train(x, y, lora=cur_lora) + per_doc = per_tok_loss[ + :, chunk_offset : chunk_offset + chunk_size + ].mean(dim=-1) + cur_opt.zero_grad(set_to_none=True) + (per_doc * activate_chunk_mask).sum().backward() + cur_opt.step() + else: + del per_tok_loss + batch_num = orig_batch_idx + 1 + doc_lens = [dl for _, dl in batch] + should_report = batch_num in eval_batch_set if eval_batch_set is not None else True + 
if should_report: + cur_tokens = token_count.item() + cur_loss_val = loss_sum.item() + cur_bytes_val = byte_sum.item() + dt = cur_tokens - prev_tokens + db = cur_bytes_val - prev_bytes + if dt > 0 and db > 0: + b_loss = (cur_loss_val - prev_loss) / dt + b_bpb = b_loss / math.log(2.0) * (dt / db) + else: + b_loss = b_bpb = 0.0 + r_loss = cur_loss_val / max(cur_tokens, 1) + r_bpb = r_loss / math.log(2.0) * (cur_tokens / max(cur_bytes_val, 1)) + elapsed = time.perf_counter() - t_start + log( + f"ttp: b{batch_num}/{queue_len} bl:{b_loss:.4f} bb:{b_bpb:.4f} " + f"rl:{r_loss:.4f} rb:{r_bpb:.4f} dl:{min(doc_lens)}-{max(doc_lens)} " + f"gd:{int(global_ttt_done)}" + ) + if not global_ttt_done: + local_scored_docs.extend( + (orig_batch_idx, pos, doc_start, doc_len) + for pos, (doc_start, doc_len) in enumerate(batch) + ) + prefix_done = _add_to_counter(prefix_counter_path, len(batch_entries)) + if prefix_done >= current_phase_boundary: + try: + with open(pause_flag_path, "x"): + pass + except FileExistsError: + pass + should_pause = os.path.exists(pause_flag_path) + if should_pause: + if dist.is_available() and dist.is_initialized(): + dist.barrier() + gathered_scored_docs = [None] * h.world_size + if dist.is_available() and dist.is_initialized(): + dist.all_gather_object(gathered_scored_docs, local_scored_docs) + else: + gathered_scored_docs = [local_scored_docs] + scored_docs_for_global = [] + for rank_docs in gathered_scored_docs: + if rank_docs: + scored_docs_for_global.extend(rank_docs) + scored_docs_for_global.sort(key=lambda x: (x[0], x[1])) + scored_docs_for_global = scored_docs_for_global[:current_phase_boundary] + scored_token_chunks = [ + val_data.val_tokens[doc_start : doc_start + doc_len] + for _, _, doc_start, doc_len in scored_docs_for_global + ] + if scored_token_chunks: + global_ttt_tokens = torch.cat(scored_token_chunks) + else: + global_ttt_tokens = val_data.val_tokens[:0] + if h.rank == 0: + prefix_done = 0 + try: + with open(prefix_counter_path, "rb") as 
f: + prefix_done = int.from_bytes( + f.read(8), "little", signed=True + ) + except FileNotFoundError: + pass + log( + f"ttpp: phase:{current_phase + 1}/{num_phases} pd:{prefix_done} " + f"gd:{len(scored_docs_for_global)} " + f"t:{time.perf_counter() - t_start:.1f}s" + ) + train_val_ttt_global_sgd_distributed( + h, device, val_data, base_model, global_ttt_tokens + ) + for p in base_model.parameters(): + p.requires_grad_(False) + reusable_lora = BatchedTTTLoRA( + h.ttt_batch_size, base_model, h.ttt_lora_rank, + k_lora=h.ttt_k_lora, mlp_lora=h.ttt_mlp_lora, o_lora=h.ttt_o_lora, + ).to(device) + reusable_opt = _build_opt(reusable_lora) + current_phase += 1 + if current_phase >= num_phases: + global_ttt_done = True + else: + current_phase_boundary = phase_boundaries[current_phase] + if h.rank == 0: + try: + os.remove(pause_flag_path) + except FileNotFoundError: + pass + if dist.is_available() and dist.is_initialized(): + dist.barrier() + if h.rank == 0: + log(f"ttpr: phase:{current_phase}/{num_phases} t:{time.perf_counter() - t_start:.1f}s") + del cur_lora, cur_opt + finally: + pass + if dist.is_available() and dist.is_initialized(): + dist.all_reduce(loss_sum, op=dist.ReduceOp.SUM) + dist.all_reduce(byte_sum, op=dist.ReduceOp.SUM) + dist.all_reduce(token_count, op=dist.ReduceOp.SUM) + for p in base_model.parameters(): + p.requires_grad_(True) + base_model.train() + return _loss_bpb_from_sums(loss_sum, token_count, byte_sum) + + +def timed_eval(label, fn, *args, **kwargs): + torch.cuda.synchronize() + t0 = time.perf_counter() + val_loss, val_bpb = fn(*args, **kwargs) + torch.cuda.synchronize() + elapsed_ms = 1e3 * (time.perf_counter() - t0) + log( + f"{label} val_loss:{val_loss:.8f} val_bpb:{val_bpb:.8f} eval_time:{elapsed_ms:.0f}ms" + ) + return val_loss, val_bpb + + +def train_model(h, device, val_data): + base_model = GPT(h).to(device).bfloat16() + restore_fp32_params(base_model) + compiled_model = torch.compile(base_model, dynamic=False, fullgraph=True) + 
compiled_forward_logits = torch.compile( + base_model.forward_logits, dynamic=False, fullgraph=True + ) + model = compiled_model + log(f"model_params:{sum(p.numel()for p in base_model.parameters())}") + optimizers = Optimizers(h, base_model) + train_loader = DocumentPackingLoader(h, device) + max_wallclock_ms = ( + 1e3 * h.max_wallclock_seconds if h.max_wallclock_seconds > 0 else None + ) + if max_wallclock_ms is not None: + max_wallclock_ms -= h.gptq_reserve_seconds * 1e3 + log( + f"gptq:reserving {h.gptq_reserve_seconds:.0f}s, effective={max_wallclock_ms:.0f}ms" + ) + + def training_frac(step, elapsed_ms): + if max_wallclock_ms is None: + return step / max(h.iterations, 1) + return elapsed_ms / max(max_wallclock_ms, 1e-09) + + def lr_mul(frac): + if h.warmdown_frac <= 0: + return 1.0 + if frac >= 1.0 - h.warmdown_frac: + return max((1.0 - frac) / h.warmdown_frac, h.min_lr) + return 1.0 + + _clip_params = [p for p in base_model.parameters() if p.requires_grad] + def step_fn(step, lr_scale): + train_loss = torch.zeros((), device=device) + for micro_step in range(h.grad_accum_steps): + x, y, cu_seqlens, _max_seqlen = train_loader.next_batch( + h.train_batch_tokens, h.grad_accum_steps + ) + with torch.autocast(device_type="cuda", dtype=torch.bfloat16, enabled=True): + loss = model(x, y, cu_seqlens=cu_seqlens, max_seqlen=h.train_seq_len) + train_loss += loss.detach() + (loss / h.grad_accum_steps).backward() + train_loss /= h.grad_accum_steps + if step <= h.muon_momentum_warmup_steps: + + frac = ( + + min(step / h.muon_momentum_warmup_steps, 1.0) + + if h.muon_momentum_warmup_steps > 0 + + else 1.0 + + ) + + muon_momentum = ( + + 1 - frac + + ) * h.muon_momentum_warmup_start + frac * h.muon_momentum + + for group in optimizers.optimizer_muon.param_groups: + + group["momentum"] = muon_momentum + for opt in optimizers: + for group in opt.param_groups: + group["lr"] = group["base_lr"] * lr_scale + if h.grad_clip_norm > 0: + torch.nn.utils.clip_grad_norm_(_clip_params, 
h.grad_clip_norm) + optimizers.step(distributed=h.distributed) + return train_loss + + if h.warmup_steps > 0: + initial_model_state = { + name: tensor.detach().cpu().clone() + for (name, tensor) in base_model.state_dict().items() + } + initial_optimizer_states = [ + copy.deepcopy(opt.state_dict()) for opt in optimizers + ] + model.train() + num_tokens_local = h.train_batch_tokens // h.world_size + for blk in base_model.blocks: + blk.attn.rotary(num_tokens_local, device, torch.bfloat16) + cu_bucket_size = train_loader.cu_bucket_size + warmup_cu_buckets = tuple(cu_bucket_size * i for i in range(1, 5)) + warmup_cu_iters = 3 + x, y, cu_seqlens, _ = train_loader.next_batch( + h.train_batch_tokens, h.grad_accum_steps + ) + log(f"warmup_cu_buckets:{','.join(str(b) for b in warmup_cu_buckets)} iters_each:{warmup_cu_iters}") + def _run_cu_bucket_warmup(): + for bucket_len in warmup_cu_buckets: + boundaries = list(range(0, x.size(1), max(h.train_seq_len, 1))) + if boundaries[-1] != x.size(1): + boundaries.append(x.size(1)) + cu = torch.full((bucket_len,), x.size(1), dtype=torch.int32, device=device) + cu[: len(boundaries)] = torch.tensor(boundaries, dtype=torch.int32, device=device) + for _ in range(warmup_cu_iters): + optimizers.zero_grad_all() + with torch.autocast(device_type="cuda", dtype=torch.bfloat16, enabled=True): + wloss = model(x, y, cu_seqlens=cu, max_seqlen=h.train_seq_len) + (wloss / h.grad_accum_steps).backward() + optimizers.zero_grad_all() + _run_cu_bucket_warmup() + if h.num_loops > 0: + base_model.looping_active = True + _run_cu_bucket_warmup() + base_model.looping_active = False + for warmup_step in range(h.warmup_steps): + step_fn(warmup_step, 1.0) + if ( + warmup_step <= 5 + or (warmup_step + 1) % 10 == 0 + or warmup_step + 1 == h.warmup_steps + ): + log(f"warmup_step: {warmup_step+1}/{h.warmup_steps}") + if h.num_loops > 0: + base_model.looping_active = True + log( + f"loop_warmup:enabled encoder:{base_model.encoder_indices} 
decoder:{base_model.decoder_indices}" + ) + for warmup_step in range(h.warmup_steps): + step_fn(warmup_step, 1.0) + if ( + warmup_step <= 5 + or (warmup_step + 1) % 10 == 0 + or warmup_step + 1 == h.warmup_steps + ): + log(f"loop_warmup_step: {warmup_step+1}/{h.warmup_steps}") + base_model.looping_active = False + base_model.load_state_dict(initial_model_state, strict=True) + for (opt, state) in zip(optimizers, initial_optimizer_states, strict=True): + opt.load_state_dict(state) + optimizers.zero_grad_all() + train_loader = DocumentPackingLoader(h, device) + _live_state = base_model.state_dict(keep_vars=True) + ema_state = { + name: t.detach().float().clone() + for (name, t) in _live_state.items() + } + _ema_pairs = [(ema_state[name], t) for (name, t) in _live_state.items()] + ema_decay = h.ema_decay + training_time_ms = 0.0 + forced_stop_step = int(os.environ.get("FORCE_STOP_STEP", "0")) + stop_after_step = forced_stop_step if forced_stop_step > 0 else None + torch.cuda.synchronize() + t0 = time.perf_counter() + step = 0 + while True: + last_step = ( + step == h.iterations + or stop_after_step is not None + and step >= stop_after_step + ) + should_validate = ( + last_step or h.val_loss_every > 0 and step % h.val_loss_every == 0 + ) + if should_validate: + torch.cuda.synchronize() + training_time_ms += 1e3 * (time.perf_counter() - t0) + val_loss, val_bpb = eval_val( + h, device, val_data, model, compiled_forward_logits + ) + log( + f"{step}/{h.iterations} val_loss: {val_loss:.4f} val_bpb: {val_bpb:.4f}" + ) + torch.cuda.synchronize() + t0 = time.perf_counter() + if last_step: + if stop_after_step is not None and step < h.iterations: + log( + f"stopping_early: wallclock_cap train_time: {training_time_ms:.0f}ms step: {step}/{h.iterations}" + ) + break + elapsed_ms = training_time_ms + 1e3 * (time.perf_counter() - t0) + frac = training_frac(step, elapsed_ms) + scale = lr_mul(frac) + if ( + h.num_loops > 0 + and not base_model.looping_active + and frac >= 
h.enable_looping_at + ): + base_model.looping_active = True + log( + f"layer_loop:enabled step:{step} frac:{frac:.3f} encoder:{base_model.encoder_indices} decoder:{base_model.decoder_indices}" + ) + train_loss = step_fn(step, scale) + with torch.no_grad(): + for ema_t, t in _ema_pairs: + ema_t.mul_(ema_decay).add_(t.detach(), alpha=1.0 - ema_decay) + step += 1 + approx_training_time_ms = training_time_ms + 1e3 * (time.perf_counter() - t0) + should_log_train = h.train_log_every > 0 and ( + step <= 5 or step % h.train_log_every == 0 or stop_after_step is not None + ) + if should_log_train: + tok_per_sec = step * h.train_batch_tokens / (approx_training_time_ms / 1e3) + log( + f"{step}/{h.iterations} train_loss: {train_loss.item():.4f} train_time: {approx_training_time_ms/60000:.1f}m tok/s: {tok_per_sec:.0f}" + ) + reached_cap = ( + forced_stop_step <= 0 + and max_wallclock_ms is not None + and approx_training_time_ms >= max_wallclock_ms + ) + if h.distributed and forced_stop_step <= 0 and max_wallclock_ms is not None: + reached_cap_tensor = torch.tensor(int(reached_cap), device=device) + dist.all_reduce(reached_cap_tensor, op=dist.ReduceOp.MAX) + reached_cap = bool(reached_cap_tensor.item()) + if stop_after_step is None and reached_cap: + stop_after_step = step + log( + f"peak memory allocated: {torch.cuda.max_memory_allocated()//1024//1024} MiB reserved: {torch.cuda.max_memory_reserved()//1024//1024} MiB" + ) + log("ema:applying EMA weights") + current_state = base_model.state_dict() + avg_state = { + name: t.to(dtype=current_state[name].dtype) for (name, t) in ema_state.items() + } + base_model.load_state_dict(avg_state, strict=True) + return base_model, compiled_model, compiled_forward_logits + + +def train_and_eval(h, device): + global BOS_ID + random.seed(h.seed) + np.random.seed(h.seed) + torch.manual_seed(h.seed) + torch.cuda.manual_seed_all(h.seed) + if h.artifact_dir and h.is_main_process: + os.makedirs(h.artifact_dir, exist_ok=True) + val_data = 
ValidationData(h, device) + log( + f"train_shards: {len(list(Path(h.datasets_dir).resolve().glob('fineweb_train_*.bin')))}" + ) + log(f"val_tokens: {val_data.val_tokens.numel()-1}") + # TTT_EVAL_ONLY: skip training + GPTQ, jump straight to TTT eval on a + # pre-existing quantized artifact. Used to test TTT-only improvements + # (e.g., PR-1767's alpha/warm-start/WD) without retraining. + ttt_eval_only = os.environ.get("TTT_EVAL_ONLY", "0") == "1" + quantize_only = os.environ.get("QUANTIZE_ONLY", "0") == "1" + if ttt_eval_only: + log("TTT_EVAL_ONLY=1 — skipping training + GPTQ, loading saved artifact for TTT eval") + log(f"ttt_lora_alpha: {BatchedLinearLoRA._ALPHA}") + log(f"ttt_warm_start_a: {BatchedLinearLoRA._WARM_START_A}") + log(f"ttt_weight_decay: {h.ttt_weight_decay}") + elif quantize_only: + log("QUANTIZE_ONLY=1 — skipping training, loading saved full-precision checkpoint") + log(f"quantize_only checkpoint: {h.model_path}") + if BOS_ID is None: + BOS_ID = 1 + base_model = GPT(h).to(device).bfloat16() + state = torch.load(h.model_path, map_location="cpu") + base_model.load_state_dict(state, strict=True) + del state + serialize(h, base_model, Path(__file__).read_text(encoding="utf-8")) + if h.distributed: + dist.barrier() + else: + base_model, compiled_model, compiled_forward_logits = train_model( + h, device, val_data + ) + torch._dynamo.reset() + timed_eval( + "diagnostic pre-quantization post-ema", + eval_val, + h, + device, + val_data, + compiled_model, + compiled_forward_logits, + ) + if os.environ.get("PREQUANT_ONLY", "0") == "1": + log("PREQUANT_ONLY=1 — skipping serialize/GPTQ/post-quant eval/TTT") + return + serialize(h, base_model, Path(__file__).read_text(encoding="utf-8")) + if h.distributed: + dist.barrier() + eval_model = deserialize(h, device) + if h.num_loops > 0: + eval_model.looping_active = True + if not ttt_eval_only: + compiled_model = torch.compile(eval_model, dynamic=False, fullgraph=True) + compiled_forward_logits = torch.compile( + 
eval_model.forward_logits, dynamic=False, fullgraph=True + ) + timed_eval( + "diagnostic quantized", + eval_val, + h, + device, + val_data, + compiled_model, + compiled_forward_logits, + ) + del eval_model + if h.ttt_enabled: + if not ttt_eval_only: + del compiled_model + if ttt_eval_only: + del eval_model + torch._dynamo.reset() + torch.cuda.empty_cache() + ttt_model = deserialize(h, device) + if h.num_loops > 0: + ttt_model.looping_active = True + for p in ttt_model.parameters(): + p.requires_grad_(False) + + if h.rope_yarn: + _yarn_seqlen = h.train_batch_tokens // h.grad_accum_steps + for block in ttt_model.blocks: + block.attn.rotary(_yarn_seqlen, device, torch.bfloat16) + else: + for block in ttt_model.blocks: + block.attn.rotary._cos_cached = None + block.attn.rotary._sin_cached = None + block.attn.rotary._seq_len_cached = 0 + block.attn.rotary(h.ttt_eval_seq_len, device, torch.bfloat16) + + def _fwd_ttt_inner(input_ids, target_ids, lora): + return ttt_model.forward_ttt(input_ids, target_ids, lora=lora) + + def _fwd_ttt_inner_with_hints(input_ids, target_ids, lora, hint_ids): + return ttt_model.forward_ttt(input_ids, target_ids, lora=lora, hint_ids=hint_ids) + + _fwd_ttt_compiled_inner = None + _fwd_ttt_compiled_inner_hints = None + + def _fwd_ttt(input_ids, target_ids, lora, hint_ids=None): + nonlocal _fwd_ttt_compiled_inner, _fwd_ttt_compiled_inner_hints + if hint_ids is None: + if _fwd_ttt_compiled_inner is None: + _fwd_ttt_compiled_inner = torch.compile(_fwd_ttt_inner, dynamic=True) + return _fwd_ttt_compiled_inner(input_ids, target_ids, lora=lora) + if _fwd_ttt_compiled_inner_hints is None: + _fwd_ttt_compiled_inner_hints = torch.compile( + _fwd_ttt_inner_with_hints, dynamic=True + ) + return _fwd_ttt_compiled_inner_hints( + input_ids, target_ids, lora=lora, hint_ids=hint_ids + ) + + fwd_ttt_compiled = _fwd_ttt + log(f"ttt_lora:warming up compile (random tokens, no val data)") + if BOS_ID is None: + BOS_ID = 1 + t_warmup = time.perf_counter() + 
warmup_bszes = [h.ttt_batch_size] + for bsz in warmup_bszes: + wl = BatchedTTTLoRA( + bsz, ttt_model, h.ttt_lora_rank, + k_lora=h.ttt_k_lora, mlp_lora=h.ttt_mlp_lora, o_lora=h.ttt_o_lora, + ).to(device) + wo = torch.optim.AdamW( + wl.parameters(), + lr=h.ttt_lora_lr, + betas=(h.ttt_beta1, h.ttt_beta2), + eps=1e-10, + weight_decay=h.ttt_weight_decay, + fused=True, + ) + for ctx_len in (h.ttt_chunk_size, h.ttt_eval_seq_len): + xw = torch.randint(0, h.vocab_size, (bsz, ctx_len), device=device, dtype=torch.int64) + yw = torch.randint(0, h.vocab_size, (bsz, ctx_len), device=device, dtype=torch.int64) + with torch.autocast(device_type="cuda", dtype=torch.bfloat16): + ptl = fwd_ttt_compiled(xw, yw, lora=wl) + ptl[:, : min(h.ttt_chunk_size, ctx_len)].mean(dim=-1).sum().backward() + wo.step() + wo.zero_grad(set_to_none=True) + del wl, wo + torch.cuda.empty_cache() + compile_elapsed = time.perf_counter() - t_warmup + log(f"ttt_lora:compile warmup done ({compile_elapsed:.1f}s)") + # v5 Stage 1A: precompute ngram hints BEFORE eval timer (single pass causal, + # uses val tokens only — same compliance as inline). For full tilt this saves + # ~168s of measured eval time without losing any tilt benefit. 
+ precomputed_hints = None + if h.ngram_tilt_enabled and getattr(h, "ngram_hint_precompute_outside", True): + log("v5:precomputing ngram hints OUTSIDE eval timer") + precomputed_hints = _compute_ngram_hints_for_val(h, val_data, log0=log) + log("\nbeginning TTT eval timer") + torch.cuda.synchronize() + t_ttt = time.perf_counter() + ttt_val_loss, ttt_val_bpb = eval_val_ttt_phased( + h, ttt_model, device, val_data, forward_ttt_train=fwd_ttt_compiled, + precomputed_hints=precomputed_hints, + ) + torch.cuda.synchronize() + ttt_eval_elapsed = time.perf_counter() - t_ttt + log( + "quantized_ttt_phased " + f"val_loss:{ttt_val_loss:.8f} val_bpb:{ttt_val_bpb:.8f} " + f"eval_time:{1e3*ttt_eval_elapsed:.0f}ms" + ) + log(f"total_eval_time:{ttt_eval_elapsed:.1f}s") + del ttt_model + + +def main(): + world_size = int(os.environ.get("WORLD_SIZE", "1")) + local_rank = int(os.environ.get("LOCAL_RANK", "0")) + distributed = "RANK" in os.environ and "WORLD_SIZE" in os.environ + if not torch.cuda.is_available(): + raise RuntimeError("CUDA is required") + if world_size <= 0: + raise ValueError(f"WORLD_SIZE must be positive, got {world_size}") + if 8 % world_size != 0: + raise ValueError( + f"WORLD_SIZE={world_size} must divide 8 so grad_accum_steps stays integral" + ) + device = torch.device("cuda", local_rank) + torch.cuda.set_device(device) + if distributed: + dist.init_process_group(backend="nccl", device_id=device) + dist.barrier() + torch.backends.cuda.matmul.allow_tf32 = True + torch.backends.cudnn.allow_tf32 = True + torch.set_float32_matmul_precision("high") + from torch.backends.cuda import ( + enable_cudnn_sdp, + enable_flash_sdp, + enable_math_sdp, + enable_mem_efficient_sdp, + ) + + enable_cudnn_sdp(False) + enable_flash_sdp(True) + enable_mem_efficient_sdp(False) + enable_math_sdp(False) + torch._dynamo.config.optimize_ddp = False + torch._dynamo.config.cache_size_limit = 64 + h = Hyperparameters() + set_logging_hparams(h) + if h.is_main_process: + 
os.makedirs(h.artifact_dir if h.artifact_dir else "logs", exist_ok=True) + log(100 * "=", console=False) + log("Hyperparameters:", console=True) + for (k, v) in sorted(vars(type(h)).items()): + if not k.startswith("_"): + log(f" {k}: {v}", console=True) + log("=" * 100, console=False) + log("Source code:", console=False) + log("=" * 100, console=False) + with open(__file__, "r", encoding="utf-8") as _src: + log(_src.read(), console=False) + log("=" * 100, console=False) + log(f"Running Python {sys.version}", console=False) + log(f"Running PyTorch {torch.__version__}", console=False) + log("=" * 100, console=False) + train_and_eval(h, device) + if distributed: + dist.destroy_process_group() + + +if __name__ == "__main__": + main() diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed0.log b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed0.log new file mode 100644 index 0000000000..9d0d2203d9 --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed0.log @@ -0,0 +1,5846 @@ +nohup: ignoring input +==================================================== + v5 PRIMARY noLC fulltilt + precompute outside timer: V21 + #1953 + #1948 + fulltilt-tilt SEED=0 Thu Apr 30 06:31:00 UTC 2026 + LeakyReLU slope 0.3 (code patch + v5 hint-precompute-outside-timer), EVAL_SEQ_LEN 2048 (no long-ctx for cap), no_qv, fulltilt-tilt +==================================================== +W0430 06:31:01.197000 1039730 torch/distributed/run.py:803] +W0430 06:31:01.197000 1039730 torch/distributed/run.py:803] ***************************************** +W0430 06:31:01.197000 1039730 torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
+W0430 06:31:01.197000 1039730 torch/distributed/run.py:803] ***************************************** +Hyperparameters: + adam_eps: 1e-08 + adam_wd: 0.02 + agree_add_boost: 0.5 + artifact_dir: + attn_clip_sigmas: 13.0 + attn_out_gate_enabled: False + attn_out_gate_src: proj + awq_lite_bits: 8 + awq_lite_enabled: True + awq_lite_group_size: 64 + awq_lite_group_top_k: 1 + beta1: 0.9 + beta2: 0.99 + caseops_enabled: True + compressor: pergroup + data_dir: /runpod-volume/caseops_data/datasets + datasets_dir: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved + distributed: True + ema_decay: 0.9965 + embed_bits: 7 + embed_clip_sigmas: 14.0 + embed_lr: 0.6 + embed_wd: 0.085 + enable_looping_at: 0.35 + eval_seq_len: 2048 + eval_stride: 64 + fused_ce_enabled: True + gate_window: 12 + gated_attn_enabled: False + gated_attn_init_std: 0.01 + gated_attn_quant_gate: True + global_ttt_batch_seqs: 32 + global_ttt_chunk_tokens: 32768 + global_ttt_epochs: 1 + global_ttt_grad_clip: 1.0 + global_ttt_lr: 0.001 + global_ttt_momentum: 0.9 + global_ttt_respect_doc_boundaries: True + global_ttt_warmup_chunks: 0 + global_ttt_warmup_start_lr: 0.0 + gptq_calibration_batches: 16 + gptq_reserve_seconds: 0.5 + grad_accum_steps: 1 + grad_clip_norm: 0.3 + is_main_process: True + iterations: 20000 + ln_scale: True + local_rank: 0 + logfile: logs/f52a14a3-f337-475d-ae4d-917bd1d29ebb.txt + logit_softcap: 30.0 + loop_end: 5 + loop_start: 3 + lqer_asym_enabled: True + lqer_asym_group: 64 + lqer_enabled: True + lqer_factor_bits: 4 + lqer_gain_select: False + lqer_rank: 4 + lqer_scope: all + lqer_top_k: 3 + matrix_bits: 6 + matrix_clip_sigmas: 12.85 + matrix_lr: 0.026 + max_wallclock_seconds: 600.0 + min_lr: 0.1 + mlp_clip_sigmas: 11.5 + mlp_mult: 4.0 + model_dim: 512 + model_path: final_model.pt + muon_backend_steps: 5 + muon_momentum: 0.97 + muon_momentum_warmup_start: 0.92 + muon_momentum_warmup_steps: 1500 + muon_row_normalize: True + muon_wd: 0.095 + 
ngram_hint_precompute_outside: True + ngram_tilt_enabled: True + num_heads: 8 + num_kv_heads: 4 + num_layers: 11 + num_loops: 2 + parallel_final_lane: mean + parallel_start_layer: 8 + phased_ttt_num_phases: 3 + phased_ttt_prefix_docs: 2500 + qk_gain_init: 5.25 + quantized_model_path: final_model.int6.ptz + rank: 0 + rope_base: 10000.0 + rope_dims: 16 + rope_train_seq_len: 2048 + rope_yarn: False + run_id: f52a14a3-f337-475d-ae4d-917bd1d29ebb + scalar_lr: 0.02 + seed: 0 + skip_gates_enabled: True + smear_gate_enabled: True + sparse_attn_gate_enabled: True + sparse_attn_gate_init_std: 0.0 + sparse_attn_gate_scale: 0.5 + temperature_scale: 1.0 + tie_embeddings: True + tied_embed_init_std: 0.005 + tied_embed_lr: 0.03 + token_boost: 2.625 + token_order: 16 + token_threshold: 0.8 + tokenizer_path: /runpod-volume/caseops_data/datasets/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model + train_batch_tokens: 786432 + train_files: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved/fineweb_train_*.bin + train_log_every: 500 + train_seq_len: 2048 + ttt_batch_size: 64 + ttt_beta1: 0.0 + ttt_beta2: 0.99 + ttt_chunk_size: 48 + ttt_enabled: True + ttt_eval_batches: + ttt_eval_seq_len: 2048 + ttt_grad_steps: 1 + ttt_k_lora: True + ttt_lora_lr: 0.0001 + ttt_lora_rank: 80 + ttt_mlp_lora: True + ttt_o_lora: True + ttt_optimizer: adam + ttt_weight_decay: 0.5 + val_batch_tokens: 524288 + val_bytes_files: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved/fineweb_val_bytes_*.bin + val_doc_fraction: 1.0 + val_files: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved/fineweb_val_*.bin + val_loss_every: 0 + vocab_size: 8192 + warmdown_frac: 0.85 + warmup_steps: 20 + within_boost: 0.75 + within_tau: 0.45 + word_boost: 0.75 + word_normalize: strip_punct_lower + word_order: 4 + word_tau: 0.65 + world_size: 8 + xsa_last_n: 11 
+train_shards: 1499 +val_tokens: 47851520 +model_params:35945673 +gptq:reserving 0s, effective=599500ms +warmup_cu_buckets:64,128,192,256 iters_each:3 +warmup_step: 1/20 +warmup_step: 2/20 +warmup_step: 3/20 +warmup_step: 4/20 +warmup_step: 5/20 +warmup_step: 6/20 +warmup_step: 10/20 +warmup_step: 20/20 +loop_warmup:enabled encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10] +loop_warmup_step: 1/20 +loop_warmup_step: 2/20 +loop_warmup_step: 3/20 +loop_warmup_step: 4/20 +loop_warmup_step: 5/20 +loop_warmup_step: 6/20 +loop_warmup_step: 10/20 +loop_warmup_step: 20/20 +1/20000 train_loss: 9.0105 train_time: 0.0m tok/s: 17751024 +2/20000 train_loss: 12.9581 train_time: 0.0m tok/s: 11612482 +3/20000 train_loss: 10.2705 train_time: 0.0m tok/s: 10325348 +4/20000 train_loss: 8.7742 train_time: 0.0m tok/s: 9758222 +5/20000 train_loss: 8.0014 train_time: 0.0m tok/s: 9451468 +6/20000 train_loss: 7.5156 train_time: 0.0m tok/s: 9277800 +7/20000 train_loss: 7.2839 train_time: 0.0m tok/s: 9129780 +8/20000 train_loss: 6.9483 train_time: 0.0m tok/s: 9050891 +9/20000 train_loss: 6.5860 train_time: 0.0m tok/s: 8983588 +10/20000 train_loss: 6.4463 train_time: 0.0m tok/s: 8909806 +11/20000 train_loss: 6.1455 train_time: 0.0m tok/s: 8781752 +12/20000 train_loss: 5.8724 train_time: 0.0m tok/s: 8716445 +13/20000 train_loss: 5.7235 train_time: 0.0m tok/s: 8669421 +14/20000 train_loss: 5.3392 train_time: 0.0m tok/s: 8638126 +15/20000 train_loss: 5.3102 train_time: 0.0m tok/s: 8620471 +16/20000 train_loss: 5.2933 train_time: 0.0m tok/s: 8607356 +17/20000 train_loss: 5.1377 train_time: 0.0m tok/s: 8592032 +18/20000 train_loss: 5.0844 train_time: 0.0m tok/s: 8585383 +19/20000 train_loss: 5.0099 train_time: 0.0m tok/s: 8574284 +20/20000 train_loss: 4.9165 train_time: 0.0m tok/s: 8565371 +21/20000 train_loss: 4.8210 train_time: 0.0m tok/s: 8549746 +22/20000 train_loss: 4.8559 train_time: 0.0m tok/s: 8540030 +23/20000 train_loss: 4.7980 train_time: 0.0m tok/s: 8526436 +24/20000 
train_loss: 4.9132 train_time: 0.0m tok/s: 8513113 +25/20000 train_loss: 4.6963 train_time: 0.0m tok/s: 8504926 +26/20000 train_loss: 4.7245 train_time: 0.0m tok/s: 8500667 +27/20000 train_loss: 4.6048 train_time: 0.0m tok/s: 8496905 +28/20000 train_loss: 4.6617 train_time: 0.0m tok/s: 8495571 +29/20000 train_loss: 4.6002 train_time: 0.0m tok/s: 8495370 +30/20000 train_loss: 4.5709 train_time: 0.0m tok/s: 8490619 +31/20000 train_loss: 4.5644 train_time: 0.0m tok/s: 8484751 +32/20000 train_loss: 4.5421 train_time: 0.0m tok/s: 8477705 +33/20000 train_loss: 4.5252 train_time: 0.1m tok/s: 8471873 +34/20000 train_loss: 4.4573 train_time: 0.1m tok/s: 8464569 +35/20000 train_loss: 4.3670 train_time: 0.1m tok/s: 8456342 +36/20000 train_loss: 4.5019 train_time: 0.1m tok/s: 8447991 +37/20000 train_loss: 4.4654 train_time: 0.1m tok/s: 8447650 +38/20000 train_loss: 4.3741 train_time: 0.1m tok/s: 8445256 +39/20000 train_loss: 4.5231 train_time: 0.1m tok/s: 8445273 +40/20000 train_loss: 4.4834 train_time: 0.1m tok/s: 8438097 +41/20000 train_loss: 4.3528 train_time: 0.1m tok/s: 8435987 +42/20000 train_loss: 4.2740 train_time: 0.1m tok/s: 8432526 +43/20000 train_loss: 4.3038 train_time: 0.1m tok/s: 8429744 +44/20000 train_loss: 4.2526 train_time: 0.1m tok/s: 8426438 +45/20000 train_loss: 4.3834 train_time: 0.1m tok/s: 8421537 +46/20000 train_loss: 4.2997 train_time: 0.1m tok/s: 8417575 +47/20000 train_loss: 4.1680 train_time: 0.1m tok/s: 8416414 +48/20000 train_loss: 4.2051 train_time: 0.1m tok/s: 8413488 +49/20000 train_loss: 4.1551 train_time: 0.1m tok/s: 8413390 +50/20000 train_loss: 4.1042 train_time: 0.1m tok/s: 8410172 +51/20000 train_loss: 4.3031 train_time: 0.1m tok/s: 8409375 +52/20000 train_loss: 4.2456 train_time: 0.1m tok/s: 8403704 +53/20000 train_loss: 4.1840 train_time: 0.1m tok/s: 8401315 +54/20000 train_loss: 4.1960 train_time: 0.1m tok/s: 8398949 +55/20000 train_loss: 4.1977 train_time: 0.1m tok/s: 8397614 +56/20000 train_loss: 4.1101 train_time: 0.1m tok/s: 
8395810 +57/20000 train_loss: 4.1572 train_time: 0.1m tok/s: 8393608 +58/20000 train_loss: 4.0902 train_time: 0.1m tok/s: 8391305 +59/20000 train_loss: 4.0499 train_time: 0.1m tok/s: 8390804 +60/20000 train_loss: 3.9716 train_time: 0.1m tok/s: 8391160 +61/20000 train_loss: 3.9698 train_time: 0.1m tok/s: 8389518 +62/20000 train_loss: 4.0810 train_time: 0.1m tok/s: 8388845 +63/20000 train_loss: 4.1709 train_time: 0.1m tok/s: 8387858 +64/20000 train_loss: 3.9689 train_time: 0.1m tok/s: 8388269 +65/20000 train_loss: 4.0895 train_time: 0.1m tok/s: 8385569 +66/20000 train_loss: 4.0442 train_time: 0.1m tok/s: 8384422 +67/20000 train_loss: 3.9574 train_time: 0.1m tok/s: 8382318 +68/20000 train_loss: 3.9705 train_time: 0.1m tok/s: 8382313 +69/20000 train_loss: 3.8863 train_time: 0.1m tok/s: 8381121 +70/20000 train_loss: 3.9976 train_time: 0.1m tok/s: 8380991 +71/20000 train_loss: 3.9085 train_time: 0.1m tok/s: 8381230 +72/20000 train_loss: 4.0867 train_time: 0.1m tok/s: 8380366 +73/20000 train_loss: 3.8928 train_time: 0.1m tok/s: 8379742 +74/20000 train_loss: 3.9314 train_time: 0.1m tok/s: 8378604 +75/20000 train_loss: 3.9082 train_time: 0.1m tok/s: 8377173 +76/20000 train_loss: 3.8849 train_time: 0.1m tok/s: 8375511 +77/20000 train_loss: 3.8310 train_time: 0.1m tok/s: 8374999 +78/20000 train_loss: 3.7644 train_time: 0.1m tok/s: 8376061 +79/20000 train_loss: 3.8934 train_time: 0.1m tok/s: 8375460 +80/20000 train_loss: 3.8134 train_time: 0.1m tok/s: 8374503 +81/20000 train_loss: 3.7480 train_time: 0.1m tok/s: 8373020 +82/20000 train_loss: 3.7796 train_time: 0.1m tok/s: 8371661 +83/20000 train_loss: 3.6426 train_time: 0.1m tok/s: 8369820 +84/20000 train_loss: 3.7002 train_time: 0.1m tok/s: 8368942 +85/20000 train_loss: 3.6587 train_time: 0.1m tok/s: 8368513 +86/20000 train_loss: 3.4554 train_time: 0.1m tok/s: 8368176 +87/20000 train_loss: 3.6848 train_time: 0.1m tok/s: 8367957 +88/20000 train_loss: 3.5622 train_time: 0.1m tok/s: 8367459 +89/20000 train_loss: 3.5792 
train_time: 0.1m tok/s: 8366796 +90/20000 train_loss: 3.6046 train_time: 0.1m tok/s: 8366447 +91/20000 train_loss: 3.6406 train_time: 0.1m tok/s: 8366409 +92/20000 train_loss: 3.7234 train_time: 0.1m tok/s: 8365772 +93/20000 train_loss: 3.6273 train_time: 0.1m tok/s: 8365245 +94/20000 train_loss: 3.6507 train_time: 0.1m tok/s: 8364384 +95/20000 train_loss: 3.6212 train_time: 0.1m tok/s: 8364490 +96/20000 train_loss: 3.5848 train_time: 0.2m tok/s: 8364564 +97/20000 train_loss: 3.4911 train_time: 0.2m tok/s: 8362939 +98/20000 train_loss: 3.5426 train_time: 0.2m tok/s: 8362268 +99/20000 train_loss: 3.5103 train_time: 0.2m tok/s: 8362408 +100/20000 train_loss: 3.4167 train_time: 0.2m tok/s: 8361685 +101/20000 train_loss: 3.4378 train_time: 0.2m tok/s: 8361600 +102/20000 train_loss: 3.4955 train_time: 0.2m tok/s: 8360739 +103/20000 train_loss: 3.3744 train_time: 0.2m tok/s: 8360847 +104/20000 train_loss: 3.4856 train_time: 0.2m tok/s: 8360680 +105/20000 train_loss: 3.3645 train_time: 0.2m tok/s: 8360072 +106/20000 train_loss: 3.4981 train_time: 0.2m tok/s: 8360320 +107/20000 train_loss: 3.2241 train_time: 0.2m tok/s: 8359165 +108/20000 train_loss: 3.4149 train_time: 0.2m tok/s: 8358105 +109/20000 train_loss: 3.4086 train_time: 0.2m tok/s: 8357400 +110/20000 train_loss: 3.4323 train_time: 0.2m tok/s: 8357205 +111/20000 train_loss: 3.4358 train_time: 0.2m tok/s: 8357400 +112/20000 train_loss: 3.4377 train_time: 0.2m tok/s: 8356744 +113/20000 train_loss: 3.3437 train_time: 0.2m tok/s: 8356772 +114/20000 train_loss: 3.3927 train_time: 0.2m tok/s: 8357091 +115/20000 train_loss: 3.4280 train_time: 0.2m tok/s: 8356696 +116/20000 train_loss: 3.2275 train_time: 0.2m tok/s: 8356292 +117/20000 train_loss: 3.4415 train_time: 0.2m tok/s: 8355531 +118/20000 train_loss: 3.3799 train_time: 0.2m tok/s: 8354809 +119/20000 train_loss: 3.3620 train_time: 0.2m tok/s: 8354197 +120/20000 train_loss: 3.3466 train_time: 0.2m tok/s: 8353454 +121/20000 train_loss: 3.2950 train_time: 0.2m tok/s: 
8353931 +122/20000 train_loss: 3.3135 train_time: 0.2m tok/s: 8354104 +123/20000 train_loss: 3.2969 train_time: 0.2m tok/s: 8353151 +124/20000 train_loss: 3.3434 train_time: 0.2m tok/s: 8352155 +125/20000 train_loss: 3.2474 train_time: 0.2m tok/s: 8351813 +126/20000 train_loss: 3.2639 train_time: 0.2m tok/s: 8352017 +127/20000 train_loss: 3.2927 train_time: 0.2m tok/s: 8350771 +128/20000 train_loss: 3.3352 train_time: 0.2m tok/s: 8349666 +129/20000 train_loss: 3.3029 train_time: 0.2m tok/s: 8349368 +130/20000 train_loss: 3.2789 train_time: 0.2m tok/s: 8348553 +131/20000 train_loss: 3.2390 train_time: 0.2m tok/s: 8348426 +132/20000 train_loss: 3.1912 train_time: 0.2m tok/s: 8348185 +133/20000 train_loss: 3.2460 train_time: 0.2m tok/s: 8348365 +134/20000 train_loss: 3.1474 train_time: 0.2m tok/s: 8348510 +135/20000 train_loss: 2.9773 train_time: 0.2m tok/s: 8346370 +136/20000 train_loss: 3.2477 train_time: 0.2m tok/s: 8344932 +137/20000 train_loss: 3.0866 train_time: 0.2m tok/s: 8344432 +138/20000 train_loss: 3.2948 train_time: 0.2m tok/s: 8343768 +139/20000 train_loss: 3.2481 train_time: 0.2m tok/s: 8343348 +140/20000 train_loss: 3.1884 train_time: 0.2m tok/s: 8342973 +141/20000 train_loss: 3.0969 train_time: 0.2m tok/s: 8342797 +142/20000 train_loss: 3.3045 train_time: 0.2m tok/s: 8343192 +143/20000 train_loss: 3.3626 train_time: 0.2m tok/s: 8342650 +144/20000 train_loss: 3.2974 train_time: 0.2m tok/s: 8342508 +145/20000 train_loss: 3.2659 train_time: 0.2m tok/s: 8342426 +146/20000 train_loss: 3.2864 train_time: 0.2m tok/s: 8342642 +147/20000 train_loss: 3.1766 train_time: 0.2m tok/s: 8341825 +148/20000 train_loss: 3.2077 train_time: 0.2m tok/s: 8342372 +149/20000 train_loss: 3.2732 train_time: 0.2m tok/s: 8341855 +150/20000 train_loss: 3.2082 train_time: 0.2m tok/s: 8341874 +151/20000 train_loss: 3.5619 train_time: 0.2m tok/s: 8341042 +152/20000 train_loss: 3.1797 train_time: 0.2m tok/s: 8340396 +153/20000 train_loss: 3.3076 train_time: 0.2m tok/s: 8340518 
+154/20000 train_loss: 3.2043 train_time: 0.2m tok/s: 8339940 +155/20000 train_loss: 3.1550 train_time: 0.2m tok/s: 8339161 +156/20000 train_loss: 3.0516 train_time: 0.2m tok/s: 8338495 +157/20000 train_loss: 3.1076 train_time: 0.2m tok/s: 8337962 +158/20000 train_loss: 3.1996 train_time: 0.2m tok/s: 8338015 +159/20000 train_loss: 3.0644 train_time: 0.2m tok/s: 8337946 +160/20000 train_loss: 3.1831 train_time: 0.3m tok/s: 8338087 +161/20000 train_loss: 3.1466 train_time: 0.3m tok/s: 8338053 +162/20000 train_loss: 3.0814 train_time: 0.3m tok/s: 8337265 +163/20000 train_loss: 3.1532 train_time: 0.3m tok/s: 8337328 +164/20000 train_loss: 3.0355 train_time: 0.3m tok/s: 8336380 +165/20000 train_loss: 3.2211 train_time: 0.3m tok/s: 8336058 +166/20000 train_loss: 3.1528 train_time: 0.3m tok/s: 8335884 +167/20000 train_loss: 3.1464 train_time: 0.3m tok/s: 8335556 +168/20000 train_loss: 3.1921 train_time: 0.3m tok/s: 8335519 +169/20000 train_loss: 3.1188 train_time: 0.3m tok/s: 8335255 +170/20000 train_loss: 2.8159 train_time: 0.3m tok/s: 8334906 +171/20000 train_loss: 3.1409 train_time: 0.3m tok/s: 8334385 +172/20000 train_loss: 3.0965 train_time: 0.3m tok/s: 8333404 +173/20000 train_loss: 3.2364 train_time: 0.3m tok/s: 8334548 +174/20000 train_loss: 3.1239 train_time: 0.3m tok/s: 8334528 +175/20000 train_loss: 3.1495 train_time: 0.3m tok/s: 8334680 +176/20000 train_loss: 3.1667 train_time: 0.3m tok/s: 8334789 +177/20000 train_loss: 3.1315 train_time: 0.3m tok/s: 8333952 +178/20000 train_loss: 2.9748 train_time: 0.3m tok/s: 8333712 +179/20000 train_loss: 3.3340 train_time: 0.3m tok/s: 8333381 +180/20000 train_loss: 2.9790 train_time: 0.3m tok/s: 8333289 +181/20000 train_loss: 2.9740 train_time: 0.3m tok/s: 8333688 +182/20000 train_loss: 3.0624 train_time: 0.3m tok/s: 8333299 +183/20000 train_loss: 3.0095 train_time: 0.3m tok/s: 8332846 +184/20000 train_loss: 3.0259 train_time: 0.3m tok/s: 8332886 +185/20000 train_loss: 2.7218 train_time: 0.3m tok/s: 8331738 +186/20000 
train_loss: 3.1241 train_time: 0.3m tok/s: 8329917 +187/20000 train_loss: 3.0579 train_time: 0.3m tok/s: 8329758 +188/20000 train_loss: 3.2102 train_time: 0.3m tok/s: 8329720 +189/20000 train_loss: 3.5327 train_time: 0.3m tok/s: 8329731 +190/20000 train_loss: 3.0839 train_time: 0.3m tok/s: 8329363 +191/20000 train_loss: 3.0545 train_time: 0.3m tok/s: 8329492 +192/20000 train_loss: 3.0238 train_time: 0.3m tok/s: 8329286 +193/20000 train_loss: 3.0141 train_time: 0.3m tok/s: 8328888 +194/20000 train_loss: 3.0243 train_time: 0.3m tok/s: 8328371 +195/20000 train_loss: 2.9070 train_time: 0.3m tok/s: 8328491 +196/20000 train_loss: 3.1385 train_time: 0.3m tok/s: 8327997 +197/20000 train_loss: 3.0585 train_time: 0.3m tok/s: 8328053 +198/20000 train_loss: 3.0623 train_time: 0.3m tok/s: 8327837 +199/20000 train_loss: 3.0606 train_time: 0.3m tok/s: 8327901 +200/20000 train_loss: 3.0731 train_time: 0.3m tok/s: 8327860 +201/20000 train_loss: 3.1123 train_time: 0.3m tok/s: 8327736 +202/20000 train_loss: 3.3427 train_time: 0.3m tok/s: 8327002 +203/20000 train_loss: 3.0791 train_time: 0.3m tok/s: 8326692 +204/20000 train_loss: 3.0854 train_time: 0.3m tok/s: 8326437 +205/20000 train_loss: 3.0654 train_time: 0.3m tok/s: 8326428 +206/20000 train_loss: 2.9530 train_time: 0.3m tok/s: 8325930 +207/20000 train_loss: 3.0990 train_time: 0.3m tok/s: 8325818 +208/20000 train_loss: 2.9411 train_time: 0.3m tok/s: 8325674 +209/20000 train_loss: 3.0097 train_time: 0.3m tok/s: 8325673 +210/20000 train_loss: 3.0890 train_time: 0.3m tok/s: 8325072 +211/20000 train_loss: 3.2688 train_time: 0.3m tok/s: 8324541 +212/20000 train_loss: 3.0275 train_time: 0.3m tok/s: 8324604 +213/20000 train_loss: 2.9514 train_time: 0.3m tok/s: 8324119 +214/20000 train_loss: 3.1062 train_time: 0.3m tok/s: 8323875 +215/20000 train_loss: 3.0388 train_time: 0.3m tok/s: 8323917 +216/20000 train_loss: 3.0995 train_time: 0.3m tok/s: 8323945 +217/20000 train_loss: 3.0265 train_time: 0.3m tok/s: 8324106 +218/20000 train_loss: 
3.0386 train_time: 0.3m tok/s: 8324046 +219/20000 train_loss: 3.1288 train_time: 0.3m tok/s: 8323651 +220/20000 train_loss: 3.3544 train_time: 0.3m tok/s: 8322833 +221/20000 train_loss: 2.9475 train_time: 0.3m tok/s: 8322463 +222/20000 train_loss: 2.9841 train_time: 0.3m tok/s: 8322462 +223/20000 train_loss: 3.0011 train_time: 0.4m tok/s: 8321978 +224/20000 train_loss: 3.0052 train_time: 0.4m tok/s: 8321573 +225/20000 train_loss: 3.0811 train_time: 0.4m tok/s: 8321048 +226/20000 train_loss: 3.0510 train_time: 0.4m tok/s: 8321201 +227/20000 train_loss: 3.0695 train_time: 0.4m tok/s: 8321486 +228/20000 train_loss: 3.0887 train_time: 0.4m tok/s: 8321773 +229/20000 train_loss: 3.0970 train_time: 0.4m tok/s: 8321865 +230/20000 train_loss: 2.9666 train_time: 0.4m tok/s: 8322032 +231/20000 train_loss: 3.1176 train_time: 0.4m tok/s: 8321984 +232/20000 train_loss: 2.9898 train_time: 0.4m tok/s: 8321943 +233/20000 train_loss: 3.0269 train_time: 0.4m tok/s: 8321368 +234/20000 train_loss: 3.0240 train_time: 0.4m tok/s: 8320782 +235/20000 train_loss: 2.9339 train_time: 0.4m tok/s: 8320711 +236/20000 train_loss: 3.0191 train_time: 0.4m tok/s: 8320749 +237/20000 train_loss: 2.9203 train_time: 0.4m tok/s: 8320384 +238/20000 train_loss: 3.0891 train_time: 0.4m tok/s: 8320099 +239/20000 train_loss: 3.0147 train_time: 0.4m tok/s: 8319745 +240/20000 train_loss: 3.1668 train_time: 0.4m tok/s: 8320011 +241/20000 train_loss: 3.0297 train_time: 0.4m tok/s: 8319989 +242/20000 train_loss: 3.1043 train_time: 0.4m tok/s: 8320093 +243/20000 train_loss: 3.0150 train_time: 0.4m tok/s: 8319800 +244/20000 train_loss: 3.0493 train_time: 0.4m tok/s: 8320381 +245/20000 train_loss: 2.9968 train_time: 0.4m tok/s: 8319510 +246/20000 train_loss: 3.0474 train_time: 0.4m tok/s: 8319516 +247/20000 train_loss: 2.9836 train_time: 0.4m tok/s: 8319352 +248/20000 train_loss: 2.9086 train_time: 0.4m tok/s: 8319602 +249/20000 train_loss: 2.9823 train_time: 0.4m tok/s: 8319600 +250/20000 train_loss: 2.9924 
train_time: 0.4m tok/s: 8319499 +251/20000 train_loss: 2.9438 train_time: 0.4m tok/s: 8319590 +252/20000 train_loss: 2.9413 train_time: 0.4m tok/s: 8319665 +253/20000 train_loss: 3.0321 train_time: 0.4m tok/s: 8319493 +254/20000 train_loss: 3.0922 train_time: 0.4m tok/s: 8319478 +255/20000 train_loss: 3.1038 train_time: 0.4m tok/s: 8319508 +256/20000 train_loss: 2.9682 train_time: 0.4m tok/s: 8319107 +257/20000 train_loss: 2.9689 train_time: 0.4m tok/s: 8318673 +258/20000 train_loss: 3.0234 train_time: 0.4m tok/s: 8318531 +259/20000 train_loss: 2.9497 train_time: 0.4m tok/s: 8318533 +260/20000 train_loss: 3.1592 train_time: 0.4m tok/s: 8318332 +261/20000 train_loss: 2.9402 train_time: 0.4m tok/s: 8318232 +262/20000 train_loss: 2.7902 train_time: 0.4m tok/s: 8318357 +263/20000 train_loss: 2.8018 train_time: 0.4m tok/s: 8318425 +264/20000 train_loss: 2.9812 train_time: 0.4m tok/s: 8318074 +265/20000 train_loss: 2.9997 train_time: 0.4m tok/s: 8317775 +266/20000 train_loss: 2.9256 train_time: 0.4m tok/s: 8317297 +267/20000 train_loss: 2.9390 train_time: 0.4m tok/s: 8317281 +268/20000 train_loss: 3.0120 train_time: 0.4m tok/s: 8317301 +269/20000 train_loss: 3.0136 train_time: 0.4m tok/s: 8317194 +270/20000 train_loss: 3.0072 train_time: 0.4m tok/s: 8317187 +271/20000 train_loss: 3.0125 train_time: 0.4m tok/s: 8317415 +272/20000 train_loss: 3.0722 train_time: 0.4m tok/s: 8317007 +273/20000 train_loss: 2.9296 train_time: 0.4m tok/s: 8317103 +274/20000 train_loss: 3.0293 train_time: 0.4m tok/s: 8316968 +275/20000 train_loss: 2.9473 train_time: 0.4m tok/s: 8317130 +276/20000 train_loss: 2.8838 train_time: 0.4m tok/s: 8316906 +277/20000 train_loss: 2.8777 train_time: 0.4m tok/s: 8316656 +278/20000 train_loss: 2.8458 train_time: 0.4m tok/s: 8316214 +279/20000 train_loss: 2.9762 train_time: 0.4m tok/s: 8315692 +280/20000 train_loss: 3.0162 train_time: 0.4m tok/s: 8315348 +281/20000 train_loss: 2.7724 train_time: 0.4m tok/s: 8315552 +282/20000 train_loss: 3.0694 train_time: 
0.4m tok/s: 8315443 +283/20000 train_loss: 2.8821 train_time: 0.4m tok/s: 8315272 +284/20000 train_loss: 2.9196 train_time: 0.4m tok/s: 8315257 +285/20000 train_loss: 2.9746 train_time: 0.4m tok/s: 8315425 +286/20000 train_loss: 2.9966 train_time: 0.5m tok/s: 8315469 +287/20000 train_loss: 2.8318 train_time: 0.5m tok/s: 8315342 +288/20000 train_loss: 2.9704 train_time: 0.5m tok/s: 8314710 +289/20000 train_loss: 2.8875 train_time: 0.5m tok/s: 8314488 +290/20000 train_loss: 2.9187 train_time: 0.5m tok/s: 8314460 +291/20000 train_loss: 2.8888 train_time: 0.5m tok/s: 8314539 +292/20000 train_loss: 2.7252 train_time: 0.5m tok/s: 8314210 +293/20000 train_loss: 2.9414 train_time: 0.5m tok/s: 8314088 +294/20000 train_loss: 3.0647 train_time: 0.5m tok/s: 8313816 +295/20000 train_loss: 3.0103 train_time: 0.5m tok/s: 8313881 +296/20000 train_loss: 3.0701 train_time: 0.5m tok/s: 8313810 +297/20000 train_loss: 2.9548 train_time: 0.5m tok/s: 8313545 +298/20000 train_loss: 2.9967 train_time: 0.5m tok/s: 8313643 +299/20000 train_loss: 2.8260 train_time: 0.5m tok/s: 8313647 +300/20000 train_loss: 3.0274 train_time: 0.5m tok/s: 8313386 +301/20000 train_loss: 2.9712 train_time: 0.5m tok/s: 8313458 +302/20000 train_loss: 2.8679 train_time: 0.5m tok/s: 8313346 +303/20000 train_loss: 2.9336 train_time: 0.5m tok/s: 8313547 +304/20000 train_loss: 2.9505 train_time: 0.5m tok/s: 8313187 +305/20000 train_loss: 2.9434 train_time: 0.5m tok/s: 8313224 +306/20000 train_loss: 3.0113 train_time: 0.5m tok/s: 8313073 +307/20000 train_loss: 2.9189 train_time: 0.5m tok/s: 8313041 +308/20000 train_loss: 2.9108 train_time: 0.5m tok/s: 8312663 +309/20000 train_loss: 3.0495 train_time: 0.5m tok/s: 8312574 +310/20000 train_loss: 2.8708 train_time: 0.5m tok/s: 8312726 +311/20000 train_loss: 2.9414 train_time: 0.5m tok/s: 8312084 +312/20000 train_loss: 2.8375 train_time: 0.5m tok/s: 8312567 +313/20000 train_loss: 2.8410 train_time: 0.5m tok/s: 8312609 +314/20000 train_loss: 2.8783 train_time: 0.5m tok/s: 
8312332 +315/20000 train_loss: 2.9584 train_time: 0.5m tok/s: 8312363 +316/20000 train_loss: 2.6926 train_time: 0.5m tok/s: 8311979 +317/20000 train_loss: 2.8154 train_time: 0.5m tok/s: 8311751 +318/20000 train_loss: 2.9254 train_time: 0.5m tok/s: 8311549 +319/20000 train_loss: 2.9093 train_time: 0.5m tok/s: 8311300 +320/20000 train_loss: 3.0335 train_time: 0.5m tok/s: 8311092 +321/20000 train_loss: 3.0073 train_time: 0.5m tok/s: 8310757 +322/20000 train_loss: 2.9691 train_time: 0.5m tok/s: 8310814 +323/20000 train_loss: 3.0084 train_time: 0.5m tok/s: 8310869 +324/20000 train_loss: 2.9123 train_time: 0.5m tok/s: 8310525 +325/20000 train_loss: 2.8994 train_time: 0.5m tok/s: 8310474 +326/20000 train_loss: 2.9003 train_time: 0.5m tok/s: 8310596 +327/20000 train_loss: 2.8339 train_time: 0.5m tok/s: 8310742 +328/20000 train_loss: 2.8576 train_time: 0.5m tok/s: 8310434 +329/20000 train_loss: 2.8284 train_time: 0.5m tok/s: 8310471 +330/20000 train_loss: 2.7833 train_time: 0.5m tok/s: 8310618 +331/20000 train_loss: 2.8970 train_time: 0.5m tok/s: 8309079 +332/20000 train_loss: 2.9814 train_time: 0.5m tok/s: 8309740 +333/20000 train_loss: 2.8781 train_time: 0.5m tok/s: 8309788 +334/20000 train_loss: 3.1051 train_time: 0.5m tok/s: 8309479 +335/20000 train_loss: 2.8554 train_time: 0.5m tok/s: 8309255 +336/20000 train_loss: 2.9344 train_time: 0.5m tok/s: 8309170 +337/20000 train_loss: 2.8251 train_time: 0.5m tok/s: 8309378 +338/20000 train_loss: 2.8984 train_time: 0.5m tok/s: 8309139 +339/20000 train_loss: 2.9438 train_time: 0.5m tok/s: 8309158 +340/20000 train_loss: 2.9740 train_time: 0.5m tok/s: 8309027 +341/20000 train_loss: 2.9302 train_time: 0.5m tok/s: 8309013 +342/20000 train_loss: 2.8124 train_time: 0.5m tok/s: 8309044 +343/20000 train_loss: 2.9206 train_time: 0.5m tok/s: 8309156 +344/20000 train_loss: 2.8295 train_time: 0.5m tok/s: 8308962 +345/20000 train_loss: 2.8716 train_time: 0.5m tok/s: 8308871 +346/20000 train_loss: 2.8812 train_time: 0.5m tok/s: 8308869 
+347/20000 train_loss: 2.8985 train_time: 0.5m tok/s: 8308868 +348/20000 train_loss: 2.8631 train_time: 0.5m tok/s: 8308617 +349/20000 train_loss: 2.9373 train_time: 0.6m tok/s: 8308709 +350/20000 train_loss: 2.7874 train_time: 0.6m tok/s: 8308761 +351/20000 train_loss: 2.8082 train_time: 0.6m tok/s: 8308960 +352/20000 train_loss: 2.7819 train_time: 0.6m tok/s: 8308819 +353/20000 train_loss: 2.6282 train_time: 0.6m tok/s: 8308401 +354/20000 train_loss: 2.9905 train_time: 0.6m tok/s: 8307983 +355/20000 train_loss: 2.9190 train_time: 0.6m tok/s: 8307745 +356/20000 train_loss: 2.8343 train_time: 0.6m tok/s: 8307463 +357/20000 train_loss: 2.7883 train_time: 0.6m tok/s: 8307149 +358/20000 train_loss: 2.7978 train_time: 0.6m tok/s: 8307154 +359/20000 train_loss: 2.9036 train_time: 0.6m tok/s: 8307173 +360/20000 train_loss: 2.8878 train_time: 0.6m tok/s: 8306947 +361/20000 train_loss: 2.9643 train_time: 0.6m tok/s: 8306796 +362/20000 train_loss: 2.8693 train_time: 0.6m tok/s: 8306609 +363/20000 train_loss: 2.9555 train_time: 0.6m tok/s: 8306449 +364/20000 train_loss: 2.8194 train_time: 0.6m tok/s: 8306378 +365/20000 train_loss: 2.8047 train_time: 0.6m tok/s: 8306308 +366/20000 train_loss: 2.8134 train_time: 0.6m tok/s: 8306187 +367/20000 train_loss: 2.9347 train_time: 0.6m tok/s: 8306247 +368/20000 train_loss: 2.7426 train_time: 0.6m tok/s: 8306092 +369/20000 train_loss: 2.8960 train_time: 0.6m tok/s: 8306209 +370/20000 train_loss: 2.8821 train_time: 0.6m tok/s: 8306250 +371/20000 train_loss: 2.8808 train_time: 0.6m tok/s: 8306232 +372/20000 train_loss: 2.8391 train_time: 0.6m tok/s: 8306104 +373/20000 train_loss: 2.7237 train_time: 0.6m tok/s: 8306009 +374/20000 train_loss: 2.7375 train_time: 0.6m tok/s: 8306029 +375/20000 train_loss: 2.6791 train_time: 0.6m tok/s: 8305982 +376/20000 train_loss: 2.9179 train_time: 0.6m tok/s: 8305586 +377/20000 train_loss: 2.7360 train_time: 0.6m tok/s: 8305482 +378/20000 train_loss: 2.8210 train_time: 0.6m tok/s: 8305502 +379/20000 
train_loss: 2.8791 train_time: 0.6m tok/s: 8305423 +380/20000 train_loss: 2.8845 train_time: 0.6m tok/s: 8305461 +381/20000 train_loss: 2.9045 train_time: 0.6m tok/s: 8305396 +382/20000 train_loss: 2.9593 train_time: 0.6m tok/s: 8305384 +383/20000 train_loss: 2.9515 train_time: 0.6m tok/s: 8305385 +384/20000 train_loss: 2.8185 train_time: 0.6m tok/s: 8305093 +385/20000 train_loss: 2.8374 train_time: 0.6m tok/s: 8305159 +386/20000 train_loss: 2.8785 train_time: 0.6m tok/s: 8305031 +387/20000 train_loss: 3.0511 train_time: 0.6m tok/s: 8304672 +388/20000 train_loss: 2.8788 train_time: 0.6m tok/s: 8304616 +389/20000 train_loss: 2.9178 train_time: 0.6m tok/s: 8304649 +390/20000 train_loss: 2.7546 train_time: 0.6m tok/s: 8304629 +391/20000 train_loss: 2.7221 train_time: 0.6m tok/s: 8304607 +392/20000 train_loss: 2.7834 train_time: 0.6m tok/s: 8304521 +393/20000 train_loss: 2.8514 train_time: 0.6m tok/s: 8304572 +394/20000 train_loss: 2.8429 train_time: 0.6m tok/s: 8304619 +395/20000 train_loss: 2.9336 train_time: 0.6m tok/s: 8304394 +396/20000 train_loss: 2.8267 train_time: 0.6m tok/s: 8304404 +397/20000 train_loss: 2.8330 train_time: 0.6m tok/s: 8304331 +398/20000 train_loss: 2.8620 train_time: 0.6m tok/s: 8304243 +399/20000 train_loss: 2.7743 train_time: 0.6m tok/s: 8304308 +400/20000 train_loss: 2.8733 train_time: 0.6m tok/s: 8304196 +401/20000 train_loss: 2.8608 train_time: 0.6m tok/s: 8304369 +402/20000 train_loss: 2.7291 train_time: 0.6m tok/s: 8304295 +403/20000 train_loss: 2.9593 train_time: 0.6m tok/s: 8304259 +404/20000 train_loss: 2.9341 train_time: 0.6m tok/s: 8303881 +405/20000 train_loss: 2.9343 train_time: 0.6m tok/s: 8303696 +406/20000 train_loss: 2.8108 train_time: 0.6m tok/s: 8303439 +407/20000 train_loss: 2.8319 train_time: 0.6m tok/s: 8303230 +408/20000 train_loss: 2.8349 train_time: 0.6m tok/s: 8303207 +409/20000 train_loss: 2.8087 train_time: 0.6m tok/s: 8303194 +410/20000 train_loss: 2.8774 train_time: 0.6m tok/s: 8303319 +411/20000 train_loss: 
2.8142 train_time: 0.6m tok/s: 8303269 +412/20000 train_loss: 2.8197 train_time: 0.7m tok/s: 8303262 +413/20000 train_loss: 2.7056 train_time: 0.7m tok/s: 8302987 +414/20000 train_loss: 2.7276 train_time: 0.7m tok/s: 8302925 +415/20000 train_loss: 2.7043 train_time: 0.7m tok/s: 8302825 +416/20000 train_loss: 2.7731 train_time: 0.7m tok/s: 8302646 +417/20000 train_loss: 2.7717 train_time: 0.7m tok/s: 8302288 +418/20000 train_loss: 2.7908 train_time: 0.7m tok/s: 8302331 +419/20000 train_loss: 2.8115 train_time: 0.7m tok/s: 8302327 +420/20000 train_loss: 2.8026 train_time: 0.7m tok/s: 8302436 +421/20000 train_loss: 2.8749 train_time: 0.7m tok/s: 8302306 +422/20000 train_loss: 2.8429 train_time: 0.7m tok/s: 8302424 +423/20000 train_loss: 2.8367 train_time: 0.7m tok/s: 8302198 +424/20000 train_loss: 2.9085 train_time: 0.7m tok/s: 8301950 +425/20000 train_loss: 2.8109 train_time: 0.7m tok/s: 8301553 +426/20000 train_loss: 2.8344 train_time: 0.7m tok/s: 8300989 +427/20000 train_loss: 2.8290 train_time: 0.7m tok/s: 8300901 +428/20000 train_loss: 2.7953 train_time: 0.7m tok/s: 8300855 +429/20000 train_loss: 2.7342 train_time: 0.7m tok/s: 8300957 +430/20000 train_loss: 2.8673 train_time: 0.7m tok/s: 8301096 +431/20000 train_loss: 2.6859 train_time: 0.7m tok/s: 8300964 +432/20000 train_loss: 2.7375 train_time: 0.7m tok/s: 8301064 +433/20000 train_loss: 2.6797 train_time: 0.7m tok/s: 8301122 +434/20000 train_loss: 2.6712 train_time: 0.7m tok/s: 8300700 +435/20000 train_loss: 2.8652 train_time: 0.7m tok/s: 8300274 +436/20000 train_loss: 2.4884 train_time: 0.7m tok/s: 8300064 +437/20000 train_loss: 2.7448 train_time: 0.7m tok/s: 8299988 +438/20000 train_loss: 2.8636 train_time: 0.7m tok/s: 8299964 +439/20000 train_loss: 2.7630 train_time: 0.7m tok/s: 8299210 +440/20000 train_loss: 2.6789 train_time: 0.7m tok/s: 8299221 +441/20000 train_loss: 2.9186 train_time: 0.7m tok/s: 8299236 +442/20000 train_loss: 2.9713 train_time: 0.7m tok/s: 8299175 +443/20000 train_loss: 2.9180 
train_time: 0.7m tok/s: 8299190 +444/20000 train_loss: 2.9314 train_time: 0.7m tok/s: 8299134 +445/20000 train_loss: 2.8953 train_time: 0.7m tok/s: 8298992 +446/20000 train_loss: 2.7732 train_time: 0.7m tok/s: 8298912 +447/20000 train_loss: 2.7944 train_time: 0.7m tok/s: 8298647 +448/20000 train_loss: 2.8141 train_time: 0.7m tok/s: 8298669 +449/20000 train_loss: 2.7880 train_time: 0.7m tok/s: 8298672 +450/20000 train_loss: 2.8260 train_time: 0.7m tok/s: 8298779 +451/20000 train_loss: 2.5308 train_time: 0.7m tok/s: 8298621 +452/20000 train_loss: 2.7551 train_time: 0.7m tok/s: 8298211 +453/20000 train_loss: 2.6953 train_time: 0.7m tok/s: 8297998 +454/20000 train_loss: 2.7066 train_time: 0.7m tok/s: 8297747 +455/20000 train_loss: 2.7656 train_time: 0.7m tok/s: 8297741 +456/20000 train_loss: 2.7799 train_time: 0.7m tok/s: 8297688 +457/20000 train_loss: 2.6943 train_time: 0.7m tok/s: 8297520 +458/20000 train_loss: 2.7682 train_time: 0.7m tok/s: 8297350 +459/20000 train_loss: 2.8875 train_time: 0.7m tok/s: 8297257 +460/20000 train_loss: 2.8140 train_time: 0.7m tok/s: 8297277 +461/20000 train_loss: 2.8753 train_time: 0.7m tok/s: 8297434 +462/20000 train_loss: 2.9237 train_time: 0.7m tok/s: 8297432 +463/20000 train_loss: 2.8135 train_time: 0.7m tok/s: 8297587 +464/20000 train_loss: 2.7679 train_time: 0.7m tok/s: 8297645 +465/20000 train_loss: 2.9482 train_time: 0.7m tok/s: 8297505 +466/20000 train_loss: 2.8486 train_time: 0.7m tok/s: 8297369 +467/20000 train_loss: 2.8439 train_time: 0.7m tok/s: 8297383 +468/20000 train_loss: 2.9954 train_time: 0.7m tok/s: 8296997 +469/20000 train_loss: 2.7397 train_time: 0.7m tok/s: 8296645 +470/20000 train_loss: 2.7636 train_time: 0.7m tok/s: 8296525 +471/20000 train_loss: 2.8820 train_time: 0.7m tok/s: 8296332 +472/20000 train_loss: 2.9836 train_time: 0.7m tok/s: 8296204 +473/20000 train_loss: 2.7063 train_time: 0.7m tok/s: 8295652 +474/20000 train_loss: 2.6838 train_time: 0.7m tok/s: 8295520 +475/20000 train_loss: 2.8623 train_time: 
0.8m tok/s: 8295397 +476/20000 train_loss: 2.6205 train_time: 0.8m tok/s: 8295125 +477/20000 train_loss: 2.7206 train_time: 0.8m tok/s: 8295108 +478/20000 train_loss: 2.8284 train_time: 0.8m tok/s: 8295023 +479/20000 train_loss: 2.7982 train_time: 0.8m tok/s: 8294921 +480/20000 train_loss: 3.0527 train_time: 0.8m tok/s: 8294748 +481/20000 train_loss: 2.8510 train_time: 0.8m tok/s: 8294792 +482/20000 train_loss: 2.7812 train_time: 0.8m tok/s: 8294876 +483/20000 train_loss: 2.8218 train_time: 0.8m tok/s: 8294960 +484/20000 train_loss: 2.8879 train_time: 0.8m tok/s: 8294996 +485/20000 train_loss: 2.7614 train_time: 0.8m tok/s: 8294994 +486/20000 train_loss: 2.7527 train_time: 0.8m tok/s: 8294773 +487/20000 train_loss: 2.8213 train_time: 0.8m tok/s: 8294707 +488/20000 train_loss: 2.7674 train_time: 0.8m tok/s: 8294724 +489/20000 train_loss: 2.3654 train_time: 0.8m tok/s: 8294623 +490/20000 train_loss: 2.8683 train_time: 0.8m tok/s: 8294320 +491/20000 train_loss: 2.7863 train_time: 0.8m tok/s: 8294447 +492/20000 train_loss: 2.7823 train_time: 0.8m tok/s: 8294606 +493/20000 train_loss: 2.6790 train_time: 0.8m tok/s: 8294654 +494/20000 train_loss: 2.6902 train_time: 0.8m tok/s: 8294411 +495/20000 train_loss: 2.8026 train_time: 0.8m tok/s: 8294271 +496/20000 train_loss: 2.6926 train_time: 0.8m tok/s: 8294195 +497/20000 train_loss: 2.9288 train_time: 0.8m tok/s: 8294176 +498/20000 train_loss: 2.8392 train_time: 0.8m tok/s: 8294190 +499/20000 train_loss: 2.9333 train_time: 0.8m tok/s: 8294243 +500/20000 train_loss: 2.7403 train_time: 0.8m tok/s: 8294290 +501/20000 train_loss: 2.9151 train_time: 0.8m tok/s: 8294287 +502/20000 train_loss: 2.7125 train_time: 0.8m tok/s: 8294348 +503/20000 train_loss: 2.7869 train_time: 0.8m tok/s: 8294294 +504/20000 train_loss: 2.6870 train_time: 0.8m tok/s: 8294266 +505/20000 train_loss: 2.8791 train_time: 0.8m tok/s: 8294238 +506/20000 train_loss: 2.7955 train_time: 0.8m tok/s: 8294159 +507/20000 train_loss: 2.7744 train_time: 0.8m tok/s: 
8293943 +508/20000 train_loss: 2.9228 train_time: 0.8m tok/s: 8293922 +509/20000 train_loss: 2.9088 train_time: 0.8m tok/s: 8294044 +510/20000 train_loss: 2.7034 train_time: 0.8m tok/s: 8294017 +511/20000 train_loss: 2.8775 train_time: 0.8m tok/s: 8293960 +512/20000 train_loss: 2.8526 train_time: 0.8m tok/s: 8294001 +513/20000 train_loss: 2.8954 train_time: 0.8m tok/s: 8294170 +514/20000 train_loss: 2.8517 train_time: 0.8m tok/s: 8294293 +515/20000 train_loss: 2.8551 train_time: 0.8m tok/s: 8294280 +516/20000 train_loss: 2.7305 train_time: 0.8m tok/s: 8294359 +517/20000 train_loss: 2.8161 train_time: 0.8m tok/s: 8294231 +518/20000 train_loss: 2.9461 train_time: 0.8m tok/s: 8293893 +519/20000 train_loss: 2.7509 train_time: 0.8m tok/s: 8293956 +520/20000 train_loss: 2.6679 train_time: 0.8m tok/s: 8293923 +521/20000 train_loss: 2.7632 train_time: 0.8m tok/s: 8293825 +522/20000 train_loss: 2.7464 train_time: 0.8m tok/s: 8293627 +523/20000 train_loss: 2.7304 train_time: 0.8m tok/s: 8293580 +524/20000 train_loss: 2.7942 train_time: 0.8m tok/s: 8293512 +525/20000 train_loss: 2.7257 train_time: 0.8m tok/s: 8293347 +526/20000 train_loss: 2.8331 train_time: 0.8m tok/s: 8293320 +527/20000 train_loss: 2.8885 train_time: 0.8m tok/s: 8293455 +528/20000 train_loss: 2.8700 train_time: 0.8m tok/s: 8293359 +529/20000 train_loss: 2.8802 train_time: 0.8m tok/s: 8293240 +530/20000 train_loss: 2.9022 train_time: 0.8m tok/s: 8293146 +531/20000 train_loss: 3.2454 train_time: 0.8m tok/s: 8292928 +532/20000 train_loss: 3.1100 train_time: 0.8m tok/s: 8292646 +533/20000 train_loss: 2.6812 train_time: 0.8m tok/s: 8292619 +534/20000 train_loss: 2.8715 train_time: 0.8m tok/s: 8292572 +535/20000 train_loss: 2.7872 train_time: 0.8m tok/s: 8292511 +536/20000 train_loss: 2.6815 train_time: 0.8m tok/s: 8292289 +537/20000 train_loss: 2.8991 train_time: 0.8m tok/s: 8292048 +538/20000 train_loss: 2.7389 train_time: 0.9m tok/s: 8292026 +539/20000 train_loss: 2.8501 train_time: 0.9m tok/s: 8291895 
+540/20000 train_loss: 2.8654 train_time: 0.9m tok/s: 8291722 +541/20000 train_loss: 2.2791 train_time: 0.9m tok/s: 8291517 +542/20000 train_loss: 2.8491 train_time: 0.9m tok/s: 8291290 +543/20000 train_loss: 2.8141 train_time: 0.9m tok/s: 8291340 +544/20000 train_loss: 2.8392 train_time: 0.9m tok/s: 8291111 +545/20000 train_loss: 2.7842 train_time: 0.9m tok/s: 8291173 +546/20000 train_loss: 2.8149 train_time: 0.9m tok/s: 8291197 +547/20000 train_loss: 2.7711 train_time: 0.9m tok/s: 8291120 +548/20000 train_loss: 2.7337 train_time: 0.9m tok/s: 8291026 +549/20000 train_loss: 2.6841 train_time: 0.9m tok/s: 8290928 +550/20000 train_loss: 2.7737 train_time: 0.9m tok/s: 8290820 +551/20000 train_loss: 2.7324 train_time: 0.9m tok/s: 8290771 +552/20000 train_loss: 2.9136 train_time: 0.9m tok/s: 8290162 +553/20000 train_loss: 2.7469 train_time: 0.9m tok/s: 8289747 +554/20000 train_loss: 2.5866 train_time: 0.9m tok/s: 8289787 +555/20000 train_loss: 2.6626 train_time: 0.9m tok/s: 8289859 +556/20000 train_loss: 2.7597 train_time: 0.9m tok/s: 8289932 +557/20000 train_loss: 2.8748 train_time: 0.9m tok/s: 8289880 +558/20000 train_loss: 2.8290 train_time: 0.9m tok/s: 8289740 +559/20000 train_loss: 2.7504 train_time: 0.9m tok/s: 8289935 +560/20000 train_loss: 2.7944 train_time: 0.9m tok/s: 8289702 +561/20000 train_loss: 2.7937 train_time: 0.9m tok/s: 8289647 +562/20000 train_loss: 2.8463 train_time: 0.9m tok/s: 8289543 +563/20000 train_loss: 2.8375 train_time: 0.9m tok/s: 8289487 +564/20000 train_loss: 2.9501 train_time: 0.9m tok/s: 8289415 +565/20000 train_loss: 2.8507 train_time: 0.9m tok/s: 8289323 +566/20000 train_loss: 2.7473 train_time: 0.9m tok/s: 8289377 +567/20000 train_loss: 2.6978 train_time: 0.9m tok/s: 8289413 +568/20000 train_loss: 2.8259 train_time: 0.9m tok/s: 8289409 +569/20000 train_loss: 2.6771 train_time: 0.9m tok/s: 8289350 +570/20000 train_loss: 2.6827 train_time: 0.9m tok/s: 8289086 +571/20000 train_loss: 2.7961 train_time: 0.9m tok/s: 8288965 +572/20000 
train_loss: 2.6352 train_time: 0.9m tok/s: 8288759 +573/20000 train_loss: 2.6476 train_time: 0.9m tok/s: 8288692 +574/20000 train_loss: 2.7376 train_time: 0.9m tok/s: 8288724 +575/20000 train_loss: 2.5106 train_time: 0.9m tok/s: 8288719 +576/20000 train_loss: 2.7705 train_time: 0.9m tok/s: 8288579 +577/20000 train_loss: 2.8643 train_time: 0.9m tok/s: 8288498 +578/20000 train_loss: 2.8401 train_time: 0.9m tok/s: 8288546 +579/20000 train_loss: 2.7378 train_time: 0.9m tok/s: 8288641 +580/20000 train_loss: 2.8131 train_time: 0.9m tok/s: 8288643 +581/20000 train_loss: 2.7786 train_time: 0.9m tok/s: 8288676 +582/20000 train_loss: 2.7746 train_time: 0.9m tok/s: 8288533 +583/20000 train_loss: 2.7302 train_time: 0.9m tok/s: 8288472 +584/20000 train_loss: 2.7907 train_time: 0.9m tok/s: 8288540 +585/20000 train_loss: 2.8064 train_time: 0.9m tok/s: 8288570 +586/20000 train_loss: 2.6427 train_time: 0.9m tok/s: 8288404 +587/20000 train_loss: 2.7397 train_time: 0.9m tok/s: 8288511 +588/20000 train_loss: 2.7161 train_time: 0.9m tok/s: 8288502 +589/20000 train_loss: 2.7468 train_time: 0.9m tok/s: 8288434 +590/20000 train_loss: 2.7641 train_time: 0.9m tok/s: 8288386 +591/20000 train_loss: 2.7562 train_time: 0.9m tok/s: 8288496 +592/20000 train_loss: 2.7435 train_time: 0.9m tok/s: 8288383 +593/20000 train_loss: 2.7450 train_time: 0.9m tok/s: 8288405 +594/20000 train_loss: 2.6428 train_time: 0.9m tok/s: 8288151 +595/20000 train_loss: 2.8029 train_time: 0.9m tok/s: 8287908 +596/20000 train_loss: 2.6775 train_time: 0.9m tok/s: 8287989 +597/20000 train_loss: 2.7568 train_time: 0.9m tok/s: 8288134 +598/20000 train_loss: 2.7956 train_time: 0.9m tok/s: 8287913 +599/20000 train_loss: 2.7026 train_time: 0.9m tok/s: 8287831 +600/20000 train_loss: 2.7461 train_time: 0.9m tok/s: 8287725 +601/20000 train_loss: 2.7297 train_time: 1.0m tok/s: 8287679 +602/20000 train_loss: 2.7705 train_time: 1.0m tok/s: 8287621 +603/20000 train_loss: 2.7604 train_time: 1.0m tok/s: 8287453 +604/20000 train_loss: 
2.7505 train_time: 1.0m tok/s: 8287359 +605/20000 train_loss: 2.6566 train_time: 1.0m tok/s: 8287226 +606/20000 train_loss: 2.6580 train_time: 1.0m tok/s: 8287212 +607/20000 train_loss: 2.7458 train_time: 1.0m tok/s: 8287370 +608/20000 train_loss: 2.6634 train_time: 1.0m tok/s: 8287410 +609/20000 train_loss: 2.7300 train_time: 1.0m tok/s: 8287390 +610/20000 train_loss: 2.7785 train_time: 1.0m tok/s: 8287552 +611/20000 train_loss: 2.8894 train_time: 1.0m tok/s: 8287337 +612/20000 train_loss: 2.8280 train_time: 1.0m tok/s: 8287157 +613/20000 train_loss: 2.8101 train_time: 1.0m tok/s: 8287072 +614/20000 train_loss: 2.8088 train_time: 1.0m tok/s: 8287080 +615/20000 train_loss: 2.7540 train_time: 1.0m tok/s: 8287050 +616/20000 train_loss: 2.7762 train_time: 1.0m tok/s: 8286948 +617/20000 train_loss: 2.7297 train_time: 1.0m tok/s: 8286903 +618/20000 train_loss: 2.7423 train_time: 1.0m tok/s: 8287007 +619/20000 train_loss: 2.7874 train_time: 1.0m tok/s: 8287015 +620/20000 train_loss: 2.8953 train_time: 1.0m tok/s: 8286918 +621/20000 train_loss: 2.6942 train_time: 1.0m tok/s: 8286933 +622/20000 train_loss: 2.7251 train_time: 1.0m tok/s: 8286942 +623/20000 train_loss: 2.7307 train_time: 1.0m tok/s: 8287013 +624/20000 train_loss: 2.4431 train_time: 1.0m tok/s: 8286821 +625/20000 train_loss: 2.7564 train_time: 1.0m tok/s: 8286731 +626/20000 train_loss: 2.9042 train_time: 1.0m tok/s: 8286740 +627/20000 train_loss: 2.6935 train_time: 1.0m tok/s: 8286538 +628/20000 train_loss: 2.8658 train_time: 1.0m tok/s: 8286457 +629/20000 train_loss: 2.8489 train_time: 1.0m tok/s: 8286465 +630/20000 train_loss: 2.6996 train_time: 1.0m tok/s: 8286327 +631/20000 train_loss: 2.8275 train_time: 1.0m tok/s: 8286410 +632/20000 train_loss: 2.8385 train_time: 1.0m tok/s: 8286495 +633/20000 train_loss: 2.7147 train_time: 1.0m tok/s: 8286447 +634/20000 train_loss: 2.9406 train_time: 1.0m tok/s: 8286470 +635/20000 train_loss: 2.7361 train_time: 1.0m tok/s: 8286344 +636/20000 train_loss: 2.8817 
train_time: 1.0m tok/s: 8286219 +637/20000 train_loss: 2.7594 train_time: 1.0m tok/s: 8286192 +638/20000 train_loss: 2.5746 train_time: 1.0m tok/s: 8286212 +639/20000 train_loss: 2.7368 train_time: 1.0m tok/s: 8286190 +640/20000 train_loss: 2.7232 train_time: 1.0m tok/s: 8286156 +641/20000 train_loss: 2.7928 train_time: 1.0m tok/s: 8285959 +642/20000 train_loss: 2.7989 train_time: 1.0m tok/s: 8285090 +643/20000 train_loss: 2.7634 train_time: 1.0m tok/s: 8286139 +644/20000 train_loss: 2.8092 train_time: 1.0m tok/s: 8286142 +645/20000 train_loss: 2.8807 train_time: 1.0m tok/s: 8286069 +646/20000 train_loss: 2.7755 train_time: 1.0m tok/s: 8286083 +647/20000 train_loss: 2.8327 train_time: 1.0m tok/s: 8285862 +648/20000 train_loss: 2.7369 train_time: 1.0m tok/s: 8285624 +649/20000 train_loss: 2.8698 train_time: 1.0m tok/s: 8285576 +650/20000 train_loss: 2.7596 train_time: 1.0m tok/s: 8285699 +651/20000 train_loss: 2.7538 train_time: 1.0m tok/s: 8285678 +652/20000 train_loss: 2.7045 train_time: 1.0m tok/s: 8285693 +653/20000 train_loss: 2.6591 train_time: 1.0m tok/s: 8285681 +654/20000 train_loss: 2.7185 train_time: 1.0m tok/s: 8285762 +655/20000 train_loss: 2.7110 train_time: 1.0m tok/s: 8285378 +656/20000 train_loss: 2.6481 train_time: 1.0m tok/s: 8285374 +657/20000 train_loss: 2.6565 train_time: 1.0m tok/s: 8285339 +658/20000 train_loss: 2.7054 train_time: 1.0m tok/s: 8285305 +659/20000 train_loss: 2.7497 train_time: 1.0m tok/s: 8285421 +660/20000 train_loss: 2.7487 train_time: 1.0m tok/s: 8285335 +661/20000 train_loss: 2.8076 train_time: 1.0m tok/s: 8285302 +662/20000 train_loss: 2.6942 train_time: 1.0m tok/s: 8285323 +663/20000 train_loss: 2.7849 train_time: 1.0m tok/s: 8285426 +664/20000 train_loss: 2.7881 train_time: 1.1m tok/s: 8285355 +665/20000 train_loss: 2.8380 train_time: 1.1m tok/s: 8285159 +666/20000 train_loss: 2.8286 train_time: 1.1m tok/s: 8285046 +667/20000 train_loss: 2.7582 train_time: 1.1m tok/s: 8284993 +668/20000 train_loss: 2.7433 train_time: 
1.1m tok/s: 8284916 +669/20000 train_loss: 2.6304 train_time: 1.1m tok/s: 8284898 +670/20000 train_loss: 2.6432 train_time: 1.1m tok/s: 8284845 +671/20000 train_loss: 2.6573 train_time: 1.1m tok/s: 8284933 +672/20000 train_loss: 2.7830 train_time: 1.1m tok/s: 8284975 +673/20000 train_loss: 2.6272 train_time: 1.1m tok/s: 8284877 +674/20000 train_loss: 2.8614 train_time: 1.1m tok/s: 8284830 +675/20000 train_loss: 2.6237 train_time: 1.1m tok/s: 8284733 +676/20000 train_loss: 2.8513 train_time: 1.1m tok/s: 8284593 +677/20000 train_loss: 2.6756 train_time: 1.1m tok/s: 8284531 +678/20000 train_loss: 2.7592 train_time: 1.1m tok/s: 8284459 +679/20000 train_loss: 2.7034 train_time: 1.1m tok/s: 8284392 +680/20000 train_loss: 2.9086 train_time: 1.1m tok/s: 8284378 +681/20000 train_loss: 2.7825 train_time: 1.1m tok/s: 8284380 +682/20000 train_loss: 2.8806 train_time: 1.1m tok/s: 8284377 +683/20000 train_loss: 2.8585 train_time: 1.1m tok/s: 8284301 +684/20000 train_loss: 2.7913 train_time: 1.1m tok/s: 8284259 +685/20000 train_loss: 2.6575 train_time: 1.1m tok/s: 8284318 +686/20000 train_loss: 2.8921 train_time: 1.1m tok/s: 8284237 +687/20000 train_loss: 2.7814 train_time: 1.1m tok/s: 8284213 +688/20000 train_loss: 2.7836 train_time: 1.1m tok/s: 8284142 +689/20000 train_loss: 2.8114 train_time: 1.1m tok/s: 8284049 +690/20000 train_loss: 2.7484 train_time: 1.1m tok/s: 8284036 +691/20000 train_loss: 2.8506 train_time: 1.1m tok/s: 8283974 +692/20000 train_loss: 2.9120 train_time: 1.1m tok/s: 8283946 +693/20000 train_loss: 2.8220 train_time: 1.1m tok/s: 8283916 +694/20000 train_loss: 2.8196 train_time: 1.1m tok/s: 8284018 +695/20000 train_loss: 2.8053 train_time: 1.1m tok/s: 8283971 +696/20000 train_loss: 2.8090 train_time: 1.1m tok/s: 8284064 +697/20000 train_loss: 2.6733 train_time: 1.1m tok/s: 8283919 +698/20000 train_loss: 2.8422 train_time: 1.1m tok/s: 8283836 +699/20000 train_loss: 2.6880 train_time: 1.1m tok/s: 8283709 +700/20000 train_loss: 2.6365 train_time: 1.1m tok/s: 
8283590 +701/20000 train_loss: 2.6379 train_time: 1.1m tok/s: 8283583 +702/20000 train_loss: 2.6239 train_time: 1.1m tok/s: 8283498 +703/20000 train_loss: 2.5156 train_time: 1.1m tok/s: 8283359 +704/20000 train_loss: 2.8472 train_time: 1.1m tok/s: 8283276 +705/20000 train_loss: 2.7819 train_time: 1.1m tok/s: 8283216 +706/20000 train_loss: 2.7544 train_time: 1.1m tok/s: 8282924 +707/20000 train_loss: 2.7615 train_time: 1.1m tok/s: 8283285 +708/20000 train_loss: 2.8460 train_time: 1.1m tok/s: 8283190 +709/20000 train_loss: 2.8055 train_time: 1.1m tok/s: 8283246 +710/20000 train_loss: 2.6339 train_time: 1.1m tok/s: 8283192 +711/20000 train_loss: 2.7087 train_time: 1.1m tok/s: 8283183 +712/20000 train_loss: 2.6423 train_time: 1.1m tok/s: 8283087 +713/20000 train_loss: 2.7040 train_time: 1.1m tok/s: 8283022 +714/20000 train_loss: 2.7616 train_time: 1.1m tok/s: 8283064 +715/20000 train_loss: 2.7093 train_time: 1.1m tok/s: 8283086 +716/20000 train_loss: 2.7327 train_time: 1.1m tok/s: 8283047 +717/20000 train_loss: 2.9224 train_time: 1.1m tok/s: 8283101 +718/20000 train_loss: 2.8132 train_time: 1.1m tok/s: 8283149 +719/20000 train_loss: 2.7505 train_time: 1.1m tok/s: 8283220 +720/20000 train_loss: 2.6974 train_time: 1.1m tok/s: 8283017 +721/20000 train_loss: 2.8318 train_time: 1.1m tok/s: 8282896 +722/20000 train_loss: 2.6665 train_time: 1.1m tok/s: 8282867 +723/20000 train_loss: 2.8746 train_time: 1.1m tok/s: 8283004 +724/20000 train_loss: 2.7737 train_time: 1.1m tok/s: 8283046 +725/20000 train_loss: 2.6358 train_time: 1.1m tok/s: 8283079 +726/20000 train_loss: 2.7847 train_time: 1.1m tok/s: 8282985 +727/20000 train_loss: 2.5987 train_time: 1.2m tok/s: 8282855 +728/20000 train_loss: 2.8050 train_time: 1.2m tok/s: 8282800 +729/20000 train_loss: 2.8440 train_time: 1.2m tok/s: 8282765 +730/20000 train_loss: 2.7766 train_time: 1.2m tok/s: 8282808 +731/20000 train_loss: 2.8810 train_time: 1.2m tok/s: 8282814 +732/20000 train_loss: 2.7157 train_time: 1.2m tok/s: 8282744 
+733/20000 train_loss: 2.8873 train_time: 1.2m tok/s: 8282494 +734/20000 train_loss: 2.7395 train_time: 1.2m tok/s: 8282536 +735/20000 train_loss: 2.8008 train_time: 1.2m tok/s: 8282584 +736/20000 train_loss: 2.6739 train_time: 1.2m tok/s: 8282605 +737/20000 train_loss: 2.8066 train_time: 1.2m tok/s: 8282487 +738/20000 train_loss: 2.6735 train_time: 1.2m tok/s: 8282472 +739/20000 train_loss: 2.5981 train_time: 1.2m tok/s: 8282533 +740/20000 train_loss: 2.8325 train_time: 1.2m tok/s: 8282454 +741/20000 train_loss: 2.8244 train_time: 1.2m tok/s: 8282380 +742/20000 train_loss: 2.6899 train_time: 1.2m tok/s: 8282454 +743/20000 train_loss: 2.8524 train_time: 1.2m tok/s: 8282506 +744/20000 train_loss: 2.7300 train_time: 1.2m tok/s: 8282441 +745/20000 train_loss: 2.7438 train_time: 1.2m tok/s: 8282434 +746/20000 train_loss: 2.8228 train_time: 1.2m tok/s: 8282543 +747/20000 train_loss: 2.7048 train_time: 1.2m tok/s: 8282545 +748/20000 train_loss: 2.7495 train_time: 1.2m tok/s: 8282570 +749/20000 train_loss: 2.8139 train_time: 1.2m tok/s: 8282545 +750/20000 train_loss: 2.8237 train_time: 1.2m tok/s: 8282414 +751/20000 train_loss: 2.6915 train_time: 1.2m tok/s: 8282387 +752/20000 train_loss: 2.7781 train_time: 1.2m tok/s: 8282230 +753/20000 train_loss: 2.4282 train_time: 1.2m tok/s: 8282021 +754/20000 train_loss: 2.6748 train_time: 1.2m tok/s: 8281864 +755/20000 train_loss: 2.8708 train_time: 1.2m tok/s: 8281920 +756/20000 train_loss: 3.1509 train_time: 1.2m tok/s: 8282052 +757/20000 train_loss: 2.7933 train_time: 1.2m tok/s: 8282109 +758/20000 train_loss: 2.7250 train_time: 1.2m tok/s: 8282193 +759/20000 train_loss: 2.6835 train_time: 1.2m tok/s: 8282166 +760/20000 train_loss: 2.8593 train_time: 1.2m tok/s: 8282060 +761/20000 train_loss: 2.7385 train_time: 1.2m tok/s: 8282065 +762/20000 train_loss: 2.8335 train_time: 1.2m tok/s: 8282002 +763/20000 train_loss: 2.6622 train_time: 1.2m tok/s: 8282058 +764/20000 train_loss: 2.7141 train_time: 1.2m tok/s: 8281904 +765/20000 
train_loss: 2.6847 train_time: 1.2m tok/s: 8281945 +766/20000 train_loss: 2.6726 train_time: 1.2m tok/s: 8282013 +767/20000 train_loss: 2.6998 train_time: 1.2m tok/s: 8282106 +768/20000 train_loss: 2.7472 train_time: 1.2m tok/s: 8282131 +769/20000 train_loss: 2.7698 train_time: 1.2m tok/s: 8282137 +770/20000 train_loss: 2.7773 train_time: 1.2m tok/s: 8282123 +771/20000 train_loss: 2.7909 train_time: 1.2m tok/s: 8282101 +772/20000 train_loss: 2.7711 train_time: 1.2m tok/s: 8282062 +773/20000 train_loss: 2.7151 train_time: 1.2m tok/s: 8282027 +774/20000 train_loss: 2.8472 train_time: 1.2m tok/s: 8281876 +775/20000 train_loss: 2.8156 train_time: 1.2m tok/s: 8281836 +776/20000 train_loss: 2.9100 train_time: 1.2m tok/s: 8281734 +777/20000 train_loss: 2.8682 train_time: 1.2m tok/s: 8281458 +778/20000 train_loss: 2.7210 train_time: 1.2m tok/s: 8281415 +779/20000 train_loss: 2.4414 train_time: 1.2m tok/s: 8281269 +780/20000 train_loss: 2.7851 train_time: 1.2m tok/s: 8281183 +781/20000 train_loss: 2.7602 train_time: 1.2m tok/s: 8281251 +782/20000 train_loss: 3.0327 train_time: 1.2m tok/s: 8281166 +783/20000 train_loss: 2.5248 train_time: 1.2m tok/s: 8280980 +784/20000 train_loss: 2.9042 train_time: 1.2m tok/s: 8280909 +785/20000 train_loss: 2.8622 train_time: 1.2m tok/s: 8280968 +786/20000 train_loss: 2.7301 train_time: 1.2m tok/s: 8281117 +787/20000 train_loss: 2.6397 train_time: 1.2m tok/s: 8280989 +788/20000 train_loss: 2.6796 train_time: 1.2m tok/s: 8280901 +789/20000 train_loss: 2.7970 train_time: 1.2m tok/s: 8280824 +790/20000 train_loss: 2.6434 train_time: 1.3m tok/s: 8280589 +791/20000 train_loss: 2.5954 train_time: 1.3m tok/s: 8280590 +792/20000 train_loss: 2.7333 train_time: 1.3m tok/s: 8280602 +793/20000 train_loss: 2.7111 train_time: 1.3m tok/s: 8280512 +794/20000 train_loss: 2.7265 train_time: 1.3m tok/s: 8280438 +795/20000 train_loss: 2.8569 train_time: 1.3m tok/s: 8280473 +796/20000 train_loss: 2.7255 train_time: 1.3m tok/s: 8280451 +797/20000 train_loss: 
2.7417 train_time: 1.3m tok/s: 8280534 +798/20000 train_loss: 2.7559 train_time: 1.3m tok/s: 8280543 +799/20000 train_loss: 2.7867 train_time: 1.3m tok/s: 8280533 +800/20000 train_loss: 2.7226 train_time: 1.3m tok/s: 8280406 +801/20000 train_loss: 2.7445 train_time: 1.3m tok/s: 8280279 +802/20000 train_loss: 2.8201 train_time: 1.3m tok/s: 8280255 +803/20000 train_loss: 2.6931 train_time: 1.3m tok/s: 8279970 +804/20000 train_loss: 2.6760 train_time: 1.3m tok/s: 8280214 +805/20000 train_loss: 2.6856 train_time: 1.3m tok/s: 8280143 +806/20000 train_loss: 2.8071 train_time: 1.3m tok/s: 8280180 +807/20000 train_loss: 2.7938 train_time: 1.3m tok/s: 8280313 +808/20000 train_loss: 2.8134 train_time: 1.3m tok/s: 8280275 +809/20000 train_loss: 2.6368 train_time: 1.3m tok/s: 8280225 +810/20000 train_loss: 2.8518 train_time: 1.3m tok/s: 8280228 +811/20000 train_loss: 2.8518 train_time: 1.3m tok/s: 8280240 +812/20000 train_loss: 2.6878 train_time: 1.3m tok/s: 8280219 +813/20000 train_loss: 2.7520 train_time: 1.3m tok/s: 8280187 +814/20000 train_loss: 2.7932 train_time: 1.3m tok/s: 8280255 +815/20000 train_loss: 2.8949 train_time: 1.3m tok/s: 8280229 +816/20000 train_loss: 2.7198 train_time: 1.3m tok/s: 8280163 +817/20000 train_loss: 2.7094 train_time: 1.3m tok/s: 8280148 +818/20000 train_loss: 2.7774 train_time: 1.3m tok/s: 8280291 +819/20000 train_loss: 2.7923 train_time: 1.3m tok/s: 8280214 +820/20000 train_loss: 3.0427 train_time: 1.3m tok/s: 8280127 +821/20000 train_loss: 2.7903 train_time: 1.3m tok/s: 8280045 +822/20000 train_loss: 2.6053 train_time: 1.3m tok/s: 8280026 +823/20000 train_loss: 2.6461 train_time: 1.3m tok/s: 8279919 +824/20000 train_loss: 2.8166 train_time: 1.3m tok/s: 8279936 +825/20000 train_loss: 2.8830 train_time: 1.3m tok/s: 8279801 +826/20000 train_loss: 2.8666 train_time: 1.3m tok/s: 8279704 +827/20000 train_loss: 2.6448 train_time: 1.3m tok/s: 8279710 +828/20000 train_loss: 2.7303 train_time: 1.3m tok/s: 8279726 +829/20000 train_loss: 3.3557 
train_time: 1.3m tok/s: 8279633 +830/20000 train_loss: 2.7539 train_time: 1.3m tok/s: 8279449 +831/20000 train_loss: 2.7460 train_time: 1.3m tok/s: 8279412 +832/20000 train_loss: 2.7747 train_time: 1.3m tok/s: 8279376 +833/20000 train_loss: 2.8565 train_time: 1.3m tok/s: 8279414 +834/20000 train_loss: 2.7000 train_time: 1.3m tok/s: 8279394 +835/20000 train_loss: 2.7871 train_time: 1.3m tok/s: 8279465 +836/20000 train_loss: 2.6209 train_time: 1.3m tok/s: 8279436 +837/20000 train_loss: 2.5120 train_time: 1.3m tok/s: 8279327 +838/20000 train_loss: 2.6300 train_time: 1.3m tok/s: 8279069 +839/20000 train_loss: 2.7226 train_time: 1.3m tok/s: 8278887 +840/20000 train_loss: 3.1312 train_time: 1.3m tok/s: 8278887 +841/20000 train_loss: 2.7213 train_time: 1.3m tok/s: 8278909 +842/20000 train_loss: 2.7255 train_time: 1.3m tok/s: 8278931 +843/20000 train_loss: 2.6557 train_time: 1.3m tok/s: 8278883 +844/20000 train_loss: 2.7362 train_time: 1.3m tok/s: 8278807 +845/20000 train_loss: 2.6849 train_time: 1.3m tok/s: 8278774 +846/20000 train_loss: 2.6722 train_time: 1.3m tok/s: 8278766 +847/20000 train_loss: 2.7264 train_time: 1.3m tok/s: 8278777 +848/20000 train_loss: 2.6435 train_time: 1.3m tok/s: 8278781 +849/20000 train_loss: 2.7470 train_time: 1.3m tok/s: 8278781 +850/20000 train_loss: 2.5878 train_time: 1.3m tok/s: 8278879 +851/20000 train_loss: 2.7583 train_time: 1.3m tok/s: 8278878 +852/20000 train_loss: 2.5666 train_time: 1.3m tok/s: 8278942 +853/20000 train_loss: 2.7178 train_time: 1.4m tok/s: 8279003 +854/20000 train_loss: 2.7117 train_time: 1.4m tok/s: 8278874 +855/20000 train_loss: 2.7557 train_time: 1.4m tok/s: 8278784 +856/20000 train_loss: 2.6952 train_time: 1.4m tok/s: 8278855 +857/20000 train_loss: 2.8389 train_time: 1.4m tok/s: 8278862 +858/20000 train_loss: 2.8322 train_time: 1.4m tok/s: 8278895 +859/20000 train_loss: 2.7302 train_time: 1.4m tok/s: 8278756 +860/20000 train_loss: 2.6699 train_time: 1.4m tok/s: 8278608 +861/20000 train_loss: 2.7118 train_time: 
1.4m tok/s: 8278598 +862/20000 train_loss: 2.6621 train_time: 1.4m tok/s: 8278585 +863/20000 train_loss: 2.9054 train_time: 1.4m tok/s: 8278546 +864/20000 train_loss: 2.7546 train_time: 1.4m tok/s: 8278472 +865/20000 train_loss: 2.7746 train_time: 1.4m tok/s: 8278444 +866/20000 train_loss: 2.6362 train_time: 1.4m tok/s: 8277987 +867/20000 train_loss: 2.6743 train_time: 1.4m tok/s: 8278280 +868/20000 train_loss: 2.6875 train_time: 1.4m tok/s: 8278175 +869/20000 train_loss: 2.7024 train_time: 1.4m tok/s: 8278218 +870/20000 train_loss: 2.6740 train_time: 1.4m tok/s: 8278281 +871/20000 train_loss: 2.6562 train_time: 1.4m tok/s: 8278329 +872/20000 train_loss: 2.7520 train_time: 1.4m tok/s: 8278368 +873/20000 train_loss: 2.6658 train_time: 1.4m tok/s: 8278390 +874/20000 train_loss: 2.8039 train_time: 1.4m tok/s: 8278329 +875/20000 train_loss: 2.7573 train_time: 1.4m tok/s: 8278397 +876/20000 train_loss: 2.8118 train_time: 1.4m tok/s: 8278384 +877/20000 train_loss: 2.7026 train_time: 1.4m tok/s: 8278399 +878/20000 train_loss: 2.6915 train_time: 1.4m tok/s: 8278441 +879/20000 train_loss: 2.7385 train_time: 1.4m tok/s: 8278399 +880/20000 train_loss: 2.7483 train_time: 1.4m tok/s: 8278370 +881/20000 train_loss: 2.6708 train_time: 1.4m tok/s: 8278448 +882/20000 train_loss: 2.6945 train_time: 1.4m tok/s: 8278426 +883/20000 train_loss: 2.7878 train_time: 1.4m tok/s: 8278371 +884/20000 train_loss: 2.5296 train_time: 1.4m tok/s: 8278386 +885/20000 train_loss: 2.6358 train_time: 1.4m tok/s: 8278369 +886/20000 train_loss: 2.7096 train_time: 1.4m tok/s: 8278385 +887/20000 train_loss: 2.6662 train_time: 1.4m tok/s: 8278345 +888/20000 train_loss: 2.6338 train_time: 1.4m tok/s: 8278265 +889/20000 train_loss: 2.8034 train_time: 1.4m tok/s: 8278309 +890/20000 train_loss: 2.5957 train_time: 1.4m tok/s: 8278210 +891/20000 train_loss: 2.6918 train_time: 1.4m tok/s: 8278260 +892/20000 train_loss: 2.7109 train_time: 1.4m tok/s: 8278236 +893/20000 train_loss: 2.6495 train_time: 1.4m tok/s: 
8278259 +894/20000 train_loss: 2.7130 train_time: 1.4m tok/s: 8278317 +895/20000 train_loss: 2.7395 train_time: 1.4m tok/s: 8278447 +896/20000 train_loss: 2.7987 train_time: 1.4m tok/s: 8278362 +897/20000 train_loss: 2.7210 train_time: 1.4m tok/s: 8278337 +898/20000 train_loss: 2.6880 train_time: 1.4m tok/s: 8278307 +899/20000 train_loss: 2.6727 train_time: 1.4m tok/s: 8278319 +900/20000 train_loss: 2.7235 train_time: 1.4m tok/s: 8278213 +901/20000 train_loss: 2.6399 train_time: 1.4m tok/s: 8278197 +902/20000 train_loss: 2.6350 train_time: 1.4m tok/s: 8278138 +903/20000 train_loss: 2.5856 train_time: 1.4m tok/s: 8278181 +904/20000 train_loss: 2.5698 train_time: 1.4m tok/s: 8278243 +905/20000 train_loss: 2.7061 train_time: 1.4m tok/s: 8278346 +906/20000 train_loss: 2.7743 train_time: 1.4m tok/s: 8278383 +907/20000 train_loss: 2.7517 train_time: 1.4m tok/s: 8278375 +908/20000 train_loss: 2.8340 train_time: 1.4m tok/s: 8278342 +909/20000 train_loss: 2.7768 train_time: 1.4m tok/s: 8278275 +910/20000 train_loss: 2.8344 train_time: 1.4m tok/s: 8278271 +911/20000 train_loss: 2.7263 train_time: 1.4m tok/s: 8278347 +912/20000 train_loss: 2.5440 train_time: 1.4m tok/s: 8278077 +913/20000 train_loss: 2.7320 train_time: 1.4m tok/s: 8277703 +914/20000 train_loss: 2.8113 train_time: 1.4m tok/s: 8277668 +915/20000 train_loss: 2.7418 train_time: 1.4m tok/s: 8277696 +916/20000 train_loss: 2.7466 train_time: 1.5m tok/s: 8277800 +917/20000 train_loss: 2.6826 train_time: 1.5m tok/s: 8277706 +918/20000 train_loss: 2.5626 train_time: 1.5m tok/s: 8277681 +919/20000 train_loss: 2.6253 train_time: 1.5m tok/s: 8277614 +920/20000 train_loss: 2.6482 train_time: 1.5m tok/s: 8277659 +921/20000 train_loss: 2.5089 train_time: 1.5m tok/s: 8277637 +922/20000 train_loss: 2.7212 train_time: 1.5m tok/s: 8277559 +923/20000 train_loss: 2.6209 train_time: 1.5m tok/s: 8277493 +924/20000 train_loss: 2.6208 train_time: 1.5m tok/s: 8277494 +925/20000 train_loss: 2.9403 train_time: 1.5m tok/s: 8277492 
+926/20000 train_loss: 2.5550 train_time: 1.5m tok/s: 8277427 +927/20000 train_loss: 2.7505 train_time: 1.5m tok/s: 8277402 +928/20000 train_loss: 2.8015 train_time: 1.5m tok/s: 8277427 +929/20000 train_loss: 2.7204 train_time: 1.5m tok/s: 8277468 +930/20000 train_loss: 2.8829 train_time: 1.5m tok/s: 8277208 +931/20000 train_loss: 2.7648 train_time: 1.5m tok/s: 8277152 +932/20000 train_loss: 2.6967 train_time: 1.5m tok/s: 8277254 +933/20000 train_loss: 2.7127 train_time: 1.5m tok/s: 8277392 +934/20000 train_loss: 2.7150 train_time: 1.5m tok/s: 8277354 +935/20000 train_loss: 2.7995 train_time: 1.5m tok/s: 8277369 +936/20000 train_loss: 2.6136 train_time: 1.5m tok/s: 8277375 +937/20000 train_loss: 2.7287 train_time: 1.5m tok/s: 8277461 +938/20000 train_loss: 2.5090 train_time: 1.5m tok/s: 8277260 +939/20000 train_loss: 2.5091 train_time: 1.5m tok/s: 8276981 +940/20000 train_loss: 2.7681 train_time: 1.5m tok/s: 8276810 +941/20000 train_loss: 2.8697 train_time: 1.5m tok/s: 8276718 +942/20000 train_loss: 2.7016 train_time: 1.5m tok/s: 8276732 +943/20000 train_loss: 2.7180 train_time: 1.5m tok/s: 8276642 +944/20000 train_loss: 2.8055 train_time: 1.5m tok/s: 8276719 +945/20000 train_loss: 2.7225 train_time: 1.5m tok/s: 8276769 +946/20000 train_loss: 2.6247 train_time: 1.5m tok/s: 8276799 +947/20000 train_loss: 2.7936 train_time: 1.5m tok/s: 8276796 +948/20000 train_loss: 2.7188 train_time: 1.5m tok/s: 8276780 +949/20000 train_loss: 2.7195 train_time: 1.5m tok/s: 8276772 +950/20000 train_loss: 2.7271 train_time: 1.5m tok/s: 8276719 +951/20000 train_loss: 2.7900 train_time: 1.5m tok/s: 8276684 +952/20000 train_loss: 2.5538 train_time: 1.5m tok/s: 8276591 +953/20000 train_loss: 2.6850 train_time: 1.5m tok/s: 8276641 +954/20000 train_loss: 2.6935 train_time: 1.5m tok/s: 8276520 +955/20000 train_loss: 2.7966 train_time: 1.5m tok/s: 8276395 +956/20000 train_loss: 2.8071 train_time: 1.5m tok/s: 8276315 +957/20000 train_loss: 2.6789 train_time: 1.5m tok/s: 8276280 +958/20000 
train_loss: 2.7768 train_time: 1.5m tok/s: 8276239 +959/20000 train_loss: 2.7778 train_time: 1.5m tok/s: 8276284 +960/20000 train_loss: 2.9908 train_time: 1.5m tok/s: 8276208 +961/20000 train_loss: 2.7948 train_time: 1.5m tok/s: 8276146 +962/20000 train_loss: 2.7084 train_time: 1.5m tok/s: 8276175 +963/20000 train_loss: 2.7573 train_time: 1.5m tok/s: 8276177 +964/20000 train_loss: 2.6630 train_time: 1.5m tok/s: 8276211 +965/20000 train_loss: 2.7791 train_time: 1.5m tok/s: 8276161 +966/20000 train_loss: 2.7795 train_time: 1.5m tok/s: 8276196 +967/20000 train_loss: 2.5718 train_time: 1.5m tok/s: 8276238 +968/20000 train_loss: 2.6798 train_time: 1.5m tok/s: 8276265 +969/20000 train_loss: 2.7049 train_time: 1.5m tok/s: 8276188 +970/20000 train_loss: 2.6977 train_time: 1.5m tok/s: 8275999 +971/20000 train_loss: 2.5490 train_time: 1.5m tok/s: 8275799 +972/20000 train_loss: 2.4987 train_time: 1.5m tok/s: 8275694 +973/20000 train_loss: 2.5692 train_time: 1.5m tok/s: 8275512 +974/20000 train_loss: 2.6943 train_time: 1.5m tok/s: 8275162 +975/20000 train_loss: 2.7501 train_time: 1.5m tok/s: 8275437 +976/20000 train_loss: 2.6884 train_time: 1.5m tok/s: 8275419 +977/20000 train_loss: 2.7327 train_time: 1.5m tok/s: 8275451 +978/20000 train_loss: 2.7645 train_time: 1.5m tok/s: 8275517 +979/20000 train_loss: 2.6090 train_time: 1.6m tok/s: 8275549 +980/20000 train_loss: 2.6508 train_time: 1.6m tok/s: 8275369 +981/20000 train_loss: 2.6414 train_time: 1.6m tok/s: 8275376 +982/20000 train_loss: 2.6275 train_time: 1.6m tok/s: 8275415 +983/20000 train_loss: 2.7503 train_time: 1.6m tok/s: 8275224 +984/20000 train_loss: 2.6473 train_time: 1.6m tok/s: 8275279 +985/20000 train_loss: 2.7651 train_time: 1.6m tok/s: 8275241 +986/20000 train_loss: 2.6958 train_time: 1.6m tok/s: 8275302 +987/20000 train_loss: 2.6857 train_time: 1.6m tok/s: 8275177 +988/20000 train_loss: 2.5478 train_time: 1.6m tok/s: 8275454 +989/20000 train_loss: 2.6785 train_time: 1.6m tok/s: 8275435 +990/20000 train_loss: 
2.6717 train_time: 1.6m tok/s: 8275490 +991/20000 train_loss: 2.8064 train_time: 1.6m tok/s: 8275462 +992/20000 train_loss: 2.6370 train_time: 1.6m tok/s: 8275437 +993/20000 train_loss: 2.5767 train_time: 1.6m tok/s: 8275371 +994/20000 train_loss: 2.7190 train_time: 1.6m tok/s: 8275398 +995/20000 train_loss: 2.8847 train_time: 1.6m tok/s: 8275422 +996/20000 train_loss: 2.8264 train_time: 1.6m tok/s: 8275482 +997/20000 train_loss: 2.7793 train_time: 1.6m tok/s: 8275527 +998/20000 train_loss: 2.6645 train_time: 1.6m tok/s: 8275529 +999/20000 train_loss: 2.7513 train_time: 1.6m tok/s: 8275551 +1000/20000 train_loss: 2.7859 train_time: 1.6m tok/s: 8275577 +1001/20000 train_loss: 2.6760 train_time: 1.6m tok/s: 8275617 +1002/20000 train_loss: 2.7320 train_time: 1.6m tok/s: 8275484 +1003/20000 train_loss: 2.6519 train_time: 1.6m tok/s: 8275374 +1004/20000 train_loss: 2.6539 train_time: 1.6m tok/s: 8275380 +1005/20000 train_loss: 2.6347 train_time: 1.6m tok/s: 8275366 +1006/20000 train_loss: 2.7181 train_time: 1.6m tok/s: 8275342 +1007/20000 train_loss: 2.5866 train_time: 1.6m tok/s: 8275392 +1008/20000 train_loss: 2.5355 train_time: 1.6m tok/s: 8275344 +1009/20000 train_loss: 2.6867 train_time: 1.6m tok/s: 8275362 +1010/20000 train_loss: 2.8218 train_time: 1.6m tok/s: 8275437 +1011/20000 train_loss: 2.8077 train_time: 1.6m tok/s: 8275515 +1012/20000 train_loss: 2.4258 train_time: 1.6m tok/s: 8275363 +1013/20000 train_loss: 2.6162 train_time: 1.6m tok/s: 8275155 +1014/20000 train_loss: 2.7268 train_time: 1.6m tok/s: 8275187 +1015/20000 train_loss: 2.7595 train_time: 1.6m tok/s: 8275167 +1016/20000 train_loss: 2.5694 train_time: 1.6m tok/s: 8275073 +1017/20000 train_loss: 2.7184 train_time: 1.6m tok/s: 8275085 +1018/20000 train_loss: 2.7856 train_time: 1.6m tok/s: 8274868 +1019/20000 train_loss: 2.6689 train_time: 1.6m tok/s: 8274863 +1020/20000 train_loss: 2.6992 train_time: 1.6m tok/s: 8274774 +1021/20000 train_loss: 2.6704 train_time: 1.6m tok/s: 8274670 +1022/20000 
train_loss: 2.7639 train_time: 1.6m tok/s: 8274669 +1023/20000 train_loss: 2.6674 train_time: 1.6m tok/s: 8274622 +1024/20000 train_loss: 2.6902 train_time: 1.6m tok/s: 8274650 +1025/20000 train_loss: 2.7610 train_time: 1.6m tok/s: 8274605 +1026/20000 train_loss: 3.3264 train_time: 1.6m tok/s: 8274415 +1027/20000 train_loss: 2.5510 train_time: 1.6m tok/s: 8274327 +1028/20000 train_loss: 2.6138 train_time: 1.6m tok/s: 8274396 +1029/20000 train_loss: 2.6990 train_time: 1.6m tok/s: 8274377 +1030/20000 train_loss: 2.5458 train_time: 1.6m tok/s: 8274296 +1031/20000 train_loss: 2.6162 train_time: 1.6m tok/s: 8274279 +1032/20000 train_loss: 2.7352 train_time: 1.6m tok/s: 8274329 +1033/20000 train_loss: 2.8480 train_time: 1.6m tok/s: 8274283 +1034/20000 train_loss: 2.6369 train_time: 1.6m tok/s: 8274212 +1035/20000 train_loss: 2.7319 train_time: 1.6m tok/s: 8274244 +1036/20000 train_loss: 2.6824 train_time: 1.6m tok/s: 8274214 +1037/20000 train_loss: 2.7010 train_time: 1.6m tok/s: 8274238 +1038/20000 train_loss: 2.4963 train_time: 1.6m tok/s: 8274188 +1039/20000 train_loss: 2.7003 train_time: 1.6m tok/s: 8274188 +1040/20000 train_loss: 2.6517 train_time: 1.6m tok/s: 8274248 +1041/20000 train_loss: 2.6676 train_time: 1.6m tok/s: 8274241 +1042/20000 train_loss: 2.6718 train_time: 1.7m tok/s: 8274176 +1043/20000 train_loss: 2.7239 train_time: 1.7m tok/s: 8274164 +1044/20000 train_loss: 2.6623 train_time: 1.7m tok/s: 8274144 +1045/20000 train_loss: 2.7958 train_time: 1.7m tok/s: 8274161 +1046/20000 train_loss: 2.7391 train_time: 1.7m tok/s: 8274023 +1047/20000 train_loss: 2.7080 train_time: 1.7m tok/s: 8274066 +1048/20000 train_loss: 2.5965 train_time: 1.7m tok/s: 8273981 +1049/20000 train_loss: 2.7265 train_time: 1.7m tok/s: 8274009 +1050/20000 train_loss: 2.8243 train_time: 1.7m tok/s: 8274038 +1051/20000 train_loss: 2.7338 train_time: 1.7m tok/s: 8274112 +1052/20000 train_loss: 2.6175 train_time: 1.7m tok/s: 8274128 +1053/20000 train_loss: 2.6478 train_time: 1.7m tok/s: 
8274100 +1054/20000 train_loss: 2.6232 train_time: 1.7m tok/s: 8274095 +1055/20000 train_loss: 2.5804 train_time: 1.7m tok/s: 8274018 +1056/20000 train_loss: 2.6546 train_time: 1.7m tok/s: 8273892 +1057/20000 train_loss: 2.7180 train_time: 1.7m tok/s: 8273867 +1058/20000 train_loss: 2.6057 train_time: 1.7m tok/s: 8273856 +1059/20000 train_loss: 2.7491 train_time: 1.7m tok/s: 8273793 +1060/20000 train_loss: 2.6774 train_time: 1.7m tok/s: 8273787 +1061/20000 train_loss: 2.7499 train_time: 1.7m tok/s: 8273844 +1062/20000 train_loss: 2.7775 train_time: 1.7m tok/s: 8273762 +1063/20000 train_loss: 2.7772 train_time: 1.7m tok/s: 8273745 +1064/20000 train_loss: 2.6564 train_time: 1.7m tok/s: 8273816 +1065/20000 train_loss: 2.4655 train_time: 1.7m tok/s: 8273797 +1066/20000 train_loss: 2.7941 train_time: 1.7m tok/s: 8273741 +1067/20000 train_loss: 2.8004 train_time: 1.7m tok/s: 8273617 +1068/20000 train_loss: 2.7007 train_time: 1.7m tok/s: 8273635 +1069/20000 train_loss: 2.6452 train_time: 1.7m tok/s: 8273581 +1070/20000 train_loss: 2.5789 train_time: 1.7m tok/s: 8273606 +1071/20000 train_loss: 2.7440 train_time: 1.7m tok/s: 8273606 +1072/20000 train_loss: 2.6715 train_time: 1.7m tok/s: 8273623 +1073/20000 train_loss: 2.6728 train_time: 1.7m tok/s: 8273678 +1074/20000 train_loss: 2.6920 train_time: 1.7m tok/s: 8273712 +1075/20000 train_loss: 2.7199 train_time: 1.7m tok/s: 8273708 +1076/20000 train_loss: 2.7221 train_time: 1.7m tok/s: 8273744 +1077/20000 train_loss: 2.6723 train_time: 1.7m tok/s: 8273775 +1078/20000 train_loss: 2.8156 train_time: 1.7m tok/s: 8273847 +1079/20000 train_loss: 2.6757 train_time: 1.7m tok/s: 8273765 +1080/20000 train_loss: 2.6479 train_time: 1.7m tok/s: 8273792 +1081/20000 train_loss: 2.6744 train_time: 1.7m tok/s: 8273913 +1082/20000 train_loss: 2.6693 train_time: 1.7m tok/s: 8273982 +1083/20000 train_loss: 2.6051 train_time: 1.7m tok/s: 8273888 +1084/20000 train_loss: 2.6863 train_time: 1.7m tok/s: 8273891 +1085/20000 train_loss: 2.6488 
train_time: 1.7m tok/s: 8273967 +1086/20000 train_loss: 2.6849 train_time: 1.7m tok/s: 8274038 +1087/20000 train_loss: 2.6682 train_time: 1.7m tok/s: 8273938 +1088/20000 train_loss: 2.8501 train_time: 1.7m tok/s: 8273862 +1089/20000 train_loss: 2.8155 train_time: 1.7m tok/s: 8273860 +1090/20000 train_loss: 2.6345 train_time: 1.7m tok/s: 8273802 +1091/20000 train_loss: 2.6712 train_time: 1.7m tok/s: 8273819 +1092/20000 train_loss: 2.7027 train_time: 1.7m tok/s: 8273778 +1093/20000 train_loss: 2.7565 train_time: 1.7m tok/s: 8273819 +1094/20000 train_loss: 2.8084 train_time: 1.7m tok/s: 8273859 +1095/20000 train_loss: 2.6254 train_time: 1.7m tok/s: 8273752 +1096/20000 train_loss: 2.5300 train_time: 1.7m tok/s: 8273675 +1097/20000 train_loss: 2.6526 train_time: 1.7m tok/s: 8273596 +1098/20000 train_loss: 2.6667 train_time: 1.7m tok/s: 8273579 +1099/20000 train_loss: 2.5152 train_time: 1.7m tok/s: 8273584 +1100/20000 train_loss: 2.5805 train_time: 1.7m tok/s: 8273473 +1101/20000 train_loss: 2.6562 train_time: 1.7m tok/s: 8273487 +1102/20000 train_loss: 2.6754 train_time: 1.7m tok/s: 8273538 +1103/20000 train_loss: 2.7359 train_time: 1.7m tok/s: 8273557 +1104/20000 train_loss: 2.7042 train_time: 1.7m tok/s: 8273505 +1105/20000 train_loss: 2.7244 train_time: 1.8m tok/s: 8273599 +1106/20000 train_loss: 2.7358 train_time: 1.8m tok/s: 8273689 +1107/20000 train_loss: 2.7420 train_time: 1.8m tok/s: 8273657 +1108/20000 train_loss: 2.6849 train_time: 1.8m tok/s: 8273632 +1109/20000 train_loss: 2.6721 train_time: 1.8m tok/s: 8273623 +1110/20000 train_loss: 2.6679 train_time: 1.8m tok/s: 8273655 +1111/20000 train_loss: 2.6423 train_time: 1.8m tok/s: 8273639 +1112/20000 train_loss: 2.6290 train_time: 1.8m tok/s: 8273519 +1113/20000 train_loss: 2.6517 train_time: 1.8m tok/s: 8273469 +1114/20000 train_loss: 2.8216 train_time: 1.8m tok/s: 8273427 +1115/20000 train_loss: 2.6727 train_time: 1.8m tok/s: 8273526 +1116/20000 train_loss: 2.8644 train_time: 1.8m tok/s: 8273506 +1117/20000 
train_loss: 2.6946 train_time: 1.8m tok/s: 8273507 +1118/20000 train_loss: 2.7287 train_time: 1.8m tok/s: 8273471 +1119/20000 train_loss: 2.7426 train_time: 1.8m tok/s: 8273426 +1120/20000 train_loss: 2.6366 train_time: 1.8m tok/s: 8273420 +1121/20000 train_loss: 2.6375 train_time: 1.8m tok/s: 8273399 +1122/20000 train_loss: 2.7375 train_time: 1.8m tok/s: 8273421 +1123/20000 train_loss: 2.5354 train_time: 1.8m tok/s: 8273473 +1124/20000 train_loss: 2.6863 train_time: 1.8m tok/s: 8273419 +1125/20000 train_loss: 2.5748 train_time: 1.8m tok/s: 8273388 +1126/20000 train_loss: 2.6465 train_time: 1.8m tok/s: 8273417 +1127/20000 train_loss: 2.8770 train_time: 1.8m tok/s: 8273466 +1128/20000 train_loss: 2.8565 train_time: 1.8m tok/s: 8273460 +1129/20000 train_loss: 2.5984 train_time: 1.8m tok/s: 8273391 +1130/20000 train_loss: 2.7691 train_time: 1.8m tok/s: 8273470 +1131/20000 train_loss: 2.7512 train_time: 1.8m tok/s: 8273485 +1132/20000 train_loss: 2.6229 train_time: 1.8m tok/s: 8273484 +1133/20000 train_loss: 2.5763 train_time: 1.8m tok/s: 8273432 +1134/20000 train_loss: 2.7360 train_time: 1.8m tok/s: 8273410 +1135/20000 train_loss: 2.7315 train_time: 1.8m tok/s: 8273401 +1136/20000 train_loss: 2.5669 train_time: 1.8m tok/s: 8273483 +1137/20000 train_loss: 2.6079 train_time: 1.8m tok/s: 8273429 +1138/20000 train_loss: 2.5648 train_time: 1.8m tok/s: 8273478 +1139/20000 train_loss: 2.5559 train_time: 1.8m tok/s: 8273498 +1140/20000 train_loss: 2.6629 train_time: 1.8m tok/s: 8273479 +1141/20000 train_loss: 2.7014 train_time: 1.8m tok/s: 8273445 +1142/20000 train_loss: 2.6970 train_time: 1.8m tok/s: 8273483 +1143/20000 train_loss: 2.7284 train_time: 1.8m tok/s: 8273490 +1144/20000 train_loss: 2.7826 train_time: 1.8m tok/s: 8273505 +1145/20000 train_loss: 2.7310 train_time: 1.8m tok/s: 8273498 +1146/20000 train_loss: 2.5977 train_time: 1.8m tok/s: 8273549 +1147/20000 train_loss: 2.7569 train_time: 1.8m tok/s: 8273622 +1148/20000 train_loss: 2.5636 train_time: 1.8m tok/s: 
8273547 +1149/20000 train_loss: 2.7197 train_time: 1.8m tok/s: 8273449 +1150/20000 train_loss: 2.5914 train_time: 1.8m tok/s: 8273446 +1151/20000 train_loss: 2.5997 train_time: 1.8m tok/s: 8273463 +1152/20000 train_loss: 2.4697 train_time: 1.8m tok/s: 8273457 +1153/20000 train_loss: 2.6009 train_time: 1.8m tok/s: 8273361 +1154/20000 train_loss: 2.7358 train_time: 1.8m tok/s: 8273387 +1155/20000 train_loss: 2.5914 train_time: 1.8m tok/s: 8273448 +1156/20000 train_loss: 2.6985 train_time: 1.8m tok/s: 8273405 +1157/20000 train_loss: 2.6880 train_time: 1.8m tok/s: 8273414 +1158/20000 train_loss: 2.7853 train_time: 1.8m tok/s: 8273375 +1159/20000 train_loss: 2.7171 train_time: 1.8m tok/s: 8273350 +1160/20000 train_loss: 2.6798 train_time: 1.8m tok/s: 8273427 +1161/20000 train_loss: 2.6598 train_time: 1.8m tok/s: 8273399 +1162/20000 train_loss: 2.7107 train_time: 1.8m tok/s: 8273391 +1163/20000 train_loss: 2.6988 train_time: 1.8m tok/s: 8273430 +1164/20000 train_loss: 2.6658 train_time: 1.8m tok/s: 8273378 +1165/20000 train_loss: 2.5309 train_time: 1.8m tok/s: 8273321 +1166/20000 train_loss: 2.7306 train_time: 1.8m tok/s: 8273302 +1167/20000 train_loss: 2.7658 train_time: 1.8m tok/s: 8273287 +1168/20000 train_loss: 2.5670 train_time: 1.9m tok/s: 8273221 +1169/20000 train_loss: 2.7004 train_time: 1.9m tok/s: 8273134 +1170/20000 train_loss: 2.9085 train_time: 1.9m tok/s: 8273059 +1171/20000 train_loss: 2.6624 train_time: 1.9m tok/s: 8273152 +1172/20000 train_loss: 2.7069 train_time: 1.9m tok/s: 8273247 +1173/20000 train_loss: 2.6422 train_time: 1.9m tok/s: 8273275 +1174/20000 train_loss: 2.7063 train_time: 1.9m tok/s: 8273235 +1175/20000 train_loss: 2.6584 train_time: 1.9m tok/s: 8273226 +1176/20000 train_loss: 2.7800 train_time: 1.9m tok/s: 8273228 +1177/20000 train_loss: 2.7805 train_time: 1.9m tok/s: 8273213 +1178/20000 train_loss: 2.6681 train_time: 1.9m tok/s: 8273101 +1179/20000 train_loss: 2.5433 train_time: 1.9m tok/s: 8273067 +1180/20000 train_loss: 2.6053 
train_time: 1.9m tok/s: 8272914 +1181/20000 train_loss: 2.6637 train_time: 1.9m tok/s: 8272890 +1182/20000 train_loss: 2.5823 train_time: 1.9m tok/s: 8272878 +1183/20000 train_loss: 2.7936 train_time: 1.9m tok/s: 8272884 +1184/20000 train_loss: 2.4905 train_time: 1.9m tok/s: 8272794 +1185/20000 train_loss: 2.6741 train_time: 1.9m tok/s: 8272806 +1186/20000 train_loss: 2.6644 train_time: 1.9m tok/s: 8272794 +1187/20000 train_loss: 2.7652 train_time: 1.9m tok/s: 8272752 +1188/20000 train_loss: 2.8818 train_time: 1.9m tok/s: 8272776 +1189/20000 train_loss: 2.6405 train_time: 1.9m tok/s: 8272854 +1190/20000 train_loss: 2.7019 train_time: 1.9m tok/s: 8272838 +1191/20000 train_loss: 2.6463 train_time: 1.9m tok/s: 8272716 +1192/20000 train_loss: 2.6771 train_time: 1.9m tok/s: 8272659 +1193/20000 train_loss: 2.6956 train_time: 1.9m tok/s: 8272640 +1194/20000 train_loss: 2.6995 train_time: 1.9m tok/s: 8272585 +1195/20000 train_loss: 2.6076 train_time: 1.9m tok/s: 8272589 +1196/20000 train_loss: 2.8445 train_time: 1.9m tok/s: 8272551 +1197/20000 train_loss: 2.5726 train_time: 1.9m tok/s: 8272527 +1198/20000 train_loss: 2.7006 train_time: 1.9m tok/s: 8272541 +1199/20000 train_loss: 2.7450 train_time: 1.9m tok/s: 8272638 +1200/20000 train_loss: 2.7309 train_time: 1.9m tok/s: 8272606 +1201/20000 train_loss: 2.7303 train_time: 1.9m tok/s: 8272588 +1202/20000 train_loss: 2.8306 train_time: 1.9m tok/s: 8272614 +1203/20000 train_loss: 2.6708 train_time: 1.9m tok/s: 8272648 +1204/20000 train_loss: 2.7127 train_time: 1.9m tok/s: 8272555 +1205/20000 train_loss: 2.7485 train_time: 1.9m tok/s: 8272572 +1206/20000 train_loss: 2.7649 train_time: 1.9m tok/s: 8272535 +1207/20000 train_loss: 2.5706 train_time: 1.9m tok/s: 8272541 +1208/20000 train_loss: 2.5686 train_time: 1.9m tok/s: 8272452 +1209/20000 train_loss: 2.6933 train_time: 1.9m tok/s: 8272390 +1210/20000 train_loss: 2.6285 train_time: 1.9m tok/s: 8272453 +1211/20000 train_loss: 2.5675 train_time: 1.9m tok/s: 8272503 +1212/20000 
train_loss: 2.5391 train_time: 1.9m tok/s: 8272298 +1213/20000 train_loss: 2.8106 train_time: 1.9m tok/s: 8272218 +1214/20000 train_loss: 2.6276 train_time: 1.9m tok/s: 8272353 +1215/20000 train_loss: 2.7171 train_time: 1.9m tok/s: 8272362 +1216/20000 train_loss: 2.6834 train_time: 1.9m tok/s: 8272357 +1217/20000 train_loss: 2.7655 train_time: 1.9m tok/s: 8272379 +1218/20000 train_loss: 2.7050 train_time: 1.9m tok/s: 8272424 +1219/20000 train_loss: 3.3161 train_time: 1.9m tok/s: 8272394 +1220/20000 train_loss: 2.6099 train_time: 1.9m tok/s: 8272290 +1221/20000 train_loss: 2.7679 train_time: 1.9m tok/s: 8272332 +1222/20000 train_loss: 2.5870 train_time: 1.9m tok/s: 8272295 +1223/20000 train_loss: 2.7023 train_time: 1.9m tok/s: 8272268 +1224/20000 train_loss: 2.7121 train_time: 1.9m tok/s: 8272245 +1225/20000 train_loss: 2.5347 train_time: 1.9m tok/s: 8272128 +1226/20000 train_loss: 2.6574 train_time: 1.9m tok/s: 8272056 +1227/20000 train_loss: 2.8647 train_time: 1.9m tok/s: 8272016 +1228/20000 train_loss: 2.6612 train_time: 1.9m tok/s: 8272000 +1229/20000 train_loss: 2.6648 train_time: 1.9m tok/s: 8272039 +1230/20000 train_loss: 2.7608 train_time: 1.9m tok/s: 8271976 +1231/20000 train_loss: 2.6980 train_time: 2.0m tok/s: 8271966 +1232/20000 train_loss: 2.6823 train_time: 2.0m tok/s: 8271915 +1233/20000 train_loss: 2.6627 train_time: 2.0m tok/s: 8271895 +1234/20000 train_loss: 2.6597 train_time: 2.0m tok/s: 8271874 +1235/20000 train_loss: 2.5947 train_time: 2.0m tok/s: 8271771 +1236/20000 train_loss: 2.6450 train_time: 2.0m tok/s: 8271728 +1237/20000 train_loss: 2.6226 train_time: 2.0m tok/s: 8271785 +1238/20000 train_loss: 2.5905 train_time: 2.0m tok/s: 8271790 +1239/20000 train_loss: 2.5854 train_time: 2.0m tok/s: 8271783 +1240/20000 train_loss: 2.5381 train_time: 2.0m tok/s: 8271825 +1241/20000 train_loss: 2.5908 train_time: 2.0m tok/s: 8271867 +1242/20000 train_loss: 2.5873 train_time: 2.0m tok/s: 8271682 +1243/20000 train_loss: 2.6826 train_time: 2.0m tok/s: 
8271492 +1244/20000 train_loss: 2.7852 train_time: 2.0m tok/s: 8271407 +1245/20000 train_loss: 2.6761 train_time: 2.0m tok/s: 8271267 +1246/20000 train_loss: 2.7898 train_time: 2.0m tok/s: 8271133 +1247/20000 train_loss: 2.7823 train_time: 2.0m tok/s: 8271036 +1248/20000 train_loss: 2.6588 train_time: 2.0m tok/s: 8271154 +1249/20000 train_loss: 2.6444 train_time: 2.0m tok/s: 8271141 +1250/20000 train_loss: 2.6480 train_time: 2.0m tok/s: 8271262 +1251/20000 train_loss: 2.6026 train_time: 2.0m tok/s: 8271286 +1252/20000 train_loss: 2.6788 train_time: 2.0m tok/s: 8271203 +1253/20000 train_loss: 2.6278 train_time: 2.0m tok/s: 8271164 +1254/20000 train_loss: 2.6861 train_time: 2.0m tok/s: 8271162 +1255/20000 train_loss: 2.4585 train_time: 2.0m tok/s: 8271194 +1256/20000 train_loss: 2.6725 train_time: 2.0m tok/s: 8271143 +1257/20000 train_loss: 2.6050 train_time: 2.0m tok/s: 8271059 +1258/20000 train_loss: 2.6371 train_time: 2.0m tok/s: 8271041 +1259/20000 train_loss: 2.7648 train_time: 2.0m tok/s: 8270918 +1260/20000 train_loss: 2.7043 train_time: 2.0m tok/s: 8271012 +1261/20000 train_loss: 2.7838 train_time: 2.0m tok/s: 8270979 +1262/20000 train_loss: 2.6962 train_time: 2.0m tok/s: 8270984 +1263/20000 train_loss: 2.7058 train_time: 2.0m tok/s: 8271037 +1264/20000 train_loss: 2.6339 train_time: 2.0m tok/s: 8271034 +1265/20000 train_loss: 2.6326 train_time: 2.0m tok/s: 8270958 +1266/20000 train_loss: 2.6479 train_time: 2.0m tok/s: 8270906 +1267/20000 train_loss: 2.6805 train_time: 2.0m tok/s: 8270817 +1268/20000 train_loss: 2.4899 train_time: 2.0m tok/s: 8270802 +1269/20000 train_loss: 2.6887 train_time: 2.0m tok/s: 8270820 +1270/20000 train_loss: 2.6524 train_time: 2.0m tok/s: 8270913 +1271/20000 train_loss: 2.5687 train_time: 2.0m tok/s: 8270903 +1272/20000 train_loss: 2.8080 train_time: 2.0m tok/s: 8271026 +1273/20000 train_loss: 2.7516 train_time: 2.0m tok/s: 8271025 +1274/20000 train_loss: 2.6803 train_time: 2.0m tok/s: 8271097 +1275/20000 train_loss: 2.7841 
train_time: 2.0m tok/s: 8271163 +1276/20000 train_loss: 2.7148 train_time: 2.0m tok/s: 8271165 +1277/20000 train_loss: 2.7025 train_time: 2.0m tok/s: 8271185 +1278/20000 train_loss: 2.6237 train_time: 2.0m tok/s: 8271169 +1279/20000 train_loss: 2.7185 train_time: 2.0m tok/s: 8271100 +1280/20000 train_loss: 2.6428 train_time: 2.0m tok/s: 8271113 +1281/20000 train_loss: 2.8235 train_time: 2.0m tok/s: 8271063 +1282/20000 train_loss: 2.5391 train_time: 2.0m tok/s: 8271035 +1283/20000 train_loss: 2.6306 train_time: 2.0m tok/s: 8270886 +1284/20000 train_loss: 2.6197 train_time: 2.0m tok/s: 8271054 +1285/20000 train_loss: 2.7764 train_time: 2.0m tok/s: 8271022 +1286/20000 train_loss: 2.6702 train_time: 2.0m tok/s: 8271020 +1287/20000 train_loss: 2.7007 train_time: 2.0m tok/s: 8270942 +1288/20000 train_loss: 2.7251 train_time: 2.0m tok/s: 8270939 +1289/20000 train_loss: 2.7507 train_time: 2.0m tok/s: 8270922 +1290/20000 train_loss: 2.6368 train_time: 2.0m tok/s: 8270894 +1291/20000 train_loss: 2.7467 train_time: 2.0m tok/s: 8270914 +1292/20000 train_loss: 2.7294 train_time: 2.0m tok/s: 8270942 +1293/20000 train_loss: 2.7200 train_time: 2.0m tok/s: 8270960 +1294/20000 train_loss: 2.7252 train_time: 2.1m tok/s: 8270914 +1295/20000 train_loss: 2.7454 train_time: 2.1m tok/s: 8270895 +1296/20000 train_loss: 2.6883 train_time: 2.1m tok/s: 8271033 +1297/20000 train_loss: 2.5949 train_time: 2.1m tok/s: 8271040 +1298/20000 train_loss: 2.6618 train_time: 2.1m tok/s: 8271061 +1299/20000 train_loss: 2.5169 train_time: 2.1m tok/s: 8270977 +1300/20000 train_loss: 2.6310 train_time: 2.1m tok/s: 8270949 +1301/20000 train_loss: 2.6918 train_time: 2.1m tok/s: 8270994 +1302/20000 train_loss: 2.6737 train_time: 2.1m tok/s: 8270957 +1303/20000 train_loss: 2.8991 train_time: 2.1m tok/s: 8270928 +1304/20000 train_loss: 2.7449 train_time: 2.1m tok/s: 8270897 +1305/20000 train_loss: 2.7690 train_time: 2.1m tok/s: 8270913 +1306/20000 train_loss: 2.8625 train_time: 2.1m tok/s: 8270958 +1307/20000 
train_loss: 2.6250 train_time: 2.1m tok/s: 8270912 +1308/20000 train_loss: 2.6301 train_time: 2.1m tok/s: 8270998 +1309/20000 train_loss: 2.6625 train_time: 2.1m tok/s: 8270977 +1310/20000 train_loss: 2.5681 train_time: 2.1m tok/s: 8270773 +1311/20000 train_loss: 2.6136 train_time: 2.1m tok/s: 8270632 +1312/20000 train_loss: 2.5481 train_time: 2.1m tok/s: 8270607 +1313/20000 train_loss: 2.5754 train_time: 2.1m tok/s: 8270588 +1314/20000 train_loss: 2.5602 train_time: 2.1m tok/s: 8270685 +1315/20000 train_loss: 2.4160 train_time: 2.1m tok/s: 8270545 +1316/20000 train_loss: 2.6861 train_time: 2.1m tok/s: 8270502 +1317/20000 train_loss: 2.6825 train_time: 2.1m tok/s: 8270579 +1318/20000 train_loss: 2.7179 train_time: 2.1m tok/s: 8270656 +1319/20000 train_loss: 2.8080 train_time: 2.1m tok/s: 8270617 +1320/20000 train_loss: 2.7331 train_time: 2.1m tok/s: 8270622 +1321/20000 train_loss: 2.7251 train_time: 2.1m tok/s: 8270616 +1322/20000 train_loss: 2.7416 train_time: 2.1m tok/s: 8270660 +1323/20000 train_loss: 2.5862 train_time: 2.1m tok/s: 8270544 +1324/20000 train_loss: 2.6470 train_time: 2.1m tok/s: 8270564 +1325/20000 train_loss: 2.8258 train_time: 2.1m tok/s: 8270557 +1326/20000 train_loss: 2.8342 train_time: 2.1m tok/s: 8270576 +1327/20000 train_loss: 2.6615 train_time: 2.1m tok/s: 8270603 +1328/20000 train_loss: 2.6558 train_time: 2.1m tok/s: 8270630 +1329/20000 train_loss: 2.7469 train_time: 2.1m tok/s: 8270616 +1330/20000 train_loss: 2.6174 train_time: 2.1m tok/s: 8270641 +1331/20000 train_loss: 2.6802 train_time: 2.1m tok/s: 8270561 +1332/20000 train_loss: 2.9139 train_time: 2.1m tok/s: 8270522 +1333/20000 train_loss: 2.8379 train_time: 2.1m tok/s: 8270443 +1334/20000 train_loss: 2.6754 train_time: 2.1m tok/s: 8270431 +1335/20000 train_loss: 2.6268 train_time: 2.1m tok/s: 8270404 +1336/20000 train_loss: 2.6569 train_time: 2.1m tok/s: 8270367 +1337/20000 train_loss: 2.8172 train_time: 2.1m tok/s: 8270371 +1338/20000 train_loss: 2.9068 train_time: 2.1m tok/s: 
8270404 +1339/20000 train_loss: 2.6921 train_time: 2.1m tok/s: 8270426 +1340/20000 train_loss: 2.5348 train_time: 2.1m tok/s: 8270459 +1341/20000 train_loss: 2.5550 train_time: 2.1m tok/s: 8270413 +1342/20000 train_loss: 2.6512 train_time: 2.1m tok/s: 8270402 +1343/20000 train_loss: 2.6729 train_time: 2.1m tok/s: 8270339 +1344/20000 train_loss: 2.6871 train_time: 2.1m tok/s: 8270405 +1345/20000 train_loss: 2.6406 train_time: 2.1m tok/s: 8270464 +1346/20000 train_loss: 2.7679 train_time: 2.1m tok/s: 8270507 +1347/20000 train_loss: 2.7940 train_time: 2.1m tok/s: 8270467 +1348/20000 train_loss: 2.6956 train_time: 2.1m tok/s: 8270489 +1349/20000 train_loss: 2.7260 train_time: 2.1m tok/s: 8270565 +1350/20000 train_loss: 2.6802 train_time: 2.1m tok/s: 8270625 +1351/20000 train_loss: 2.7822 train_time: 2.1m tok/s: 8270617 +1352/20000 train_loss: 2.6763 train_time: 2.1m tok/s: 8270585 +1353/20000 train_loss: 2.7108 train_time: 2.1m tok/s: 8270626 +1354/20000 train_loss: 2.3805 train_time: 2.1m tok/s: 8270595 +1355/20000 train_loss: 2.5813 train_time: 2.1m tok/s: 8270503 +1356/20000 train_loss: 2.6868 train_time: 2.1m tok/s: 8270679 +1357/20000 train_loss: 2.6799 train_time: 2.2m tok/s: 8270688 +1358/20000 train_loss: 2.7457 train_time: 2.2m tok/s: 8270692 +1359/20000 train_loss: 2.5038 train_time: 2.2m tok/s: 8270647 +1360/20000 train_loss: 2.7404 train_time: 2.2m tok/s: 8270588 +1361/20000 train_loss: 2.6286 train_time: 2.2m tok/s: 8270584 +1362/20000 train_loss: 2.6247 train_time: 2.2m tok/s: 8270584 +1363/20000 train_loss: 2.7094 train_time: 2.2m tok/s: 8270645 +1364/20000 train_loss: 2.5351 train_time: 2.2m tok/s: 8270628 +1365/20000 train_loss: 2.5324 train_time: 2.2m tok/s: 8270496 +1366/20000 train_loss: 2.6189 train_time: 2.2m tok/s: 8270475 +1367/20000 train_loss: 2.7035 train_time: 2.2m tok/s: 8270415 +1368/20000 train_loss: 2.5711 train_time: 2.2m tok/s: 8270530 +1369/20000 train_loss: 2.6776 train_time: 2.2m tok/s: 8270522 +1370/20000 train_loss: 2.7239 
train_time: 2.2m tok/s: 8270561 +1371/20000 train_loss: 2.6994 train_time: 2.2m tok/s: 8270506 +1372/20000 train_loss: 2.7319 train_time: 2.2m tok/s: 8270476 +1373/20000 train_loss: 2.6700 train_time: 2.2m tok/s: 8270463 +1374/20000 train_loss: 2.7778 train_time: 2.2m tok/s: 8270503 +1375/20000 train_loss: 2.7299 train_time: 2.2m tok/s: 8270585 +1376/20000 train_loss: 2.5925 train_time: 2.2m tok/s: 8270534 +1377/20000 train_loss: 2.6408 train_time: 2.2m tok/s: 8270498 +1378/20000 train_loss: 2.5872 train_time: 2.2m tok/s: 8270507 +1379/20000 train_loss: 2.6345 train_time: 2.2m tok/s: 8270479 +1380/20000 train_loss: 2.5997 train_time: 2.2m tok/s: 8270521 +1381/20000 train_loss: 2.6470 train_time: 2.2m tok/s: 8270485 +1382/20000 train_loss: 2.6647 train_time: 2.2m tok/s: 8270449 +1383/20000 train_loss: 2.6829 train_time: 2.2m tok/s: 8270483 +1384/20000 train_loss: 2.5805 train_time: 2.2m tok/s: 8270457 +1385/20000 train_loss: 2.6130 train_time: 2.2m tok/s: 8270415 +1386/20000 train_loss: 2.7666 train_time: 2.2m tok/s: 8270461 +1387/20000 train_loss: 2.6778 train_time: 2.2m tok/s: 8270487 +1388/20000 train_loss: 2.7559 train_time: 2.2m tok/s: 8270499 +1389/20000 train_loss: 2.6122 train_time: 2.2m tok/s: 8270484 +1390/20000 train_loss: 2.7408 train_time: 2.2m tok/s: 8270515 +1391/20000 train_loss: 2.5464 train_time: 2.2m tok/s: 8270457 +1392/20000 train_loss: 2.7404 train_time: 2.2m tok/s: 8270517 +1393/20000 train_loss: 2.6354 train_time: 2.2m tok/s: 8270511 +1394/20000 train_loss: 2.9067 train_time: 2.2m tok/s: 8270546 +1395/20000 train_loss: 2.5011 train_time: 2.2m tok/s: 8270494 +1396/20000 train_loss: 2.8269 train_time: 2.2m tok/s: 8270565 +1397/20000 train_loss: 2.7196 train_time: 2.2m tok/s: 8270535 +1398/20000 train_loss: 2.8019 train_time: 2.2m tok/s: 8270595 +1399/20000 train_loss: 2.6432 train_time: 2.2m tok/s: 8270645 +1400/20000 train_loss: 2.7398 train_time: 2.2m tok/s: 8270691 +1401/20000 train_loss: 2.7337 train_time: 2.2m tok/s: 8270682 +1402/20000 
train_loss: 2.5672 train_time: 2.2m tok/s: 8270663 +1403/20000 train_loss: 2.5967 train_time: 2.2m tok/s: 8270698 +1404/20000 train_loss: 2.6853 train_time: 2.2m tok/s: 8270699 +1405/20000 train_loss: 2.7235 train_time: 2.2m tok/s: 8270756 +1406/20000 train_loss: 2.8445 train_time: 2.2m tok/s: 8270692 +1407/20000 train_loss: 2.5706 train_time: 2.2m tok/s: 8270607 +1408/20000 train_loss: 2.6950 train_time: 2.2m tok/s: 8270567 +1409/20000 train_loss: 2.7739 train_time: 2.2m tok/s: 8270539 +1410/20000 train_loss: 2.6606 train_time: 2.2m tok/s: 8270573 +1411/20000 train_loss: 2.6970 train_time: 2.2m tok/s: 8270607 +1412/20000 train_loss: 2.7363 train_time: 2.2m tok/s: 8270607 +1413/20000 train_loss: 2.6214 train_time: 2.2m tok/s: 8270553 +1414/20000 train_loss: 2.6030 train_time: 2.2m tok/s: 8270662 +1415/20000 train_loss: 2.6628 train_time: 2.2m tok/s: 8270725 +1416/20000 train_loss: 2.6205 train_time: 2.2m tok/s: 8270789 +1417/20000 train_loss: 2.6286 train_time: 2.2m tok/s: 8270812 +1418/20000 train_loss: 2.7583 train_time: 2.2m tok/s: 8270790 +1419/20000 train_loss: 2.6680 train_time: 2.2m tok/s: 8270751 +1420/20000 train_loss: 2.6302 train_time: 2.3m tok/s: 8270723 +1421/20000 train_loss: 2.7584 train_time: 2.3m tok/s: 8270749 +1422/20000 train_loss: 2.7406 train_time: 2.3m tok/s: 8270756 +1423/20000 train_loss: 2.7028 train_time: 2.3m tok/s: 8270800 +1424/20000 train_loss: 2.7122 train_time: 2.3m tok/s: 8270807 +1425/20000 train_loss: 2.6254 train_time: 2.3m tok/s: 8270809 +1426/20000 train_loss: 2.6917 train_time: 2.3m tok/s: 8270802 +1427/20000 train_loss: 2.6507 train_time: 2.3m tok/s: 8270818 +1428/20000 train_loss: 2.6508 train_time: 2.3m tok/s: 8270792 +1429/20000 train_loss: 2.5917 train_time: 2.3m tok/s: 8270729 +1430/20000 train_loss: 2.6300 train_time: 2.3m tok/s: 8270706 +1431/20000 train_loss: 2.6045 train_time: 2.3m tok/s: 8270725 +1432/20000 train_loss: 2.4441 train_time: 2.3m tok/s: 8270686 +1433/20000 train_loss: 2.6305 train_time: 2.3m tok/s: 
8270650 +1434/20000 train_loss: 2.7485 train_time: 2.3m tok/s: 8270610 +1435/20000 train_loss: 2.7226 train_time: 2.3m tok/s: 8270632 +1436/20000 train_loss: 2.6228 train_time: 2.3m tok/s: 8270737 +1437/20000 train_loss: 2.7086 train_time: 2.3m tok/s: 8270756 +1438/20000 train_loss: 2.7516 train_time: 2.3m tok/s: 8270818 +1439/20000 train_loss: 2.6718 train_time: 2.3m tok/s: 8270809 +1440/20000 train_loss: 2.7113 train_time: 2.3m tok/s: 8270661 +1441/20000 train_loss: 2.6674 train_time: 2.3m tok/s: 8270639 +1442/20000 train_loss: 2.5837 train_time: 2.3m tok/s: 8270569 +1443/20000 train_loss: 2.5979 train_time: 2.3m tok/s: 8270560 +1444/20000 train_loss: 2.5518 train_time: 2.3m tok/s: 8270628 +1445/20000 train_loss: 2.6714 train_time: 2.3m tok/s: 8270533 +1446/20000 train_loss: 2.7767 train_time: 2.3m tok/s: 8270484 +1447/20000 train_loss: 2.7219 train_time: 2.3m tok/s: 8270495 +1448/20000 train_loss: 2.7042 train_time: 2.3m tok/s: 8270530 +1449/20000 train_loss: 2.6409 train_time: 2.3m tok/s: 8270591 +1450/20000 train_loss: 2.7357 train_time: 2.3m tok/s: 8270621 +1451/20000 train_loss: 2.5988 train_time: 2.3m tok/s: 8270585 +1452/20000 train_loss: 2.6215 train_time: 2.3m tok/s: 8270570 +1453/20000 train_loss: 2.6470 train_time: 2.3m tok/s: 8270596 +1454/20000 train_loss: 2.7532 train_time: 2.3m tok/s: 8270513 +1455/20000 train_loss: 2.5730 train_time: 2.3m tok/s: 8270532 +1456/20000 train_loss: 2.4581 train_time: 2.3m tok/s: 8270561 +1457/20000 train_loss: 2.4810 train_time: 2.3m tok/s: 8270575 +1458/20000 train_loss: 2.5994 train_time: 2.3m tok/s: 8270493 +1459/20000 train_loss: 2.6739 train_time: 2.3m tok/s: 8270485 +1460/20000 train_loss: 2.7018 train_time: 2.3m tok/s: 8270549 +1461/20000 train_loss: 2.7760 train_time: 2.3m tok/s: 8270581 +1462/20000 train_loss: 2.6521 train_time: 2.3m tok/s: 8270607 +1463/20000 train_loss: 2.6654 train_time: 2.3m tok/s: 8270622 +1464/20000 train_loss: 2.6520 train_time: 2.3m tok/s: 8270567 +1465/20000 train_loss: 2.6820 
train_time: 2.3m tok/s: 8270499 +1466/20000 train_loss: 2.6196 train_time: 2.3m tok/s: 8270543 +1467/20000 train_loss: 2.6202 train_time: 2.3m tok/s: 8270629 +1468/20000 train_loss: 2.5601 train_time: 2.3m tok/s: 8270511 +1469/20000 train_loss: 2.5881 train_time: 2.3m tok/s: 8270518 +1470/20000 train_loss: 2.5077 train_time: 2.3m tok/s: 8270517 +1471/20000 train_loss: 2.8105 train_time: 2.3m tok/s: 8270500 +1472/20000 train_loss: 2.8809 train_time: 2.3m tok/s: 8270435 +1473/20000 train_loss: 2.7799 train_time: 2.3m tok/s: 8270319 +1474/20000 train_loss: 2.7368 train_time: 2.3m tok/s: 8270296 +1475/20000 train_loss: 2.6898 train_time: 2.3m tok/s: 8270369 +1476/20000 train_loss: 2.7751 train_time: 2.3m tok/s: 8270396 +1477/20000 train_loss: 2.6019 train_time: 2.3m tok/s: 8270378 +1478/20000 train_loss: 2.6079 train_time: 2.3m tok/s: 8270368 +1479/20000 train_loss: 2.5889 train_time: 2.3m tok/s: 8270408 +1480/20000 train_loss: 2.6080 train_time: 2.3m tok/s: 8270496 +1481/20000 train_loss: 2.6360 train_time: 2.3m tok/s: 8270547 +1482/20000 train_loss: 3.0598 train_time: 2.3m tok/s: 8270464 +1483/20000 train_loss: 2.6962 train_time: 2.4m tok/s: 8270348 +1484/20000 train_loss: 2.6308 train_time: 2.4m tok/s: 8270432 +1485/20000 train_loss: 2.7918 train_time: 2.4m tok/s: 8270467 +1486/20000 train_loss: 2.6001 train_time: 2.4m tok/s: 8270431 +1487/20000 train_loss: 2.6952 train_time: 2.4m tok/s: 8270404 +1488/20000 train_loss: 2.6568 train_time: 2.4m tok/s: 8270481 +1489/20000 train_loss: 2.5767 train_time: 2.4m tok/s: 8270493 +1490/20000 train_loss: 2.6729 train_time: 2.4m tok/s: 8270522 +1491/20000 train_loss: 2.6710 train_time: 2.4m tok/s: 8270500 +1492/20000 train_loss: 2.6003 train_time: 2.4m tok/s: 8270499 +1493/20000 train_loss: 2.6659 train_time: 2.4m tok/s: 8270517 +1494/20000 train_loss: 2.6391 train_time: 2.4m tok/s: 8270507 +1495/20000 train_loss: 2.5699 train_time: 2.4m tok/s: 8270477 +1496/20000 train_loss: 2.6783 train_time: 2.4m tok/s: 8270422 +1497/20000 
train_loss: 2.5975 train_time: 2.4m tok/s: 8270434 +1498/20000 train_loss: 2.9254 train_time: 2.4m tok/s: 8270512 +1499/20000 train_loss: 2.6914 train_time: 2.4m tok/s: 8270499 +1500/20000 train_loss: 2.7211 train_time: 2.4m tok/s: 8270543 +1501/20000 train_loss: 2.6918 train_time: 2.4m tok/s: 8270567 +1502/20000 train_loss: 2.8059 train_time: 2.4m tok/s: 8270513 +1503/20000 train_loss: 2.6777 train_time: 2.4m tok/s: 8270422 +1504/20000 train_loss: 2.7333 train_time: 2.4m tok/s: 8270440 +1505/20000 train_loss: 2.6807 train_time: 2.4m tok/s: 8270517 +1506/20000 train_loss: 2.7115 train_time: 2.4m tok/s: 8270476 +1507/20000 train_loss: 2.8285 train_time: 2.4m tok/s: 8270482 +1508/20000 train_loss: 2.5349 train_time: 2.4m tok/s: 8270426 +1509/20000 train_loss: 2.5837 train_time: 2.4m tok/s: 8270462 +1510/20000 train_loss: 2.5422 train_time: 2.4m tok/s: 8270492 +1511/20000 train_loss: 2.5024 train_time: 2.4m tok/s: 8270397 +1512/20000 train_loss: 2.5704 train_time: 2.4m tok/s: 8270415 +1513/20000 train_loss: 2.7204 train_time: 2.4m tok/s: 8270457 +1514/20000 train_loss: 2.7420 train_time: 2.4m tok/s: 8270462 +1515/20000 train_loss: 2.6996 train_time: 2.4m tok/s: 8270477 +1516/20000 train_loss: 2.5885 train_time: 2.4m tok/s: 8270436 +1517/20000 train_loss: 2.5922 train_time: 2.4m tok/s: 8270427 +1518/20000 train_loss: 2.7324 train_time: 2.4m tok/s: 8270431 +1519/20000 train_loss: 2.6603 train_time: 2.4m tok/s: 8270389 +1520/20000 train_loss: 2.6695 train_time: 2.4m tok/s: 8270407 +1521/20000 train_loss: 2.6455 train_time: 2.4m tok/s: 8270441 +1522/20000 train_loss: 2.6485 train_time: 2.4m tok/s: 8270513 +1523/20000 train_loss: 2.6795 train_time: 2.4m tok/s: 8270487 +1524/20000 train_loss: 2.6406 train_time: 2.4m tok/s: 8270429 +1525/20000 train_loss: 2.6305 train_time: 2.4m tok/s: 8270370 +1526/20000 train_loss: 2.7235 train_time: 2.4m tok/s: 8270307 +1527/20000 train_loss: 2.6760 train_time: 2.4m tok/s: 8270264 +1528/20000 train_loss: 2.4670 train_time: 2.4m tok/s: 
8270264 +1529/20000 train_loss: 2.6278 train_time: 2.4m tok/s: 8270346 +1530/20000 train_loss: 2.6118 train_time: 2.4m tok/s: 8270379 +1531/20000 train_loss: 2.3620 train_time: 2.4m tok/s: 8270357 +1532/20000 train_loss: 2.6195 train_time: 2.4m tok/s: 8270363 +1533/20000 train_loss: 2.6772 train_time: 2.4m tok/s: 8270331 +1534/20000 train_loss: 2.6350 train_time: 2.4m tok/s: 8270269 +1535/20000 train_loss: 2.7697 train_time: 2.4m tok/s: 8270297 +1536/20000 train_loss: 2.7000 train_time: 2.4m tok/s: 8270234 +1537/20000 train_loss: 3.0812 train_time: 2.4m tok/s: 8270200 +1538/20000 train_loss: 2.7208 train_time: 2.4m tok/s: 8270160 +1539/20000 train_loss: 2.6305 train_time: 2.4m tok/s: 8270180 +1540/20000 train_loss: 2.6815 train_time: 2.4m tok/s: 8270204 +1541/20000 train_loss: 2.5916 train_time: 2.4m tok/s: 8270237 +1542/20000 train_loss: 2.6194 train_time: 2.4m tok/s: 8270316 +1543/20000 train_loss: 2.6343 train_time: 2.4m tok/s: 8270369 +1544/20000 train_loss: 2.5819 train_time: 2.4m tok/s: 8270334 +1545/20000 train_loss: 2.6209 train_time: 2.4m tok/s: 8270380 +1546/20000 train_loss: 2.4978 train_time: 2.5m tok/s: 8270372 +1547/20000 train_loss: 2.7439 train_time: 2.5m tok/s: 8270314 +1548/20000 train_loss: 2.6878 train_time: 2.5m tok/s: 8270320 +1549/20000 train_loss: 2.5786 train_time: 2.5m tok/s: 8270318 +1550/20000 train_loss: 2.7158 train_time: 2.5m tok/s: 8270266 +1551/20000 train_loss: 2.6758 train_time: 2.5m tok/s: 8270255 +1552/20000 train_loss: 2.5518 train_time: 2.5m tok/s: 8270297 +1553/20000 train_loss: 2.4901 train_time: 2.5m tok/s: 8270283 +1554/20000 train_loss: 2.5903 train_time: 2.5m tok/s: 8270300 +1555/20000 train_loss: 2.6307 train_time: 2.5m tok/s: 8270306 +1556/20000 train_loss: 2.5158 train_time: 2.5m tok/s: 8270283 +1557/20000 train_loss: 2.5564 train_time: 2.5m tok/s: 8270213 +1558/20000 train_loss: 2.5730 train_time: 2.5m tok/s: 8270179 +1559/20000 train_loss: 2.5503 train_time: 2.5m tok/s: 8270185 +1560/20000 train_loss: 2.6215 
train_time: 2.5m tok/s: 8270195 +1561/20000 train_loss: 2.5461 train_time: 2.5m tok/s: 8270129 +1562/20000 train_loss: 2.5889 train_time: 2.5m tok/s: 8270166 +1563/20000 train_loss: 2.4931 train_time: 2.5m tok/s: 8270201 +1564/20000 train_loss: 2.5873 train_time: 2.5m tok/s: 8270232 +1565/20000 train_loss: 2.5706 train_time: 2.5m tok/s: 8270203 +1566/20000 train_loss: 2.7476 train_time: 2.5m tok/s: 8270196 +1567/20000 train_loss: 2.6869 train_time: 2.5m tok/s: 8270214 +1568/20000 train_loss: 2.5299 train_time: 2.5m tok/s: 8270238 +1569/20000 train_loss: 2.5974 train_time: 2.5m tok/s: 8270250 +1570/20000 train_loss: 2.5485 train_time: 2.5m tok/s: 8270198 +1571/20000 train_loss: 2.6191 train_time: 2.5m tok/s: 8270179 +1572/20000 train_loss: 3.2035 train_time: 2.5m tok/s: 8270182 +1573/20000 train_loss: 2.7553 train_time: 2.5m tok/s: 8270167 +1574/20000 train_loss: 2.6013 train_time: 2.5m tok/s: 8270132 +1575/20000 train_loss: 2.5516 train_time: 2.5m tok/s: 8270142 +1576/20000 train_loss: 2.5406 train_time: 2.5m tok/s: 8270139 +1577/20000 train_loss: 2.5758 train_time: 2.5m tok/s: 8270106 +1578/20000 train_loss: 2.4999 train_time: 2.5m tok/s: 8270048 +1579/20000 train_loss: 2.7629 train_time: 2.5m tok/s: 8270036 +1580/20000 train_loss: 2.6526 train_time: 2.5m tok/s: 8270087 +1581/20000 train_loss: 2.4929 train_time: 2.5m tok/s: 8270091 +1582/20000 train_loss: 2.5170 train_time: 2.5m tok/s: 8270019 +1583/20000 train_loss: 2.5842 train_time: 2.5m tok/s: 8270038 +1584/20000 train_loss: 2.5631 train_time: 2.5m tok/s: 8270105 +1585/20000 train_loss: 2.7101 train_time: 2.5m tok/s: 8270178 +1586/20000 train_loss: 2.5567 train_time: 2.5m tok/s: 8270181 +1587/20000 train_loss: 2.5940 train_time: 2.5m tok/s: 8270189 +1588/20000 train_loss: 2.6345 train_time: 2.5m tok/s: 8270193 +1589/20000 train_loss: 2.6973 train_time: 2.5m tok/s: 8270172 +1590/20000 train_loss: 2.6502 train_time: 2.5m tok/s: 8270154 +1591/20000 train_loss: 2.6407 train_time: 2.5m tok/s: 8270172 +1592/20000 
train_loss: 2.5706 train_time: 2.5m tok/s: 8270223 +1593/20000 train_loss: 2.6411 train_time: 2.5m tok/s: 8270241 +1594/20000 train_loss: 2.7418 train_time: 2.5m tok/s: 8270268 +1595/20000 train_loss: 2.6727 train_time: 2.5m tok/s: 8270263 +1596/20000 train_loss: 2.4467 train_time: 2.5m tok/s: 8270250 +1597/20000 train_loss: 2.5642 train_time: 2.5m tok/s: 8270214 +1598/20000 train_loss: 2.6222 train_time: 2.5m tok/s: 8270201 +1599/20000 train_loss: 2.6190 train_time: 2.5m tok/s: 8270245 +1600/20000 train_loss: 2.8158 train_time: 2.5m tok/s: 8270277 +1601/20000 train_loss: 2.6533 train_time: 2.5m tok/s: 8270301 +1602/20000 train_loss: 2.7566 train_time: 2.5m tok/s: 8270176 +1603/20000 train_loss: 2.5788 train_time: 2.5m tok/s: 8270173 +1604/20000 train_loss: 2.6001 train_time: 2.5m tok/s: 8270214 +1605/20000 train_loss: 2.6217 train_time: 2.5m tok/s: 8270162 +1606/20000 train_loss: 2.6111 train_time: 2.5m tok/s: 8270149 +1607/20000 train_loss: 2.5305 train_time: 2.5m tok/s: 8270186 +1608/20000 train_loss: 2.5057 train_time: 2.5m tok/s: 8270186 +1609/20000 train_loss: 2.7044 train_time: 2.6m tok/s: 8270221 +1610/20000 train_loss: 2.6040 train_time: 2.6m tok/s: 8270207 +1611/20000 train_loss: 2.5886 train_time: 2.6m tok/s: 8270210 +1612/20000 train_loss: 2.6554 train_time: 2.6m tok/s: 8270197 +1613/20000 train_loss: 2.6536 train_time: 2.6m tok/s: 8270237 +1614/20000 train_loss: 2.7161 train_time: 2.6m tok/s: 8270248 +1615/20000 train_loss: 2.7244 train_time: 2.6m tok/s: 8270227 +1616/20000 train_loss: 2.6620 train_time: 2.6m tok/s: 8270301 +1617/20000 train_loss: 2.6044 train_time: 2.6m tok/s: 8270154 +1618/20000 train_loss: 3.0161 train_time: 2.6m tok/s: 8270290 +1619/20000 train_loss: 2.7357 train_time: 2.6m tok/s: 8270283 +1620/20000 train_loss: 2.5606 train_time: 2.6m tok/s: 8270311 +1621/20000 train_loss: 2.5782 train_time: 2.6m tok/s: 8270341 +1622/20000 train_loss: 2.7564 train_time: 2.6m tok/s: 8270377 +1623/20000 train_loss: 2.6709 train_time: 2.6m tok/s: 
8270365 +1624/20000 train_loss: 2.6237 train_time: 2.6m tok/s: 8270408 +1625/20000 train_loss: 2.6323 train_time: 2.6m tok/s: 8270423 +1626/20000 train_loss: 2.7038 train_time: 2.6m tok/s: 8270436 +1627/20000 train_loss: 2.4403 train_time: 2.6m tok/s: 8270432 +1628/20000 train_loss: 2.5967 train_time: 2.6m tok/s: 8270441 +1629/20000 train_loss: 2.5721 train_time: 2.6m tok/s: 8270486 +1630/20000 train_loss: 2.5862 train_time: 2.6m tok/s: 8270523 +1631/20000 train_loss: 2.8018 train_time: 2.6m tok/s: 8270553 +1632/20000 train_loss: 2.7057 train_time: 2.6m tok/s: 8270591 +1633/20000 train_loss: 2.6693 train_time: 2.6m tok/s: 8270573 +1634/20000 train_loss: 2.6245 train_time: 2.6m tok/s: 8270590 +1635/20000 train_loss: 2.6785 train_time: 2.6m tok/s: 8270594 +1636/20000 train_loss: 2.4657 train_time: 2.6m tok/s: 8270599 +1637/20000 train_loss: 2.5595 train_time: 2.6m tok/s: 8270482 +1638/20000 train_loss: 2.5024 train_time: 2.6m tok/s: 8270445 +1639/20000 train_loss: 2.5287 train_time: 2.6m tok/s: 8270411 +1640/20000 train_loss: 2.3754 train_time: 2.6m tok/s: 8270436 +1641/20000 train_loss: 2.5488 train_time: 2.6m tok/s: 8270435 +1642/20000 train_loss: 2.7542 train_time: 2.6m tok/s: 8270460 +1643/20000 train_loss: 2.4456 train_time: 2.6m tok/s: 8270499 +1644/20000 train_loss: 2.4736 train_time: 2.6m tok/s: 8270525 +1645/20000 train_loss: 2.7629 train_time: 2.6m tok/s: 8270451 +1646/20000 train_loss: 2.5206 train_time: 2.6m tok/s: 8270432 +1647/20000 train_loss: 2.7447 train_time: 2.6m tok/s: 8270388 +1648/20000 train_loss: 2.6432 train_time: 2.6m tok/s: 8270380 +1649/20000 train_loss: 2.7490 train_time: 2.6m tok/s: 8270360 +1650/20000 train_loss: 2.5658 train_time: 2.6m tok/s: 8270328 +1651/20000 train_loss: 2.7362 train_time: 2.6m tok/s: 8270364 +1652/20000 train_loss: 2.6462 train_time: 2.6m tok/s: 8270400 +1653/20000 train_loss: 2.7514 train_time: 2.6m tok/s: 8270454 +1654/20000 train_loss: 2.6774 train_time: 2.6m tok/s: 8270473 +1655/20000 train_loss: 2.5615 
train_time: 2.6m tok/s: 8270511 +1656/20000 train_loss: 2.6057 train_time: 2.6m tok/s: 8270569 +1657/20000 train_loss: 2.6442 train_time: 2.6m tok/s: 8270490 +1658/20000 train_loss: 2.6361 train_time: 2.6m tok/s: 8270524 +1659/20000 train_loss: 2.5921 train_time: 2.6m tok/s: 8270543 +1660/20000 train_loss: 2.5560 train_time: 2.6m tok/s: 8270510 +1661/20000 train_loss: 2.7493 train_time: 2.6m tok/s: 8270473 +1662/20000 train_loss: 2.7388 train_time: 2.6m tok/s: 8270334 +1663/20000 train_loss: 2.8002 train_time: 2.6m tok/s: 8270220 +1664/20000 train_loss: 2.8086 train_time: 2.6m tok/s: 8270242 +1665/20000 train_loss: 2.8016 train_time: 2.6m tok/s: 8270212 +1666/20000 train_loss: 2.7030 train_time: 2.6m tok/s: 8270191 +1667/20000 train_loss: 2.6114 train_time: 2.6m tok/s: 8270181 +1668/20000 train_loss: 2.6355 train_time: 2.6m tok/s: 8270188 +1669/20000 train_loss: 2.7580 train_time: 2.6m tok/s: 8270207 +1670/20000 train_loss: 2.5550 train_time: 2.6m tok/s: 8270173 +1671/20000 train_loss: 2.4835 train_time: 2.6m tok/s: 8270179 +1672/20000 train_loss: 2.6117 train_time: 2.6m tok/s: 8270188 +1673/20000 train_loss: 2.5763 train_time: 2.7m tok/s: 8270210 +1674/20000 train_loss: 2.6455 train_time: 2.7m tok/s: 8270267 +1675/20000 train_loss: 2.4375 train_time: 2.7m tok/s: 8270242 +1676/20000 train_loss: 2.6874 train_time: 2.7m tok/s: 8270247 +1677/20000 train_loss: 2.6034 train_time: 2.7m tok/s: 8270266 +1678/20000 train_loss: 2.6733 train_time: 2.7m tok/s: 8270203 +1679/20000 train_loss: 2.6140 train_time: 2.7m tok/s: 8270146 +1680/20000 train_loss: 2.5398 train_time: 2.7m tok/s: 8270112 +1681/20000 train_loss: 2.5185 train_time: 2.7m tok/s: 8270143 +1682/20000 train_loss: 2.6251 train_time: 2.7m tok/s: 8270185 +1683/20000 train_loss: 2.6256 train_time: 2.7m tok/s: 8270129 +1684/20000 train_loss: 2.5788 train_time: 2.7m tok/s: 8270089 +1685/20000 train_loss: 2.6792 train_time: 2.7m tok/s: 8270102 +1686/20000 train_loss: 2.5840 train_time: 2.7m tok/s: 8270146 +1687/20000 
train_loss: 2.5504 train_time: 2.7m tok/s: 8270154 +1688/20000 train_loss: 2.5705 train_time: 2.7m tok/s: 8270154 +1689/20000 train_loss: 2.5242 train_time: 2.7m tok/s: 8270139 +1690/20000 train_loss: 2.8012 train_time: 2.7m tok/s: 8270164 +1691/20000 train_loss: 2.5931 train_time: 2.7m tok/s: 8270179 +1692/20000 train_loss: 2.5863 train_time: 2.7m tok/s: 8270185 +1693/20000 train_loss: 2.3909 train_time: 2.7m tok/s: 8270212 +1694/20000 train_loss: 2.6178 train_time: 2.7m tok/s: 8270268 +1695/20000 train_loss: 2.6185 train_time: 2.7m tok/s: 8270291 +1696/20000 train_loss: 2.6575 train_time: 2.7m tok/s: 8270373 +1697/20000 train_loss: 2.7568 train_time: 2.7m tok/s: 8270418 +1698/20000 train_loss: 2.6522 train_time: 2.7m tok/s: 8270484 +1699/20000 train_loss: 2.7500 train_time: 2.7m tok/s: 8270441 +1700/20000 train_loss: 2.5849 train_time: 2.7m tok/s: 8270464 +1701/20000 train_loss: 2.4808 train_time: 2.7m tok/s: 8270479 +1702/20000 train_loss: 2.6125 train_time: 2.7m tok/s: 8270489 +1703/20000 train_loss: 2.6481 train_time: 2.7m tok/s: 8270509 +1704/20000 train_loss: 2.7581 train_time: 2.7m tok/s: 8270545 +1705/20000 train_loss: 2.7739 train_time: 2.7m tok/s: 8270495 +1706/20000 train_loss: 2.7012 train_time: 2.7m tok/s: 8270525 +1707/20000 train_loss: 2.8443 train_time: 2.7m tok/s: 8270557 +1708/20000 train_loss: 2.4774 train_time: 2.7m tok/s: 8270494 +1709/20000 train_loss: 2.6630 train_time: 2.7m tok/s: 8270459 +1710/20000 train_loss: 2.6381 train_time: 2.7m tok/s: 8270519 +1711/20000 train_loss: 2.5757 train_time: 2.7m tok/s: 8270547 +1712/20000 train_loss: 2.7029 train_time: 2.7m tok/s: 8270535 +1713/20000 train_loss: 2.7852 train_time: 2.7m tok/s: 8270564 +1714/20000 train_loss: 2.5642 train_time: 2.7m tok/s: 8270558 +1715/20000 train_loss: 2.7500 train_time: 2.7m tok/s: 8270520 +1716/20000 train_loss: 2.7289 train_time: 2.7m tok/s: 8270485 +1717/20000 train_loss: 2.7390 train_time: 2.7m tok/s: 8270493 +1718/20000 train_loss: 2.8177 train_time: 2.7m tok/s: 
8270536 +1719/20000 train_loss: 2.7152 train_time: 2.7m tok/s: 8270530 +1720/20000 train_loss: 2.5324 train_time: 2.7m tok/s: 8270482 +1721/20000 train_loss: 2.6030 train_time: 2.7m tok/s: 8270521 +1722/20000 train_loss: 2.7324 train_time: 2.7m tok/s: 8270526 +1723/20000 train_loss: 2.6240 train_time: 2.7m tok/s: 8270556 +1724/20000 train_loss: 2.6658 train_time: 2.7m tok/s: 8270585 +1725/20000 train_loss: 2.5983 train_time: 2.7m tok/s: 8270558 +1726/20000 train_loss: 2.6301 train_time: 2.7m tok/s: 8270564 +1727/20000 train_loss: 2.5945 train_time: 2.7m tok/s: 8270563 +1728/20000 train_loss: 2.8364 train_time: 2.7m tok/s: 8270574 +1729/20000 train_loss: 2.6610 train_time: 2.7m tok/s: 8270557 +1730/20000 train_loss: 2.7433 train_time: 2.7m tok/s: 8270581 +1731/20000 train_loss: 2.7461 train_time: 2.7m tok/s: 8270615 +1732/20000 train_loss: 2.7099 train_time: 2.7m tok/s: 8270696 +1733/20000 train_loss: 2.7003 train_time: 2.7m tok/s: 8270719 +1734/20000 train_loss: 2.6089 train_time: 2.7m tok/s: 8270678 +1735/20000 train_loss: 2.4654 train_time: 2.7m tok/s: 8270658 +1736/20000 train_loss: 2.6996 train_time: 2.8m tok/s: 8270559 +1737/20000 train_loss: 2.5773 train_time: 2.8m tok/s: 8270550 +1738/20000 train_loss: 2.8106 train_time: 2.8m tok/s: 8270560 +1739/20000 train_loss: 2.7449 train_time: 2.8m tok/s: 8270483 +1740/20000 train_loss: 2.3891 train_time: 2.8m tok/s: 8270542 +1741/20000 train_loss: 2.7966 train_time: 2.8m tok/s: 8270595 +1742/20000 train_loss: 2.6085 train_time: 2.8m tok/s: 8270590 +1743/20000 train_loss: 2.5202 train_time: 2.8m tok/s: 8270572 +1744/20000 train_loss: 2.5886 train_time: 2.8m tok/s: 8270569 +1745/20000 train_loss: 2.6338 train_time: 2.8m tok/s: 8270616 +1746/20000 train_loss: 2.6045 train_time: 2.8m tok/s: 8270648 +1747/20000 train_loss: 2.6416 train_time: 2.8m tok/s: 8270649 +1748/20000 train_loss: 2.5712 train_time: 2.8m tok/s: 8270650 +1749/20000 train_loss: 2.6336 train_time: 2.8m tok/s: 8270629 +1750/20000 train_loss: 2.6746 
train_time: 2.8m tok/s: 8270624 +1751/20000 train_loss: 2.6711 train_time: 2.8m tok/s: 8270624 +1752/20000 train_loss: 2.6279 train_time: 2.8m tok/s: 8270680 +1753/20000 train_loss: 2.6126 train_time: 2.8m tok/s: 8270719 +1754/20000 train_loss: 2.6790 train_time: 2.8m tok/s: 8270722 +1755/20000 train_loss: 2.5904 train_time: 2.8m tok/s: 8270757 +1756/20000 train_loss: 2.6107 train_time: 2.8m tok/s: 8270709 +1757/20000 train_loss: 2.5868 train_time: 2.8m tok/s: 8270678 +1758/20000 train_loss: 2.8622 train_time: 2.8m tok/s: 8270668 +1759/20000 train_loss: 2.6338 train_time: 2.8m tok/s: 8270628 +1760/20000 train_loss: 2.5321 train_time: 2.8m tok/s: 8270638 +1761/20000 train_loss: 2.6417 train_time: 2.8m tok/s: 8270684 +1762/20000 train_loss: 2.6987 train_time: 2.8m tok/s: 8270671 +1763/20000 train_loss: 2.7128 train_time: 2.8m tok/s: 8270701 +1764/20000 train_loss: 2.6543 train_time: 2.8m tok/s: 8270725 +1765/20000 train_loss: 2.5892 train_time: 2.8m tok/s: 8270777 +1766/20000 train_loss: 2.7113 train_time: 2.8m tok/s: 8270777 +1767/20000 train_loss: 2.5767 train_time: 2.8m tok/s: 8270753 +1768/20000 train_loss: 2.6471 train_time: 2.8m tok/s: 8270737 +1769/20000 train_loss: 2.6456 train_time: 2.8m tok/s: 8270688 +1770/20000 train_loss: 2.6673 train_time: 2.8m tok/s: 8270678 +1771/20000 train_loss: 2.5297 train_time: 2.8m tok/s: 8270678 +1772/20000 train_loss: 2.5530 train_time: 2.8m tok/s: 8270719 +1773/20000 train_loss: 2.8531 train_time: 2.8m tok/s: 8270738 +1774/20000 train_loss: 2.7178 train_time: 2.8m tok/s: 8270802 +1775/20000 train_loss: 2.7522 train_time: 2.8m tok/s: 8270854 +1776/20000 train_loss: 2.5774 train_time: 2.8m tok/s: 8270823 +1777/20000 train_loss: 2.6951 train_time: 2.8m tok/s: 8270898 +1778/20000 train_loss: 2.6556 train_time: 2.8m tok/s: 8270875 +1779/20000 train_loss: 2.6563 train_time: 2.8m tok/s: 8270877 +1780/20000 train_loss: 2.6565 train_time: 2.8m tok/s: 8270871 +1781/20000 train_loss: 2.5272 train_time: 2.8m tok/s: 8270894 +1782/20000 
train_loss: 2.4157 train_time: 2.8m tok/s: 8270876 +1783/20000 train_loss: 2.6253 train_time: 2.8m tok/s: 8270815 +1784/20000 train_loss: 2.6316 train_time: 2.8m tok/s: 8270852 +1785/20000 train_loss: 2.6409 train_time: 2.8m tok/s: 8270909 +1786/20000 train_loss: 2.7975 train_time: 2.8m tok/s: 8270935 +1787/20000 train_loss: 2.7149 train_time: 2.8m tok/s: 8270937 +1788/20000 train_loss: 2.6365 train_time: 2.8m tok/s: 8270985 +1789/20000 train_loss: 2.7192 train_time: 2.8m tok/s: 8271006 +1790/20000 train_loss: 2.5279 train_time: 2.8m tok/s: 8271016 +1791/20000 train_loss: 2.3572 train_time: 2.8m tok/s: 8271015 +1792/20000 train_loss: 2.6321 train_time: 2.8m tok/s: 8271010 +1793/20000 train_loss: 2.4955 train_time: 2.8m tok/s: 8271017 +1794/20000 train_loss: 2.4322 train_time: 2.8m tok/s: 8271019 +1795/20000 train_loss: 2.6224 train_time: 2.8m tok/s: 8271049 +1796/20000 train_loss: 2.6510 train_time: 2.8m tok/s: 8271126 +1797/20000 train_loss: 2.8101 train_time: 2.8m tok/s: 8271152 +1798/20000 train_loss: 2.5723 train_time: 2.8m tok/s: 8271094 +1799/20000 train_loss: 2.6575 train_time: 2.9m tok/s: 8271118 +1800/20000 train_loss: 2.5575 train_time: 2.9m tok/s: 8271192 +1801/20000 train_loss: 2.6881 train_time: 2.9m tok/s: 8271179 +1802/20000 train_loss: 2.5873 train_time: 2.9m tok/s: 8271139 +1803/20000 train_loss: 2.6422 train_time: 2.9m tok/s: 8271096 +1804/20000 train_loss: 2.6027 train_time: 2.9m tok/s: 8271069 +1805/20000 train_loss: 2.5909 train_time: 2.9m tok/s: 8271082 +1806/20000 train_loss: 2.8073 train_time: 2.9m tok/s: 8270981 +1807/20000 train_loss: 2.7088 train_time: 2.9m tok/s: 8270962 +1808/20000 train_loss: 2.7248 train_time: 2.9m tok/s: 8270982 +1809/20000 train_loss: 2.6162 train_time: 2.9m tok/s: 8270944 +1810/20000 train_loss: 2.7194 train_time: 2.9m tok/s: 8270861 +1811/20000 train_loss: 2.5676 train_time: 2.9m tok/s: 8270799 +1812/20000 train_loss: 2.6144 train_time: 2.9m tok/s: 8270822 +1813/20000 train_loss: 2.7021 train_time: 2.9m tok/s: 
8270851 +1814/20000 train_loss: 2.7482 train_time: 2.9m tok/s: 8270888 +1815/20000 train_loss: 2.5562 train_time: 2.9m tok/s: 8270895 +1816/20000 train_loss: 2.4914 train_time: 2.9m tok/s: 8270920 +1817/20000 train_loss: 2.8474 train_time: 2.9m tok/s: 8270946 +1818/20000 train_loss: 2.6736 train_time: 2.9m tok/s: 8270953 +1819/20000 train_loss: 2.6948 train_time: 2.9m tok/s: 8271005 +1820/20000 train_loss: 2.5812 train_time: 2.9m tok/s: 8271046 +1821/20000 train_loss: 2.6482 train_time: 2.9m tok/s: 8271084 +1822/20000 train_loss: 2.7037 train_time: 2.9m tok/s: 8271118 +1823/20000 train_loss: 2.4019 train_time: 2.9m tok/s: 8271071 +1824/20000 train_loss: 2.6084 train_time: 2.9m tok/s: 8271036 +1825/20000 train_loss: 2.6297 train_time: 2.9m tok/s: 8271037 +1826/20000 train_loss: 2.4610 train_time: 2.9m tok/s: 8271020 +1827/20000 train_loss: 2.5793 train_time: 2.9m tok/s: 8270989 +1828/20000 train_loss: 2.5251 train_time: 2.9m tok/s: 8270948 +1829/20000 train_loss: 2.4705 train_time: 2.9m tok/s: 8270945 +1830/20000 train_loss: 2.6461 train_time: 2.9m tok/s: 8270977 +1831/20000 train_loss: 2.6526 train_time: 2.9m tok/s: 8271017 +1832/20000 train_loss: 2.6464 train_time: 2.9m tok/s: 8271028 +1833/20000 train_loss: 2.7068 train_time: 2.9m tok/s: 8271042 +1834/20000 train_loss: 2.6725 train_time: 2.9m tok/s: 8271068 +1835/20000 train_loss: 2.5750 train_time: 2.9m tok/s: 8271029 +1836/20000 train_loss: 2.5808 train_time: 2.9m tok/s: 8271032 +1837/20000 train_loss: 2.4915 train_time: 2.9m tok/s: 8271046 +1838/20000 train_loss: 2.6119 train_time: 2.9m tok/s: 8271019 +1839/20000 train_loss: 2.7101 train_time: 2.9m tok/s: 8271022 +1840/20000 train_loss: 2.6514 train_time: 2.9m tok/s: 8271022 +1841/20000 train_loss: 2.6901 train_time: 2.9m tok/s: 8271004 +1842/20000 train_loss: 2.6271 train_time: 2.9m tok/s: 8271015 +1843/20000 train_loss: 2.5608 train_time: 2.9m tok/s: 8271002 +1844/20000 train_loss: 2.6537 train_time: 2.9m tok/s: 8271024 +1845/20000 train_loss: 2.8217 
train_time: 2.9m tok/s: 8271024 +1846/20000 train_loss: 2.5953 train_time: 2.9m tok/s: 8270998 +1847/20000 train_loss: 2.4692 train_time: 2.9m tok/s: 8270967 +1848/20000 train_loss: 2.5343 train_time: 2.9m tok/s: 8270925 +1849/20000 train_loss: 2.5291 train_time: 2.9m tok/s: 8270943 +1850/20000 train_loss: 2.6535 train_time: 2.9m tok/s: 8270990 +1851/20000 train_loss: 2.6578 train_time: 2.9m tok/s: 8270997 +1852/20000 train_loss: 2.6769 train_time: 2.9m tok/s: 8271049 +1853/20000 train_loss: 2.5066 train_time: 2.9m tok/s: 8271054 +1854/20000 train_loss: 2.5567 train_time: 2.9m tok/s: 8271074 +1855/20000 train_loss: 2.6403 train_time: 2.9m tok/s: 8271075 +1856/20000 train_loss: 2.6742 train_time: 2.9m tok/s: 8271066 +1857/20000 train_loss: 2.7871 train_time: 2.9m tok/s: 8271083 +1858/20000 train_loss: 2.7310 train_time: 2.9m tok/s: 8271141 +1859/20000 train_loss: 2.6422 train_time: 2.9m tok/s: 8271156 +1860/20000 train_loss: 2.5803 train_time: 2.9m tok/s: 8271180 +1861/20000 train_loss: 2.5706 train_time: 2.9m tok/s: 8271161 +1862/20000 train_loss: 2.5433 train_time: 3.0m tok/s: 8271183 +1863/20000 train_loss: 2.6249 train_time: 3.0m tok/s: 8271169 +1864/20000 train_loss: 2.5854 train_time: 3.0m tok/s: 8271192 +1865/20000 train_loss: 2.7337 train_time: 3.0m tok/s: 8271226 +1866/20000 train_loss: 2.6328 train_time: 3.0m tok/s: 8271150 +1867/20000 train_loss: 2.5536 train_time: 3.0m tok/s: 8271094 +1868/20000 train_loss: 2.5445 train_time: 3.0m tok/s: 8271105 +1869/20000 train_loss: 2.6863 train_time: 3.0m tok/s: 8271104 +1870/20000 train_loss: 2.6366 train_time: 3.0m tok/s: 8271106 +1871/20000 train_loss: 2.5525 train_time: 3.0m tok/s: 8271165 +1872/20000 train_loss: 2.5836 train_time: 3.0m tok/s: 8271207 +1873/20000 train_loss: 2.7360 train_time: 3.0m tok/s: 8271216 +1874/20000 train_loss: 2.6618 train_time: 3.0m tok/s: 8271196 +1875/20000 train_loss: 2.7456 train_time: 3.0m tok/s: 8271231 +1876/20000 train_loss: 2.7849 train_time: 3.0m tok/s: 8271319 +1877/20000 
train_loss: 2.9601 train_time: 3.0m tok/s: 8271265 +1878/20000 train_loss: 2.6582 train_time: 3.0m tok/s: 8271176 +1879/20000 train_loss: 2.6171 train_time: 3.0m tok/s: 8271164 +1880/20000 train_loss: 2.7521 train_time: 3.0m tok/s: 8271133 +1881/20000 train_loss: 2.5914 train_time: 3.0m tok/s: 8271131 +1882/20000 train_loss: 2.7292 train_time: 3.0m tok/s: 8271188 +1883/20000 train_loss: 2.5944 train_time: 3.0m tok/s: 8271158 +1884/20000 train_loss: 2.5656 train_time: 3.0m tok/s: 8271085 +1885/20000 train_loss: 2.6388 train_time: 3.0m tok/s: 8271065 +1886/20000 train_loss: 2.5673 train_time: 3.0m tok/s: 8271121 +1887/20000 train_loss: 2.6322 train_time: 3.0m tok/s: 8271143 +1888/20000 train_loss: 2.4895 train_time: 3.0m tok/s: 8271150 +1889/20000 train_loss: 2.5449 train_time: 3.0m tok/s: 8271189 +1890/20000 train_loss: 2.6766 train_time: 3.0m tok/s: 8271208 +1891/20000 train_loss: 2.5092 train_time: 3.0m tok/s: 8271187 +1892/20000 train_loss: 2.6813 train_time: 3.0m tok/s: 8271197 +1893/20000 train_loss: 2.6879 train_time: 3.0m tok/s: 8271179 +1894/20000 train_loss: 2.5924 train_time: 3.0m tok/s: 8271194 +1895/20000 train_loss: 2.6473 train_time: 3.0m tok/s: 8271191 +1896/20000 train_loss: 2.6256 train_time: 3.0m tok/s: 8271155 +1897/20000 train_loss: 2.5625 train_time: 3.0m tok/s: 8271208 +1898/20000 train_loss: 2.7060 train_time: 3.0m tok/s: 8271253 +1899/20000 train_loss: 2.6308 train_time: 3.0m tok/s: 8271277 +1900/20000 train_loss: 2.6151 train_time: 3.0m tok/s: 8271291 +1901/20000 train_loss: 2.6924 train_time: 3.0m tok/s: 8271331 +1902/20000 train_loss: 2.6062 train_time: 3.0m tok/s: 8271307 +1903/20000 train_loss: 2.7617 train_time: 3.0m tok/s: 8271290 +1904/20000 train_loss: 3.1517 train_time: 3.0m tok/s: 8271259 +1905/20000 train_loss: 2.4867 train_time: 3.0m tok/s: 8271237 +1906/20000 train_loss: 2.6350 train_time: 3.0m tok/s: 8271251 +1907/20000 train_loss: 2.5157 train_time: 3.0m tok/s: 8271226 +1908/20000 train_loss: 2.5386 train_time: 3.0m tok/s: 
8271270 +1909/20000 train_loss: 2.5836 train_time: 3.0m tok/s: 8271303 +1910/20000 train_loss: 2.5350 train_time: 3.0m tok/s: 8271313 +1911/20000 train_loss: 2.4879 train_time: 3.0m tok/s: 8271321 +1912/20000 train_loss: 2.7074 train_time: 3.0m tok/s: 8271354 +1913/20000 train_loss: 2.7175 train_time: 3.0m tok/s: 8271380 +1914/20000 train_loss: 2.6977 train_time: 3.0m tok/s: 8271388 +1915/20000 train_loss: 2.7100 train_time: 3.0m tok/s: 8271431 +1916/20000 train_loss: 2.5744 train_time: 3.0m tok/s: 8271484 +1917/20000 train_loss: 2.7160 train_time: 3.0m tok/s: 8271493 +1918/20000 train_loss: 2.5808 train_time: 3.0m tok/s: 8271490 +1919/20000 train_loss: 2.5621 train_time: 3.0m tok/s: 8271505 +1920/20000 train_loss: 2.4949 train_time: 3.0m tok/s: 8271477 +1921/20000 train_loss: 2.7052 train_time: 3.0m tok/s: 8271491 +1922/20000 train_loss: 2.6010 train_time: 3.0m tok/s: 8271483 +1923/20000 train_loss: 2.5226 train_time: 3.0m tok/s: 8271501 +1924/20000 train_loss: 2.5925 train_time: 3.0m tok/s: 8271545 +1925/20000 train_loss: 2.5330 train_time: 3.1m tok/s: 8271566 +1926/20000 train_loss: 2.7453 train_time: 3.1m tok/s: 8271517 +1927/20000 train_loss: 2.5751 train_time: 3.1m tok/s: 8271514 +1928/20000 train_loss: 2.6359 train_time: 3.1m tok/s: 8271555 +1929/20000 train_loss: 2.6409 train_time: 3.1m tok/s: 8271564 +1930/20000 train_loss: 2.7077 train_time: 3.1m tok/s: 8271464 +1931/20000 train_loss: 2.6484 train_time: 3.1m tok/s: 8271463 +1932/20000 train_loss: 2.7546 train_time: 3.1m tok/s: 8271468 +1933/20000 train_loss: 2.6577 train_time: 3.1m tok/s: 8271477 +1934/20000 train_loss: 2.6666 train_time: 3.1m tok/s: 8271482 +1935/20000 train_loss: 2.5628 train_time: 3.1m tok/s: 8271534 +1936/20000 train_loss: 2.6811 train_time: 3.1m tok/s: 8271516 +1937/20000 train_loss: 2.6727 train_time: 3.1m tok/s: 8271544 +1938/20000 train_loss: 2.7047 train_time: 3.1m tok/s: 8271583 +1939/20000 train_loss: 2.6230 train_time: 3.1m tok/s: 8271608 +1940/20000 train_loss: 2.8146 
train_time: 3.1m tok/s: 8271638 +1941/20000 train_loss: 2.4680 train_time: 3.1m tok/s: 8271643 +1942/20000 train_loss: 2.4992 train_time: 3.1m tok/s: 8271689 +1943/20000 train_loss: 2.5010 train_time: 3.1m tok/s: 8271735 +1944/20000 train_loss: 2.5262 train_time: 3.1m tok/s: 8271742 +1945/20000 train_loss: 2.5751 train_time: 3.1m tok/s: 8271740 +1946/20000 train_loss: 2.6255 train_time: 3.1m tok/s: 8271757 +1947/20000 train_loss: 2.6906 train_time: 3.1m tok/s: 8271763 +1948/20000 train_loss: 2.6829 train_time: 3.1m tok/s: 8271793 +1949/20000 train_loss: 2.7294 train_time: 3.1m tok/s: 8271793 +1950/20000 train_loss: 2.5896 train_time: 3.1m tok/s: 8271847 +1951/20000 train_loss: 2.8066 train_time: 3.1m tok/s: 8271849 +1952/20000 train_loss: 2.8226 train_time: 3.1m tok/s: 8271870 +1953/20000 train_loss: 2.6381 train_time: 3.1m tok/s: 8271875 +1954/20000 train_loss: 2.5857 train_time: 3.1m tok/s: 8271925 +1955/20000 train_loss: 2.8569 train_time: 3.1m tok/s: 8271943 +1956/20000 train_loss: 2.5820 train_time: 3.1m tok/s: 8271934 +1957/20000 train_loss: 2.6036 train_time: 3.1m tok/s: 8271920 +1958/20000 train_loss: 2.5641 train_time: 3.1m tok/s: 8271966 +1959/20000 train_loss: 2.5489 train_time: 3.1m tok/s: 8271995 +1960/20000 train_loss: 2.5020 train_time: 3.1m tok/s: 8271992 +1961/20000 train_loss: 2.5147 train_time: 3.1m tok/s: 8271981 +1962/20000 train_loss: 2.6028 train_time: 3.1m tok/s: 8272009 +1963/20000 train_loss: 2.5659 train_time: 3.1m tok/s: 8272055 +1964/20000 train_loss: 2.5896 train_time: 3.1m tok/s: 8272058 +1965/20000 train_loss: 2.5896 train_time: 3.1m tok/s: 8272080 +1966/20000 train_loss: 2.7535 train_time: 3.1m tok/s: 8272026 +1967/20000 train_loss: 2.5613 train_time: 3.1m tok/s: 8272017 +1968/20000 train_loss: 2.7064 train_time: 3.1m tok/s: 8272012 +1969/20000 train_loss: 2.7652 train_time: 3.1m tok/s: 8272080 +1970/20000 train_loss: 2.5939 train_time: 3.1m tok/s: 8272104 +1971/20000 train_loss: 2.6185 train_time: 3.1m tok/s: 8272099 +1972/20000 
train_loss: 2.6820 train_time: 3.1m tok/s: 8272129 +1973/20000 train_loss: 2.6055 train_time: 3.1m tok/s: 8272115 +1974/20000 train_loss: 2.7700 train_time: 3.1m tok/s: 8272089 +1975/20000 train_loss: 2.5266 train_time: 3.1m tok/s: 8272099 +1976/20000 train_loss: 2.7206 train_time: 3.1m tok/s: 8272059 +1977/20000 train_loss: 2.5275 train_time: 3.1m tok/s: 8272070 +1978/20000 train_loss: 2.6936 train_time: 3.1m tok/s: 8272068 +1979/20000 train_loss: 2.5299 train_time: 3.1m tok/s: 8272111 +1980/20000 train_loss: 2.5773 train_time: 3.1m tok/s: 8272114 +1981/20000 train_loss: 2.4800 train_time: 3.1m tok/s: 8272149 +1982/20000 train_loss: 2.6444 train_time: 3.1m tok/s: 8272159 +1983/20000 train_loss: 2.3879 train_time: 3.1m tok/s: 8272156 +1984/20000 train_loss: 2.6912 train_time: 3.1m tok/s: 8272074 +1985/20000 train_loss: 2.6292 train_time: 3.1m tok/s: 8272092 +1986/20000 train_loss: 2.6786 train_time: 3.1m tok/s: 8272104 +1987/20000 train_loss: 2.6917 train_time: 3.1m tok/s: 8272151 +1988/20000 train_loss: 2.6549 train_time: 3.1m tok/s: 8272169 +1989/20000 train_loss: 2.4874 train_time: 3.2m tok/s: 8272189 +1990/20000 train_loss: 2.6511 train_time: 3.2m tok/s: 8272188 +1991/20000 train_loss: 2.5728 train_time: 3.2m tok/s: 8272254 +1992/20000 train_loss: 2.7519 train_time: 3.2m tok/s: 8272282 +1993/20000 train_loss: 2.5732 train_time: 3.2m tok/s: 8272275 +1994/20000 train_loss: 2.6189 train_time: 3.2m tok/s: 8272291 +1995/20000 train_loss: 2.5238 train_time: 3.2m tok/s: 8272321 +1996/20000 train_loss: 2.6145 train_time: 3.2m tok/s: 8272307 +1997/20000 train_loss: 2.5948 train_time: 3.2m tok/s: 8272358 +1998/20000 train_loss: 2.6115 train_time: 3.2m tok/s: 8272375 +1999/20000 train_loss: 2.6770 train_time: 3.2m tok/s: 8272397 +2000/20000 train_loss: 2.4911 train_time: 3.2m tok/s: 8272464 +2001/20000 train_loss: 2.5815 train_time: 3.2m tok/s: 8272485 +2002/20000 train_loss: 2.4517 train_time: 3.2m tok/s: 8272512 +2003/20000 train_loss: 2.6512 train_time: 3.2m tok/s: 
8272529 +2004/20000 train_loss: 2.6342 train_time: 3.2m tok/s: 8272566 +2005/20000 train_loss: 2.6178 train_time: 3.2m tok/s: 8272568 +2006/20000 train_loss: 2.5822 train_time: 3.2m tok/s: 8272583 +2007/20000 train_loss: 2.6268 train_time: 3.2m tok/s: 8272565 +2008/20000 train_loss: 2.5448 train_time: 3.2m tok/s: 8272540 +2009/20000 train_loss: 2.6532 train_time: 3.2m tok/s: 8272604 +2010/20000 train_loss: 2.7002 train_time: 3.2m tok/s: 8272566 +2011/20000 train_loss: 2.5750 train_time: 3.2m tok/s: 8272575 +2012/20000 train_loss: 2.5818 train_time: 3.2m tok/s: 8272582 +2013/20000 train_loss: 2.4832 train_time: 3.2m tok/s: 8272614 +2014/20000 train_loss: 2.4825 train_time: 3.2m tok/s: 8272623 +2015/20000 train_loss: 2.6955 train_time: 3.2m tok/s: 8272630 +2016/20000 train_loss: 2.4825 train_time: 3.2m tok/s: 8272609 +2017/20000 train_loss: 2.6341 train_time: 3.2m tok/s: 8272635 +2018/20000 train_loss: 2.6298 train_time: 3.2m tok/s: 8272625 +2019/20000 train_loss: 2.7188 train_time: 3.2m tok/s: 8272690 +2020/20000 train_loss: 2.7104 train_time: 3.2m tok/s: 8272724 +2021/20000 train_loss: 2.5646 train_time: 3.2m tok/s: 8272707 +2022/20000 train_loss: 2.4959 train_time: 3.2m tok/s: 8272653 +2023/20000 train_loss: 2.6934 train_time: 3.2m tok/s: 8272658 +2024/20000 train_loss: 2.6329 train_time: 3.2m tok/s: 8272683 +2025/20000 train_loss: 2.4827 train_time: 3.2m tok/s: 8272702 +2026/20000 train_loss: 2.7052 train_time: 3.2m tok/s: 8272627 +2027/20000 train_loss: 2.5953 train_time: 3.2m tok/s: 8272614 +2028/20000 train_loss: 2.7180 train_time: 3.2m tok/s: 8272582 +2029/20000 train_loss: 2.4999 train_time: 3.2m tok/s: 8272600 +2030/20000 train_loss: 2.5149 train_time: 3.2m tok/s: 8272608 +2031/20000 train_loss: 2.5094 train_time: 3.2m tok/s: 8272655 +2032/20000 train_loss: 2.5641 train_time: 3.2m tok/s: 8272628 +2033/20000 train_loss: 2.8379 train_time: 3.2m tok/s: 8272599 +2034/20000 train_loss: 2.6831 train_time: 3.2m tok/s: 8272591 +2035/20000 train_loss: 2.6544 
train_time: 3.2m tok/s: 8272639 +2036/20000 train_loss: 2.6187 train_time: 3.2m tok/s: 8272664 +2037/20000 train_loss: 2.8316 train_time: 3.2m tok/s: 8272612 +2038/20000 train_loss: 2.5888 train_time: 3.2m tok/s: 8272557 +2039/20000 train_loss: 2.6332 train_time: 3.2m tok/s: 8272567 +2040/20000 train_loss: 2.5789 train_time: 3.2m tok/s: 8272614 +2041/20000 train_loss: 2.6434 train_time: 3.2m tok/s: 8272642 +2042/20000 train_loss: 2.5407 train_time: 3.2m tok/s: 8272617 +2043/20000 train_loss: 2.4928 train_time: 3.2m tok/s: 8272548 +2044/20000 train_loss: 2.6486 train_time: 3.2m tok/s: 8272569 +2045/20000 train_loss: 2.4388 train_time: 3.2m tok/s: 8272594 +2046/20000 train_loss: 2.4636 train_time: 3.2m tok/s: 8272590 +2047/20000 train_loss: 2.7451 train_time: 3.2m tok/s: 8272580 +2048/20000 train_loss: 2.5728 train_time: 3.2m tok/s: 8272636 +2049/20000 train_loss: 2.7147 train_time: 3.2m tok/s: 8272663 +2050/20000 train_loss: 2.6705 train_time: 3.2m tok/s: 8272678 +2051/20000 train_loss: 2.6481 train_time: 3.2m tok/s: 8272679 +2052/20000 train_loss: 2.5372 train_time: 3.3m tok/s: 8272736 +2053/20000 train_loss: 2.6395 train_time: 3.3m tok/s: 8272765 +2054/20000 train_loss: 2.6753 train_time: 3.3m tok/s: 8272775 +2055/20000 train_loss: 2.5803 train_time: 3.3m tok/s: 8272788 +2056/20000 train_loss: 2.6194 train_time: 3.3m tok/s: 8272793 +2057/20000 train_loss: 2.6599 train_time: 3.3m tok/s: 8272785 +2058/20000 train_loss: 2.5610 train_time: 3.3m tok/s: 8272772 +2059/20000 train_loss: 2.4965 train_time: 3.3m tok/s: 8272801 +2060/20000 train_loss: 2.5892 train_time: 3.3m tok/s: 8272810 +2061/20000 train_loss: 2.5841 train_time: 3.3m tok/s: 8272842 +2062/20000 train_loss: 2.6159 train_time: 3.3m tok/s: 8272814 +2063/20000 train_loss: 2.5509 train_time: 3.3m tok/s: 8272794 +2064/20000 train_loss: 2.8009 train_time: 3.3m tok/s: 8272796 +2065/20000 train_loss: 2.5373 train_time: 3.3m tok/s: 8272771 +2066/20000 train_loss: 2.6139 train_time: 3.3m tok/s: 8272728 +2067/20000 
train_loss: 2.6652 train_time: 3.3m tok/s: 8272742 +2068/20000 train_loss: 2.6036 train_time: 3.3m tok/s: 8272800 +2069/20000 train_loss: 2.4578 train_time: 3.3m tok/s: 8272798 +2070/20000 train_loss: 2.6105 train_time: 3.3m tok/s: 8272753 +2071/20000 train_loss: 2.5505 train_time: 3.3m tok/s: 8272773 +2072/20000 train_loss: 2.6002 train_time: 3.3m tok/s: 8272770 +2073/20000 train_loss: 2.5333 train_time: 3.3m tok/s: 8272766 +2074/20000 train_loss: 2.6952 train_time: 3.3m tok/s: 8272751 +2075/20000 train_loss: 2.5726 train_time: 3.3m tok/s: 8272750 +2076/20000 train_loss: 2.6715 train_time: 3.3m tok/s: 8272756 +2077/20000 train_loss: 3.5629 train_time: 3.3m tok/s: 8272716 +2078/20000 train_loss: 2.7108 train_time: 3.3m tok/s: 8272647 +2079/20000 train_loss: 2.6583 train_time: 3.3m tok/s: 8272641 +2080/20000 train_loss: 2.5992 train_time: 3.3m tok/s: 8272634 +2081/20000 train_loss: 2.6081 train_time: 3.3m tok/s: 8272619 +2082/20000 train_loss: 2.5985 train_time: 3.3m tok/s: 8272646 +2083/20000 train_loss: 2.5431 train_time: 3.3m tok/s: 8272607 +2084/20000 train_loss: 2.5826 train_time: 3.3m tok/s: 8272620 +2085/20000 train_loss: 2.5817 train_time: 3.3m tok/s: 8272638 +2086/20000 train_loss: 2.6096 train_time: 3.3m tok/s: 8272663 +2087/20000 train_loss: 2.5503 train_time: 3.3m tok/s: 8272664 +2088/20000 train_loss: 2.4597 train_time: 3.3m tok/s: 8272698 +2089/20000 train_loss: 2.6314 train_time: 3.3m tok/s: 8272711 +2090/20000 train_loss: 2.7403 train_time: 3.3m tok/s: 8272763 +2091/20000 train_loss: 2.6251 train_time: 3.3m tok/s: 8272757 +2092/20000 train_loss: 2.6555 train_time: 3.3m tok/s: 8272798 +2093/20000 train_loss: 2.6435 train_time: 3.3m tok/s: 8272840 +2094/20000 train_loss: 2.6041 train_time: 3.3m tok/s: 8272858 +2095/20000 train_loss: 2.5886 train_time: 3.3m tok/s: 8272891 +2096/20000 train_loss: 2.6932 train_time: 3.3m tok/s: 8272941 +2097/20000 train_loss: 2.5714 train_time: 3.3m tok/s: 8272958 +2098/20000 train_loss: 2.4871 train_time: 3.3m tok/s: 
8272968 +2099/20000 train_loss: 2.4876 train_time: 3.3m tok/s: 8272991 +2100/20000 train_loss: 2.5833 train_time: 3.3m tok/s: 8273029 +2101/20000 train_loss: 2.6551 train_time: 3.3m tok/s: 8272982 +2102/20000 train_loss: 2.5689 train_time: 3.3m tok/s: 8272943 +2103/20000 train_loss: 2.5914 train_time: 3.3m tok/s: 8272942 +2104/20000 train_loss: 2.7089 train_time: 3.3m tok/s: 8272984 +2105/20000 train_loss: 2.7198 train_time: 3.3m tok/s: 8273043 +2106/20000 train_loss: 2.7067 train_time: 3.3m tok/s: 8273059 +2107/20000 train_loss: 2.6068 train_time: 3.3m tok/s: 8273039 +2108/20000 train_loss: 2.5428 train_time: 3.3m tok/s: 8273038 +2109/20000 train_loss: 2.7170 train_time: 3.3m tok/s: 8273026 +2110/20000 train_loss: 2.5784 train_time: 3.3m tok/s: 8273047 +2111/20000 train_loss: 2.5549 train_time: 3.3m tok/s: 8273031 +2112/20000 train_loss: 2.5514 train_time: 3.3m tok/s: 8273038 +2113/20000 train_loss: 2.5208 train_time: 3.3m tok/s: 8273043 +2114/20000 train_loss: 2.7872 train_time: 3.3m tok/s: 8273065 +2115/20000 train_loss: 2.4914 train_time: 3.4m tok/s: 8273090 +2116/20000 train_loss: 2.6523 train_time: 3.4m tok/s: 8273143 +2117/20000 train_loss: 2.6930 train_time: 3.4m tok/s: 8273134 +2118/20000 train_loss: 2.6965 train_time: 3.4m tok/s: 8273166 +2119/20000 train_loss: 2.8019 train_time: 3.4m tok/s: 8273177 +2120/20000 train_loss: 2.6544 train_time: 3.4m tok/s: 8273199 +2121/20000 train_loss: 2.6327 train_time: 3.4m tok/s: 8273213 +2122/20000 train_loss: 2.5464 train_time: 3.4m tok/s: 8273221 +2123/20000 train_loss: 2.4000 train_time: 3.4m tok/s: 8273243 +2124/20000 train_loss: 2.6359 train_time: 3.4m tok/s: 8273284 +2125/20000 train_loss: 2.6170 train_time: 3.4m tok/s: 8273324 +2126/20000 train_loss: 2.6351 train_time: 3.4m tok/s: 8273362 +2127/20000 train_loss: 2.5556 train_time: 3.4m tok/s: 8273188 +2128/20000 train_loss: 2.5380 train_time: 3.4m tok/s: 8273330 +2129/20000 train_loss: 2.3717 train_time: 3.4m tok/s: 8273339 +2130/20000 train_loss: 2.6778 
train_time: 3.4m tok/s: 8273349 +2131/20000 train_loss: 2.5853 train_time: 3.4m tok/s: 8273372 +2132/20000 train_loss: 2.9141 train_time: 3.4m tok/s: 8273356 +2133/20000 train_loss: 2.6940 train_time: 3.4m tok/s: 8273372 +2134/20000 train_loss: 2.6332 train_time: 3.4m tok/s: 8273383 +2135/20000 train_loss: 2.5896 train_time: 3.4m tok/s: 8273406 +2136/20000 train_loss: 2.6004 train_time: 3.4m tok/s: 8273412 +2137/20000 train_loss: 2.5365 train_time: 3.4m tok/s: 8273492 +2138/20000 train_loss: 2.5913 train_time: 3.4m tok/s: 8273529 +2139/20000 train_loss: 2.6815 train_time: 3.4m tok/s: 8273531 +2140/20000 train_loss: 2.5493 train_time: 3.4m tok/s: 8273500 +2141/20000 train_loss: 2.5275 train_time: 3.4m tok/s: 8273491 +2142/20000 train_loss: 2.6009 train_time: 3.4m tok/s: 8273523 +2143/20000 train_loss: 2.4849 train_time: 3.4m tok/s: 8273494 +2144/20000 train_loss: 2.5863 train_time: 3.4m tok/s: 8273460 +2145/20000 train_loss: 2.6706 train_time: 3.4m tok/s: 8273508 +2146/20000 train_loss: 2.5914 train_time: 3.4m tok/s: 8273537 +2147/20000 train_loss: 2.5553 train_time: 3.4m tok/s: 8273541 +2148/20000 train_loss: 2.7040 train_time: 3.4m tok/s: 8273526 +2149/20000 train_loss: 2.5390 train_time: 3.4m tok/s: 8273535 +2150/20000 train_loss: 2.7382 train_time: 3.4m tok/s: 8273588 +2151/20000 train_loss: 2.6845 train_time: 3.4m tok/s: 8273530 +2152/20000 train_loss: 2.6027 train_time: 3.4m tok/s: 8273504 +2153/20000 train_loss: 2.4668 train_time: 3.4m tok/s: 8273486 +2154/20000 train_loss: 2.6288 train_time: 3.4m tok/s: 8273488 +2155/20000 train_loss: 2.5239 train_time: 3.4m tok/s: 8273484 +2156/20000 train_loss: 2.6109 train_time: 3.4m tok/s: 8273526 +2157/20000 train_loss: 2.5964 train_time: 3.4m tok/s: 8273568 +2158/20000 train_loss: 2.5918 train_time: 3.4m tok/s: 8273584 +2159/20000 train_loss: 2.5398 train_time: 3.4m tok/s: 8273593 +2160/20000 train_loss: 2.4297 train_time: 3.4m tok/s: 8273578 +2161/20000 train_loss: 2.6176 train_time: 3.4m tok/s: 8273586 +2162/20000 
train_loss: 2.6401 train_time: 3.4m tok/s: 8273631 +2163/20000 train_loss: 2.5570 train_time: 3.4m tok/s: 8273593 +2164/20000 train_loss: 2.6020 train_time: 3.4m tok/s: 8273612 +2165/20000 train_loss: 2.6441 train_time: 3.4m tok/s: 8273619 +2166/20000 train_loss: 2.5238 train_time: 3.4m tok/s: 8273620 +2167/20000 train_loss: 2.5282 train_time: 3.4m tok/s: 8273574 +2168/20000 train_loss: 2.5847 train_time: 3.4m tok/s: 8273539 +2169/20000 train_loss: 2.6297 train_time: 3.4m tok/s: 8273533 +2170/20000 train_loss: 2.4945 train_time: 3.4m tok/s: 8273527 +2171/20000 train_loss: 2.6030 train_time: 3.4m tok/s: 8273550 +2172/20000 train_loss: 2.5089 train_time: 3.4m tok/s: 8273550 +2173/20000 train_loss: 2.7158 train_time: 3.4m tok/s: 8273579 +2174/20000 train_loss: 2.5515 train_time: 3.4m tok/s: 8273539 +2175/20000 train_loss: 2.4526 train_time: 3.4m tok/s: 8273481 +2176/20000 train_loss: 2.6709 train_time: 3.4m tok/s: 8273472 +2177/20000 train_loss: 2.6142 train_time: 3.4m tok/s: 8273485 +2178/20000 train_loss: 2.5317 train_time: 3.5m tok/s: 8273497 +2179/20000 train_loss: 2.7064 train_time: 3.5m tok/s: 8273491 +2180/20000 train_loss: 2.5593 train_time: 3.5m tok/s: 8273491 +2181/20000 train_loss: 2.5577 train_time: 3.5m tok/s: 8273497 +2182/20000 train_loss: 2.4381 train_time: 3.5m tok/s: 8273518 +2183/20000 train_loss: 2.5360 train_time: 3.5m tok/s: 8273527 +2184/20000 train_loss: 2.5393 train_time: 3.5m tok/s: 8273491 +2185/20000 train_loss: 2.4402 train_time: 3.5m tok/s: 8273522 +2186/20000 train_loss: 2.7450 train_time: 3.5m tok/s: 8273531 +2187/20000 train_loss: 2.5540 train_time: 3.5m tok/s: 8273552 +2188/20000 train_loss: 2.5483 train_time: 3.5m tok/s: 8273543 +2189/20000 train_loss: 2.6370 train_time: 3.5m tok/s: 8273551 +2190/20000 train_loss: 2.6794 train_time: 3.5m tok/s: 8273545 +2191/20000 train_loss: 2.6073 train_time: 3.5m tok/s: 8273584 +2192/20000 train_loss: 2.6466 train_time: 3.5m tok/s: 8273601 +2193/20000 train_loss: 2.5781 train_time: 3.5m tok/s: 
8273644 +2194/20000 train_loss: 2.5887 train_time: 3.5m tok/s: 8273613 +2195/20000 train_loss: 2.6105 train_time: 3.5m tok/s: 8273610 +2196/20000 train_loss: 2.6043 train_time: 3.5m tok/s: 8273597 +2197/20000 train_loss: 2.5840 train_time: 3.5m tok/s: 8273591 +2198/20000 train_loss: 2.5315 train_time: 3.5m tok/s: 8273616 +2199/20000 train_loss: 2.5420 train_time: 3.5m tok/s: 8273630 +2200/20000 train_loss: 2.6167 train_time: 3.5m tok/s: 8273617 +2201/20000 train_loss: 2.6670 train_time: 3.5m tok/s: 8273640 +2202/20000 train_loss: 2.5853 train_time: 3.5m tok/s: 8273608 +2203/20000 train_loss: 2.5657 train_time: 3.5m tok/s: 8273577 +2204/20000 train_loss: 2.4266 train_time: 3.5m tok/s: 8273549 +2205/20000 train_loss: 2.6459 train_time: 3.5m tok/s: 8273509 +2206/20000 train_loss: 2.5740 train_time: 3.5m tok/s: 8273499 +2207/20000 train_loss: 2.5256 train_time: 3.5m tok/s: 8273528 +layer_loop:enabled step:2207 frac:0.350 encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10] +2208/20000 train_loss: 2.9876 train_time: 3.5m tok/s: 8271529 +2209/20000 train_loss: 2.9498 train_time: 3.5m tok/s: 8269813 +2210/20000 train_loss: 2.8414 train_time: 3.5m tok/s: 8268040 +2211/20000 train_loss: 2.7446 train_time: 3.5m tok/s: 8266264 +2212/20000 train_loss: 2.5297 train_time: 3.5m tok/s: 8264541 +2213/20000 train_loss: 2.6333 train_time: 3.5m tok/s: 8262817 +2214/20000 train_loss: 2.5405 train_time: 3.5m tok/s: 8261054 +2215/20000 train_loss: 2.6327 train_time: 3.5m tok/s: 8259333 +2216/20000 train_loss: 2.5596 train_time: 3.5m tok/s: 8257644 +2217/20000 train_loss: 2.7179 train_time: 3.5m tok/s: 8255776 +2218/20000 train_loss: 2.7037 train_time: 3.5m tok/s: 8253993 +2219/20000 train_loss: 2.5258 train_time: 3.5m tok/s: 8252237 +2220/20000 train_loss: 2.5638 train_time: 3.5m tok/s: 8250450 +2221/20000 train_loss: 2.6885 train_time: 3.5m tok/s: 8248697 +2222/20000 train_loss: 2.5141 train_time: 3.5m tok/s: 8246912 +2223/20000 train_loss: 2.5964 train_time: 3.5m 
tok/s: 8245141 +2224/20000 train_loss: 2.4088 train_time: 3.5m tok/s: 8243293 +2225/20000 train_loss: 2.5705 train_time: 3.5m tok/s: 8241515 +2226/20000 train_loss: 2.5503 train_time: 3.5m tok/s: 8239790 +2227/20000 train_loss: 2.5256 train_time: 3.5m tok/s: 8238029 +2228/20000 train_loss: 2.5692 train_time: 3.5m tok/s: 8236306 +2229/20000 train_loss: 2.5287 train_time: 3.5m tok/s: 8234601 +2230/20000 train_loss: 2.5459 train_time: 3.6m tok/s: 8232933 +2231/20000 train_loss: 2.3510 train_time: 3.6m tok/s: 8231164 +2232/20000 train_loss: 2.5418 train_time: 3.6m tok/s: 8229336 +2233/20000 train_loss: 2.6770 train_time: 3.6m tok/s: 8227692 +2234/20000 train_loss: 2.7023 train_time: 3.6m tok/s: 8226020 +2235/20000 train_loss: 2.6532 train_time: 3.6m tok/s: 8224324 +2236/20000 train_loss: 2.6117 train_time: 3.6m tok/s: 8222630 +2237/20000 train_loss: 2.6441 train_time: 3.6m tok/s: 8220932 +2238/20000 train_loss: 2.6580 train_time: 3.6m tok/s: 8219252 +2239/20000 train_loss: 2.7595 train_time: 3.6m tok/s: 8217531 +2240/20000 train_loss: 2.7654 train_time: 3.6m tok/s: 8215853 +2241/20000 train_loss: 2.3962 train_time: 3.6m tok/s: 8214104 +2242/20000 train_loss: 2.5858 train_time: 3.6m tok/s: 8212443 +2243/20000 train_loss: 2.5704 train_time: 3.6m tok/s: 8210733 +2244/20000 train_loss: 2.5935 train_time: 3.6m tok/s: 8209075 +2245/20000 train_loss: 2.6336 train_time: 3.6m tok/s: 8207401 +2246/20000 train_loss: 2.5438 train_time: 3.6m tok/s: 8205706 +2247/20000 train_loss: 2.6849 train_time: 3.6m tok/s: 8204030 +2248/20000 train_loss: 2.6728 train_time: 3.6m tok/s: 8202339 +2249/20000 train_loss: 2.5469 train_time: 3.6m tok/s: 8200712 +2250/20000 train_loss: 2.5419 train_time: 3.6m tok/s: 8198973 +2251/20000 train_loss: 2.5844 train_time: 3.6m tok/s: 8197303 +2252/20000 train_loss: 2.7341 train_time: 3.6m tok/s: 8195637 +2253/20000 train_loss: 2.5742 train_time: 3.6m tok/s: 8194036 +2254/20000 train_loss: 2.5670 train_time: 3.6m tok/s: 8192275 +2255/20000 train_loss: 2.4334 
train_time: 3.6m tok/s: 8190597 +2256/20000 train_loss: 2.4734 train_time: 3.6m tok/s: 8188956 +2257/20000 train_loss: 2.5158 train_time: 3.6m tok/s: 8187191 +2258/20000 train_loss: 2.5423 train_time: 3.6m tok/s: 8185487 +2259/20000 train_loss: 2.5633 train_time: 3.6m tok/s: 8183862 +2260/20000 train_loss: 2.5138 train_time: 3.6m tok/s: 8182114 +2261/20000 train_loss: 2.6092 train_time: 3.6m tok/s: 8180478 +2262/20000 train_loss: 2.7067 train_time: 3.6m tok/s: 8178811 +2263/20000 train_loss: 2.7005 train_time: 3.6m tok/s: 8177176 +2264/20000 train_loss: 2.4670 train_time: 3.6m tok/s: 8175462 +2265/20000 train_loss: 2.5608 train_time: 3.6m tok/s: 8173867 +2266/20000 train_loss: 2.5852 train_time: 3.6m tok/s: 8172200 +2267/20000 train_loss: 2.6027 train_time: 3.6m tok/s: 8170577 +2268/20000 train_loss: 2.6930 train_time: 3.6m tok/s: 8168978 +2269/20000 train_loss: 2.7688 train_time: 3.6m tok/s: 8167289 +2270/20000 train_loss: 2.5311 train_time: 3.6m tok/s: 8165633 +2271/20000 train_loss: 2.5833 train_time: 3.6m tok/s: 8163991 +2272/20000 train_loss: 2.5164 train_time: 3.6m tok/s: 8162390 +2273/20000 train_loss: 2.5755 train_time: 3.7m tok/s: 8160843 +2274/20000 train_loss: 3.2796 train_time: 3.7m tok/s: 8159095 +2275/20000 train_loss: 2.4186 train_time: 3.7m tok/s: 8157411 +2276/20000 train_loss: 2.5230 train_time: 3.7m tok/s: 8155739 +2277/20000 train_loss: 2.7828 train_time: 3.7m tok/s: 8154072 +2278/20000 train_loss: 2.7144 train_time: 3.7m tok/s: 8152484 +2279/20000 train_loss: 2.6444 train_time: 3.7m tok/s: 8150763 +2280/20000 train_loss: 2.7358 train_time: 3.7m tok/s: 8149080 +2281/20000 train_loss: 2.5476 train_time: 3.7m tok/s: 8147424 +2282/20000 train_loss: 2.7366 train_time: 3.7m tok/s: 8145819 +2283/20000 train_loss: 2.9772 train_time: 3.7m tok/s: 8144159 +2284/20000 train_loss: 2.4846 train_time: 3.7m tok/s: 8142529 +2285/20000 train_loss: 2.5308 train_time: 3.7m tok/s: 8140915 +2286/20000 train_loss: 2.4559 train_time: 3.7m tok/s: 8139255 +2287/20000 
train_loss: 2.4594 train_time: 3.7m tok/s: 8137606 +2288/20000 train_loss: 2.6546 train_time: 3.7m tok/s: 8135969 +2289/20000 train_loss: 2.6641 train_time: 3.7m tok/s: 8134390 +2290/20000 train_loss: 2.6418 train_time: 3.7m tok/s: 8132796 +2291/20000 train_loss: 2.5431 train_time: 3.7m tok/s: 8131236 +2292/20000 train_loss: 2.4783 train_time: 3.7m tok/s: 8129624 +2293/20000 train_loss: 2.5578 train_time: 3.7m tok/s: 8128080 +2294/20000 train_loss: 2.4621 train_time: 3.7m tok/s: 8126555 +2295/20000 train_loss: 2.6630 train_time: 3.7m tok/s: 8124959 +2296/20000 train_loss: 2.5161 train_time: 3.7m tok/s: 8123382 +2297/20000 train_loss: 2.6307 train_time: 3.7m tok/s: 8121716 +2298/20000 train_loss: 2.4975 train_time: 3.7m tok/s: 8120140 +2299/20000 train_loss: 2.5947 train_time: 3.7m tok/s: 8118612 +2300/20000 train_loss: 2.6286 train_time: 3.7m tok/s: 8117005 +2301/20000 train_loss: 2.4008 train_time: 3.7m tok/s: 8115449 +2302/20000 train_loss: 2.5752 train_time: 3.7m tok/s: 8113876 +2303/20000 train_loss: 2.5003 train_time: 3.7m tok/s: 8112302 +2304/20000 train_loss: 2.5926 train_time: 3.7m tok/s: 8110703 +2305/20000 train_loss: 2.4483 train_time: 3.7m tok/s: 8109026 +2306/20000 train_loss: 2.6698 train_time: 3.7m tok/s: 8107510 +2307/20000 train_loss: 2.6480 train_time: 3.7m tok/s: 8105994 +2308/20000 train_loss: 2.5912 train_time: 3.7m tok/s: 8104500 +2309/20000 train_loss: 2.5427 train_time: 3.7m tok/s: 8102951 +2310/20000 train_loss: 2.6249 train_time: 3.7m tok/s: 8101426 +2311/20000 train_loss: 2.6231 train_time: 3.7m tok/s: 8099880 +2312/20000 train_loss: 2.6378 train_time: 3.7m tok/s: 8098298 +2313/20000 train_loss: 2.6206 train_time: 3.7m tok/s: 8096762 +2314/20000 train_loss: 2.4444 train_time: 3.7m tok/s: 8095258 +2315/20000 train_loss: 2.4101 train_time: 3.7m tok/s: 8093695 +2316/20000 train_loss: 2.3704 train_time: 3.8m tok/s: 8092147 +2317/20000 train_loss: 2.7539 train_time: 3.8m tok/s: 8090467 +2318/20000 train_loss: 2.6430 train_time: 3.8m tok/s: 
8088955 +2319/20000 train_loss: 2.4524 train_time: 3.8m tok/s: 8087411 +2320/20000 train_loss: 2.6186 train_time: 3.8m tok/s: 8085868 +2321/20000 train_loss: 2.5984 train_time: 3.8m tok/s: 8084300 +2322/20000 train_loss: 2.4635 train_time: 3.8m tok/s: 8082780 +2323/20000 train_loss: 2.6334 train_time: 3.8m tok/s: 8081311 +2324/20000 train_loss: 2.5738 train_time: 3.8m tok/s: 8079753 +2325/20000 train_loss: 2.5943 train_time: 3.8m tok/s: 8078178 +2326/20000 train_loss: 2.6036 train_time: 3.8m tok/s: 8076682 +2327/20000 train_loss: 2.5809 train_time: 3.8m tok/s: 8075159 +2328/20000 train_loss: 2.5342 train_time: 3.8m tok/s: 8073695 +2329/20000 train_loss: 2.4197 train_time: 3.8m tok/s: 8072170 +2330/20000 train_loss: 2.6790 train_time: 3.8m tok/s: 8070659 +2331/20000 train_loss: 2.5834 train_time: 3.8m tok/s: 8069082 +2332/20000 train_loss: 2.3994 train_time: 3.8m tok/s: 8067552 +2333/20000 train_loss: 2.6284 train_time: 3.8m tok/s: 8066051 +2334/20000 train_loss: 2.2811 train_time: 3.8m tok/s: 8064457 +2335/20000 train_loss: 2.5493 train_time: 3.8m tok/s: 8062931 +2336/20000 train_loss: 2.6277 train_time: 3.8m tok/s: 8061444 +2337/20000 train_loss: 2.6639 train_time: 3.8m tok/s: 8059939 +2338/20000 train_loss: 2.5374 train_time: 3.8m tok/s: 8058453 +2339/20000 train_loss: 2.6210 train_time: 3.8m tok/s: 8056963 +2340/20000 train_loss: 2.5724 train_time: 3.8m tok/s: 8055513 +2341/20000 train_loss: 2.5549 train_time: 3.8m tok/s: 8054048 +2342/20000 train_loss: 2.5062 train_time: 3.8m tok/s: 8052549 +2343/20000 train_loss: 2.4378 train_time: 3.8m tok/s: 8051112 +2344/20000 train_loss: 2.7104 train_time: 3.8m tok/s: 8049585 +2345/20000 train_loss: 3.0394 train_time: 3.8m tok/s: 8048111 +2346/20000 train_loss: 2.5307 train_time: 3.8m tok/s: 8046600 +2347/20000 train_loss: 2.5292 train_time: 3.8m tok/s: 8045154 +2348/20000 train_loss: 2.7206 train_time: 3.8m tok/s: 8043673 +2349/20000 train_loss: 2.5936 train_time: 3.8m tok/s: 8042228 +2350/20000 train_loss: 2.5871 
train_time: 3.8m tok/s: 8040786 +2351/20000 train_loss: 2.5681 train_time: 3.8m tok/s: 8039324 +2352/20000 train_loss: 2.6001 train_time: 3.8m tok/s: 8037850 +2353/20000 train_loss: 2.4984 train_time: 3.8m tok/s: 8036351 +2354/20000 train_loss: 2.5514 train_time: 3.8m tok/s: 8034897 +2355/20000 train_loss: 2.5127 train_time: 3.8m tok/s: 8033476 +2356/20000 train_loss: 2.5705 train_time: 3.8m tok/s: 8032026 +2357/20000 train_loss: 2.5195 train_time: 3.8m tok/s: 8030567 +2358/20000 train_loss: 2.5278 train_time: 3.8m tok/s: 8029149 +2359/20000 train_loss: 2.5082 train_time: 3.9m tok/s: 8027695 +2360/20000 train_loss: 2.5808 train_time: 3.9m tok/s: 8026227 +2361/20000 train_loss: 2.5955 train_time: 3.9m tok/s: 8024777 +2362/20000 train_loss: 2.5270 train_time: 3.9m tok/s: 8023278 +2363/20000 train_loss: 2.5443 train_time: 3.9m tok/s: 8021866 +2364/20000 train_loss: 2.6575 train_time: 3.9m tok/s: 8020410 +2365/20000 train_loss: 2.5583 train_time: 3.9m tok/s: 8018951 +2366/20000 train_loss: 2.6248 train_time: 3.9m tok/s: 8017529 +2367/20000 train_loss: 2.5359 train_time: 3.9m tok/s: 8016071 +2368/20000 train_loss: 2.6901 train_time: 3.9m tok/s: 8014657 +2369/20000 train_loss: 2.5198 train_time: 3.9m tok/s: 8013218 +2370/20000 train_loss: 2.6087 train_time: 3.9m tok/s: 8011827 +2371/20000 train_loss: 2.5922 train_time: 3.9m tok/s: 8010393 +2372/20000 train_loss: 2.6329 train_time: 3.9m tok/s: 8008978 +2373/20000 train_loss: 2.4926 train_time: 3.9m tok/s: 8007542 +2374/20000 train_loss: 2.5590 train_time: 3.9m tok/s: 8006102 +2375/20000 train_loss: 2.5532 train_time: 3.9m tok/s: 8004688 +2376/20000 train_loss: 2.5007 train_time: 3.9m tok/s: 8003240 +2377/20000 train_loss: 2.4178 train_time: 3.9m tok/s: 8001788 +2378/20000 train_loss: 2.5177 train_time: 3.9m tok/s: 8000355 +2379/20000 train_loss: 2.8983 train_time: 3.9m tok/s: 7998880 +2380/20000 train_loss: 2.5038 train_time: 3.9m tok/s: 7997498 +2381/20000 train_loss: 2.6739 train_time: 3.9m tok/s: 7996138 +2382/20000 
train_loss: 2.4641 train_time: 3.9m tok/s: 7994700 +2383/20000 train_loss: 2.6520 train_time: 3.9m tok/s: 7993279 +2384/20000 train_loss: 2.6600 train_time: 3.9m tok/s: 7991884 +2385/20000 train_loss: 2.6787 train_time: 3.9m tok/s: 7990475 +2386/20000 train_loss: 2.6375 train_time: 3.9m tok/s: 7988986 +2387/20000 train_loss: 2.4951 train_time: 3.9m tok/s: 7987607 +2388/20000 train_loss: 2.9510 train_time: 3.9m tok/s: 7986053 +2389/20000 train_loss: 2.3710 train_time: 3.9m tok/s: 7984486 +2390/20000 train_loss: 2.5955 train_time: 3.9m tok/s: 7983081 +2391/20000 train_loss: 2.5195 train_time: 3.9m tok/s: 7981705 +2392/20000 train_loss: 2.6423 train_time: 3.9m tok/s: 7980286 +2393/20000 train_loss: 2.5440 train_time: 3.9m tok/s: 7978916 +2394/20000 train_loss: 2.6382 train_time: 3.9m tok/s: 7977491 +2395/20000 train_loss: 2.6382 train_time: 3.9m tok/s: 7976107 +2396/20000 train_loss: 2.6050 train_time: 3.9m tok/s: 7974763 +2397/20000 train_loss: 2.6897 train_time: 3.9m tok/s: 7973369 +2398/20000 train_loss: 2.5174 train_time: 3.9m tok/s: 7971989 +2399/20000 train_loss: 2.5248 train_time: 3.9m tok/s: 7970649 +2400/20000 train_loss: 2.5226 train_time: 3.9m tok/s: 7969253 +2401/20000 train_loss: 2.6087 train_time: 3.9m tok/s: 7967898 +2402/20000 train_loss: 2.5071 train_time: 4.0m tok/s: 7966499 +2403/20000 train_loss: 2.8818 train_time: 4.0m tok/s: 7965046 +2404/20000 train_loss: 2.5546 train_time: 4.0m tok/s: 7963700 +2405/20000 train_loss: 2.4832 train_time: 4.0m tok/s: 7962310 +2406/20000 train_loss: 2.5776 train_time: 4.0m tok/s: 7960978 +2407/20000 train_loss: 2.5811 train_time: 4.0m tok/s: 7959632 +2408/20000 train_loss: 2.6607 train_time: 4.0m tok/s: 7958325 +2409/20000 train_loss: 2.5869 train_time: 4.0m tok/s: 7956949 +2410/20000 train_loss: 2.5789 train_time: 4.0m tok/s: 7955595 +2411/20000 train_loss: 2.5391 train_time: 4.0m tok/s: 7954257 +2412/20000 train_loss: 2.6504 train_time: 4.0m tok/s: 7952826 +2413/20000 train_loss: 2.4989 train_time: 4.0m tok/s: 
7951433 +2414/20000 train_loss: 2.5658 train_time: 4.0m tok/s: 7950065 +2415/20000 train_loss: 2.5505 train_time: 4.0m tok/s: 7948713 +2416/20000 train_loss: 2.5772 train_time: 4.0m tok/s: 7947332 +2417/20000 train_loss: 2.5491 train_time: 4.0m tok/s: 7945921 +2418/20000 train_loss: 2.5110 train_time: 4.0m tok/s: 7944594 +2419/20000 train_loss: 2.5588 train_time: 4.0m tok/s: 7943224 +2420/20000 train_loss: 2.5986 train_time: 4.0m tok/s: 7941927 +2421/20000 train_loss: 2.6272 train_time: 4.0m tok/s: 7940575 +2422/20000 train_loss: 2.6066 train_time: 4.0m tok/s: 7939203 +2423/20000 train_loss: 2.5051 train_time: 4.0m tok/s: 7937865 +2424/20000 train_loss: 2.6249 train_time: 4.0m tok/s: 7936515 +2425/20000 train_loss: 2.6074 train_time: 4.0m tok/s: 7935130 +2426/20000 train_loss: 2.5448 train_time: 4.0m tok/s: 7933774 +2427/20000 train_loss: 2.4184 train_time: 4.0m tok/s: 7932429 +2428/20000 train_loss: 2.5207 train_time: 4.0m tok/s: 7931022 +2429/20000 train_loss: 2.4963 train_time: 4.0m tok/s: 7929716 +2430/20000 train_loss: 2.4704 train_time: 4.0m tok/s: 7928373 +2431/20000 train_loss: 2.5885 train_time: 4.0m tok/s: 7926947 +2432/20000 train_loss: 2.5420 train_time: 4.0m tok/s: 7925613 +2433/20000 train_loss: 2.6759 train_time: 4.0m tok/s: 7924300 +2434/20000 train_loss: 2.4891 train_time: 4.0m tok/s: 7922981 +2435/20000 train_loss: 2.6772 train_time: 4.0m tok/s: 7921626 +2436/20000 train_loss: 2.5365 train_time: 4.0m tok/s: 7920311 +2437/20000 train_loss: 2.5837 train_time: 4.0m tok/s: 7918979 +2438/20000 train_loss: 2.5316 train_time: 4.0m tok/s: 7917637 +2439/20000 train_loss: 2.5330 train_time: 4.0m tok/s: 7916282 +2440/20000 train_loss: 2.5069 train_time: 4.0m tok/s: 7914997 +2441/20000 train_loss: 2.5391 train_time: 4.0m tok/s: 7913694 +2442/20000 train_loss: 2.5164 train_time: 4.0m tok/s: 7912368 +2443/20000 train_loss: 2.6455 train_time: 4.0m tok/s: 7911063 +2444/20000 train_loss: 2.5882 train_time: 4.0m tok/s: 7909721 +2445/20000 train_loss: 2.5052 
train_time: 4.1m tok/s: 7908439 +2446/20000 train_loss: 2.7168 train_time: 4.1m tok/s: 7907113 +2447/20000 train_loss: 2.7269 train_time: 4.1m tok/s: 7905800 +2448/20000 train_loss: 2.5847 train_time: 4.1m tok/s: 7904514 +2449/20000 train_loss: 2.5003 train_time: 4.1m tok/s: 7903169 +2450/20000 train_loss: 2.5558 train_time: 4.1m tok/s: 7901852 +2451/20000 train_loss: 2.5470 train_time: 4.1m tok/s: 7900553 +2452/20000 train_loss: 2.5694 train_time: 4.1m tok/s: 7899218 +2453/20000 train_loss: 2.4424 train_time: 4.1m tok/s: 7897907 +2454/20000 train_loss: 2.4500 train_time: 4.1m tok/s: 7896639 +2455/20000 train_loss: 2.5487 train_time: 4.1m tok/s: 7895304 +2456/20000 train_loss: 2.5543 train_time: 4.1m tok/s: 7893988 +2457/20000 train_loss: 2.6242 train_time: 4.1m tok/s: 7892718 +2458/20000 train_loss: 2.7155 train_time: 4.1m tok/s: 7891381 +2459/20000 train_loss: 2.5672 train_time: 4.1m tok/s: 7890131 +2460/20000 train_loss: 2.5889 train_time: 4.1m tok/s: 7888805 +2461/20000 train_loss: 2.6668 train_time: 4.1m tok/s: 7887547 +2462/20000 train_loss: 2.5464 train_time: 4.1m tok/s: 7886238 +2463/20000 train_loss: 2.6039 train_time: 4.1m tok/s: 7884950 +2464/20000 train_loss: 2.5216 train_time: 4.1m tok/s: 7883620 +2465/20000 train_loss: 2.6161 train_time: 4.1m tok/s: 7882376 +2466/20000 train_loss: 2.3301 train_time: 4.1m tok/s: 7881089 +2467/20000 train_loss: 2.5954 train_time: 4.1m tok/s: 7879772 +2468/20000 train_loss: 2.4793 train_time: 4.1m tok/s: 7878471 +2469/20000 train_loss: 2.5815 train_time: 4.1m tok/s: 7877202 +2470/20000 train_loss: 2.6069 train_time: 4.1m tok/s: 7875963 +2471/20000 train_loss: 2.5851 train_time: 4.1m tok/s: 7874649 +2472/20000 train_loss: 2.6802 train_time: 4.1m tok/s: 7873366 +2473/20000 train_loss: 2.5544 train_time: 4.1m tok/s: 7872116 +2474/20000 train_loss: 2.8435 train_time: 4.1m tok/s: 7870692 +2475/20000 train_loss: 2.6939 train_time: 4.1m tok/s: 7869405 +2476/20000 train_loss: 2.5771 train_time: 4.1m tok/s: 7868159 +2477/20000 
train_loss: 2.5021 train_time: 4.1m tok/s: 7866887 +2478/20000 train_loss: 2.5340 train_time: 4.1m tok/s: 7865651 +2479/20000 train_loss: 2.6441 train_time: 4.1m tok/s: 7864385 +2480/20000 train_loss: 2.5834 train_time: 4.1m tok/s: 7863125 +2481/20000 train_loss: 2.4684 train_time: 4.1m tok/s: 7861892 +2482/20000 train_loss: 2.6226 train_time: 4.1m tok/s: 7860597 +2483/20000 train_loss: 2.5765 train_time: 4.1m tok/s: 7859358 +2484/20000 train_loss: 2.5552 train_time: 4.1m tok/s: 7858117 +2485/20000 train_loss: 2.4921 train_time: 4.1m tok/s: 7856870 +2486/20000 train_loss: 2.5789 train_time: 4.1m tok/s: 7855581 +2487/20000 train_loss: 2.6060 train_time: 4.2m tok/s: 7854319 +2488/20000 train_loss: 2.5724 train_time: 4.2m tok/s: 7853074 +2489/20000 train_loss: 2.4660 train_time: 4.2m tok/s: 7851824 +2490/20000 train_loss: 2.6346 train_time: 4.2m tok/s: 7850552 +2491/20000 train_loss: 2.5767 train_time: 4.2m tok/s: 7849293 +2492/20000 train_loss: 2.5452 train_time: 4.2m tok/s: 7848026 +2493/20000 train_loss: 2.5718 train_time: 4.2m tok/s: 7846795 +2494/20000 train_loss: 2.4560 train_time: 4.2m tok/s: 7845476 +2495/20000 train_loss: 2.4973 train_time: 4.2m tok/s: 7844234 +2496/20000 train_loss: 2.6145 train_time: 4.2m tok/s: 7843029 +2497/20000 train_loss: 2.5656 train_time: 4.2m tok/s: 7841780 +2498/20000 train_loss: 2.5613 train_time: 4.2m tok/s: 7840546 +2499/20000 train_loss: 2.6039 train_time: 4.2m tok/s: 7839329 +2500/20000 train_loss: 2.6728 train_time: 4.2m tok/s: 7838140 +2501/20000 train_loss: 2.5662 train_time: 4.2m tok/s: 7836849 +2502/20000 train_loss: 2.4136 train_time: 4.2m tok/s: 7835578 +2503/20000 train_loss: 2.5229 train_time: 4.2m tok/s: 7834348 +2504/20000 train_loss: 2.6195 train_time: 4.2m tok/s: 7833070 +2505/20000 train_loss: 2.5240 train_time: 4.2m tok/s: 7831838 +2506/20000 train_loss: 2.5763 train_time: 4.2m tok/s: 7830562 +2507/20000 train_loss: 2.4341 train_time: 4.2m tok/s: 7829338 +2508/20000 train_loss: 2.5896 train_time: 4.2m tok/s: 
7828113 +2509/20000 train_loss: 2.6294 train_time: 4.2m tok/s: 7826907 +2510/20000 train_loss: 2.5293 train_time: 4.2m tok/s: 7825664 +2511/20000 train_loss: 2.5671 train_time: 4.2m tok/s: 7824298 +2512/20000 train_loss: 2.6267 train_time: 4.2m tok/s: 7823022 +2513/20000 train_loss: 2.4801 train_time: 4.2m tok/s: 7821826 +2514/20000 train_loss: 2.5743 train_time: 4.2m tok/s: 7820629 +2515/20000 train_loss: 2.6102 train_time: 4.2m tok/s: 7819290 +2516/20000 train_loss: 2.5965 train_time: 4.2m tok/s: 7818027 +2517/20000 train_loss: 2.4395 train_time: 4.2m tok/s: 7816795 +2518/20000 train_loss: 2.5444 train_time: 4.2m tok/s: 7815550 +2519/20000 train_loss: 2.5825 train_time: 4.2m tok/s: 7814271 +2520/20000 train_loss: 2.5422 train_time: 4.2m tok/s: 7813003 +2521/20000 train_loss: 2.6211 train_time: 4.2m tok/s: 7811791 +2522/20000 train_loss: 2.6172 train_time: 4.2m tok/s: 7810589 +2523/20000 train_loss: 2.5602 train_time: 4.2m tok/s: 7809366 +2524/20000 train_loss: 2.5419 train_time: 4.2m tok/s: 7808154 +2525/20000 train_loss: 2.4498 train_time: 4.2m tok/s: 7807021 +2526/20000 train_loss: 2.5383 train_time: 4.2m tok/s: 7805789 +2527/20000 train_loss: 2.5508 train_time: 4.2m tok/s: 7804505 +2528/20000 train_loss: 2.6039 train_time: 4.2m tok/s: 7803278 +2529/20000 train_loss: 2.5620 train_time: 4.2m tok/s: 7802092 +2530/20000 train_loss: 2.4890 train_time: 4.3m tok/s: 7800930 +2531/20000 train_loss: 2.4014 train_time: 4.3m tok/s: 7799725 +2532/20000 train_loss: 2.5258 train_time: 4.3m tok/s: 7798410 +2533/20000 train_loss: 2.5434 train_time: 4.3m tok/s: 7797214 +2534/20000 train_loss: 2.4605 train_time: 4.3m tok/s: 7796031 +2535/20000 train_loss: 2.5775 train_time: 4.3m tok/s: 7794867 +2536/20000 train_loss: 2.5701 train_time: 4.3m tok/s: 7793650 +2537/20000 train_loss: 2.4774 train_time: 4.3m tok/s: 7792454 +2538/20000 train_loss: 2.5960 train_time: 4.3m tok/s: 7791295 +2539/20000 train_loss: 2.7889 train_time: 4.3m tok/s: 7790125 +2540/20000 train_loss: 2.5461 
train_time: 4.3m tok/s: 7788942 +2541/20000 train_loss: 2.5357 train_time: 4.3m tok/s: 7787815 +2542/20000 train_loss: 2.5526 train_time: 4.3m tok/s: 7786581 +2543/20000 train_loss: 2.6335 train_time: 4.3m tok/s: 7785396 +2544/20000 train_loss: 2.6598 train_time: 4.3m tok/s: 7784222 +2545/20000 train_loss: 2.4912 train_time: 4.3m tok/s: 7782997 +2546/20000 train_loss: 2.5370 train_time: 4.3m tok/s: 7781830 +2547/20000 train_loss: 2.5134 train_time: 4.3m tok/s: 7780648 +2548/20000 train_loss: 2.7855 train_time: 4.3m tok/s: 7779485 +2549/20000 train_loss: 2.5548 train_time: 4.3m tok/s: 7778286 +2550/20000 train_loss: 2.8241 train_time: 4.3m tok/s: 7777136 +2551/20000 train_loss: 2.5345 train_time: 4.3m tok/s: 7775979 +2552/20000 train_loss: 2.7488 train_time: 4.3m tok/s: 7774842 +2553/20000 train_loss: 2.5872 train_time: 4.3m tok/s: 7773591 +2554/20000 train_loss: 2.4628 train_time: 4.3m tok/s: 7772478 +2555/20000 train_loss: 2.5542 train_time: 4.3m tok/s: 7771336 +2556/20000 train_loss: 2.5598 train_time: 4.3m tok/s: 7770160 +2557/20000 train_loss: 2.5038 train_time: 4.3m tok/s: 7769013 +2558/20000 train_loss: 2.6142 train_time: 4.3m tok/s: 7767880 +2559/20000 train_loss: 2.4343 train_time: 4.3m tok/s: 7766673 +2560/20000 train_loss: 2.4490 train_time: 4.3m tok/s: 7765447 +2561/20000 train_loss: 2.5288 train_time: 4.3m tok/s: 7764320 +2562/20000 train_loss: 2.4745 train_time: 4.3m tok/s: 7763200 +2563/20000 train_loss: 2.4427 train_time: 4.3m tok/s: 7762056 +2564/20000 train_loss: 2.4324 train_time: 4.3m tok/s: 7760874 +2565/20000 train_loss: 2.5392 train_time: 4.3m tok/s: 7759716 +2566/20000 train_loss: 2.5667 train_time: 4.3m tok/s: 7758633 +2567/20000 train_loss: 2.5835 train_time: 4.3m tok/s: 7757493 +2568/20000 train_loss: 2.6135 train_time: 4.3m tok/s: 7756332 +2569/20000 train_loss: 2.6611 train_time: 4.3m tok/s: 7755195 +2570/20000 train_loss: 2.5517 train_time: 4.3m tok/s: 7754026 +2571/20000 train_loss: 2.6060 train_time: 4.3m tok/s: 7752875 +2572/20000 
train_loss: 2.4356 train_time: 4.3m tok/s: 7751741 +2573/20000 train_loss: 2.4992 train_time: 4.4m tok/s: 7750548 +2574/20000 train_loss: 2.6861 train_time: 4.4m tok/s: 7749382 +2575/20000 train_loss: 2.5862 train_time: 4.4m tok/s: 7748274 +2576/20000 train_loss: 2.5131 train_time: 4.4m tok/s: 7747145 +2577/20000 train_loss: 2.4739 train_time: 4.4m tok/s: 7746023 +2578/20000 train_loss: 2.4463 train_time: 4.4m tok/s: 7744906 +2579/20000 train_loss: 2.5430 train_time: 4.4m tok/s: 7743785 +2580/20000 train_loss: 2.4690 train_time: 4.4m tok/s: 7742628 +2581/20000 train_loss: 2.3496 train_time: 4.4m tok/s: 7741415 +2582/20000 train_loss: 2.5642 train_time: 4.4m tok/s: 7740279 +2583/20000 train_loss: 2.5957 train_time: 4.4m tok/s: 7739148 +2584/20000 train_loss: 2.5916 train_time: 4.4m tok/s: 7738026 +2585/20000 train_loss: 2.4834 train_time: 4.4m tok/s: 7736930 +2586/20000 train_loss: 2.5082 train_time: 4.4m tok/s: 7735783 +2587/20000 train_loss: 2.5487 train_time: 4.4m tok/s: 7734676 +2588/20000 train_loss: 2.6069 train_time: 4.4m tok/s: 7733539 +2589/20000 train_loss: 2.4782 train_time: 4.4m tok/s: 7732419 +2590/20000 train_loss: 2.5150 train_time: 4.4m tok/s: 7731318 +2591/20000 train_loss: 2.4702 train_time: 4.4m tok/s: 7730199 +2592/20000 train_loss: 2.4328 train_time: 4.4m tok/s: 7729075 +2593/20000 train_loss: 2.5044 train_time: 4.4m tok/s: 7727987 +2594/20000 train_loss: 2.4295 train_time: 4.4m tok/s: 7726818 +2595/20000 train_loss: 2.6080 train_time: 4.4m tok/s: 7725705 +2596/20000 train_loss: 3.0952 train_time: 4.4m tok/s: 7724593 +2597/20000 train_loss: 2.4065 train_time: 4.4m tok/s: 7723484 +2598/20000 train_loss: 2.5094 train_time: 4.4m tok/s: 7722371 +2599/20000 train_loss: 2.6202 train_time: 4.4m tok/s: 7721249 +2600/20000 train_loss: 2.5802 train_time: 4.4m tok/s: 7720126 +2601/20000 train_loss: 2.5012 train_time: 4.4m tok/s: 7719018 +2602/20000 train_loss: 2.7696 train_time: 4.4m tok/s: 7717917 +2603/20000 train_loss: 2.5249 train_time: 4.4m tok/s: 
7716819 +2604/20000 train_loss: 2.5245 train_time: 4.4m tok/s: 7715689 +2605/20000 train_loss: 2.6691 train_time: 4.4m tok/s: 7714583 +2606/20000 train_loss: 2.4130 train_time: 4.4m tok/s: 7713501 +2607/20000 train_loss: 2.4734 train_time: 4.4m tok/s: 7712347 +2608/20000 train_loss: 2.5299 train_time: 4.4m tok/s: 7711283 +2609/20000 train_loss: 2.4865 train_time: 4.4m tok/s: 7710113 +2610/20000 train_loss: 2.4743 train_time: 4.4m tok/s: 7709023 +2611/20000 train_loss: 2.6162 train_time: 4.4m tok/s: 7707941 +2612/20000 train_loss: 2.6840 train_time: 4.4m tok/s: 7706771 +2613/20000 train_loss: 2.5602 train_time: 4.4m tok/s: 7705676 +2614/20000 train_loss: 2.5460 train_time: 4.4m tok/s: 7704598 +2615/20000 train_loss: 2.6722 train_time: 4.4m tok/s: 7703528 +2616/20000 train_loss: 2.5771 train_time: 4.5m tok/s: 7702434 +2617/20000 train_loss: 2.5680 train_time: 4.5m tok/s: 7701354 +2618/20000 train_loss: 2.5671 train_time: 4.5m tok/s: 7700250 +2619/20000 train_loss: 2.4633 train_time: 4.5m tok/s: 7699189 +2620/20000 train_loss: 2.4306 train_time: 4.5m tok/s: 7698112 +2621/20000 train_loss: 2.5054 train_time: 4.5m tok/s: 7697011 +2622/20000 train_loss: 2.4895 train_time: 4.5m tok/s: 7695992 +2623/20000 train_loss: 2.5090 train_time: 4.5m tok/s: 7694830 +2624/20000 train_loss: 2.3486 train_time: 4.5m tok/s: 7693745 +2625/20000 train_loss: 2.6367 train_time: 4.5m tok/s: 7692716 +2626/20000 train_loss: 2.3587 train_time: 4.5m tok/s: 7691610 +2627/20000 train_loss: 2.4638 train_time: 4.5m tok/s: 7690551 +2628/20000 train_loss: 2.6601 train_time: 4.5m tok/s: 7689485 +2629/20000 train_loss: 2.5601 train_time: 4.5m tok/s: 7688433 +2630/20000 train_loss: 2.6218 train_time: 4.5m tok/s: 7687328 +2631/20000 train_loss: 2.5929 train_time: 4.5m tok/s: 7686222 +2632/20000 train_loss: 2.6348 train_time: 4.5m tok/s: 7685185 +2633/20000 train_loss: 2.4865 train_time: 4.5m tok/s: 7684089 +2634/20000 train_loss: 2.5667 train_time: 4.5m tok/s: 7683040 +2635/20000 train_loss: 2.4911 
train_time: 4.5m tok/s: 7681948 +2636/20000 train_loss: 2.5418 train_time: 4.5m tok/s: 7680888 +2637/20000 train_loss: 2.4554 train_time: 4.5m tok/s: 7679830 +2638/20000 train_loss: 2.5388 train_time: 4.5m tok/s: 7678739 +2639/20000 train_loss: 2.2733 train_time: 4.5m tok/s: 7677607 +2640/20000 train_loss: 2.5428 train_time: 4.5m tok/s: 7676583 +2641/20000 train_loss: 2.5948 train_time: 4.5m tok/s: 7675465 +2642/20000 train_loss: 2.6462 train_time: 4.5m tok/s: 7674442 +2643/20000 train_loss: 2.5601 train_time: 4.5m tok/s: 7673352 +2644/20000 train_loss: 2.5819 train_time: 4.5m tok/s: 7672309 +2645/20000 train_loss: 2.5340 train_time: 4.5m tok/s: 7671253 +2646/20000 train_loss: 2.5672 train_time: 4.5m tok/s: 7670093 +2647/20000 train_loss: 2.6580 train_time: 4.5m tok/s: 7669027 +2648/20000 train_loss: 2.5172 train_time: 4.5m tok/s: 7667989 +2649/20000 train_loss: 2.5460 train_time: 4.5m tok/s: 7666945 +2650/20000 train_loss: 2.4652 train_time: 4.5m tok/s: 7665913 +2651/20000 train_loss: 2.4373 train_time: 4.5m tok/s: 7664875 +2652/20000 train_loss: 2.3550 train_time: 4.5m tok/s: 7663823 +2653/20000 train_loss: 2.6508 train_time: 4.5m tok/s: 7662724 +2654/20000 train_loss: 2.2725 train_time: 4.5m tok/s: 7661599 +2655/20000 train_loss: 2.9441 train_time: 4.5m tok/s: 7660464 +2656/20000 train_loss: 2.4550 train_time: 4.5m tok/s: 7659420 +2657/20000 train_loss: 2.4392 train_time: 4.5m tok/s: 7658394 +2658/20000 train_loss: 2.6202 train_time: 4.5m tok/s: 7657290 +2659/20000 train_loss: 2.5281 train_time: 4.6m tok/s: 7656186 +2660/20000 train_loss: 2.5936 train_time: 4.6m tok/s: 7655104 +2661/20000 train_loss: 2.5486 train_time: 4.6m tok/s: 7654055 +2662/20000 train_loss: 2.3351 train_time: 4.6m tok/s: 7652986 +2663/20000 train_loss: 2.7240 train_time: 4.6m tok/s: 7651897 +2664/20000 train_loss: 2.5450 train_time: 4.6m tok/s: 7650761 +2665/20000 train_loss: 2.4913 train_time: 4.6m tok/s: 7649712 +2666/20000 train_loss: 2.4073 train_time: 4.6m tok/s: 7648686 +2667/20000 
train_loss: 2.2918 train_time: 4.6m tok/s: 7647656 +2668/20000 train_loss: 2.5799 train_time: 4.6m tok/s: 7646594 +2669/20000 train_loss: 2.4555 train_time: 4.6m tok/s: 7645564 +2670/20000 train_loss: 2.5740 train_time: 4.6m tok/s: 7644553 +2671/20000 train_loss: 2.6714 train_time: 4.6m tok/s: 7643510 +2672/20000 train_loss: 2.6147 train_time: 4.6m tok/s: 7642499 +2673/20000 train_loss: 2.5684 train_time: 4.6m tok/s: 7641482 +2674/20000 train_loss: 2.6311 train_time: 4.6m tok/s: 7640458 +2675/20000 train_loss: 2.5330 train_time: 4.6m tok/s: 7639438 +2676/20000 train_loss: 2.5228 train_time: 4.6m tok/s: 7638381 +2677/20000 train_loss: 2.4367 train_time: 4.6m tok/s: 7637375 +2678/20000 train_loss: 2.4740 train_time: 4.6m tok/s: 7636328 +2679/20000 train_loss: 2.3282 train_time: 4.6m tok/s: 7635279 +2680/20000 train_loss: 2.4358 train_time: 4.6m tok/s: 7634257 +2681/20000 train_loss: 2.4490 train_time: 4.6m tok/s: 7633230 +2682/20000 train_loss: 2.5427 train_time: 4.6m tok/s: 7632222 +2683/20000 train_loss: 2.4670 train_time: 4.6m tok/s: 7631194 +2684/20000 train_loss: 2.4811 train_time: 4.6m tok/s: 7630166 +2685/20000 train_loss: 2.8106 train_time: 4.6m tok/s: 7629125 +2686/20000 train_loss: 2.5216 train_time: 4.6m tok/s: 7628157 +2687/20000 train_loss: 2.5802 train_time: 4.6m tok/s: 7627156 +2688/20000 train_loss: 2.4570 train_time: 4.6m tok/s: 7626161 +2689/20000 train_loss: 2.5505 train_time: 4.6m tok/s: 7625129 +2690/20000 train_loss: 2.5524 train_time: 4.6m tok/s: 7624114 +2691/20000 train_loss: 2.5344 train_time: 4.6m tok/s: 7623110 +2692/20000 train_loss: 2.4694 train_time: 4.6m tok/s: 7622097 +2693/20000 train_loss: 2.4480 train_time: 4.6m tok/s: 7621033 +2694/20000 train_loss: 2.4779 train_time: 4.6m tok/s: 7620004 +2695/20000 train_loss: 2.5831 train_time: 4.6m tok/s: 7618986 +2696/20000 train_loss: 2.4787 train_time: 4.6m tok/s: 7617983 +2697/20000 train_loss: 2.5606 train_time: 4.6m tok/s: 7616965 +2698/20000 train_loss: 2.5781 train_time: 4.6m tok/s: 
7615973 +2699/20000 train_loss: 2.4777 train_time: 4.6m tok/s: 7614986 +2700/20000 train_loss: 2.4587 train_time: 4.6m tok/s: 7614003 +2701/20000 train_loss: 2.5890 train_time: 4.7m tok/s: 7613021 +2702/20000 train_loss: 2.3712 train_time: 4.7m tok/s: 7612031 +2703/20000 train_loss: 2.5563 train_time: 4.7m tok/s: 7611059 +2704/20000 train_loss: 2.4514 train_time: 4.7m tok/s: 7609986 +2705/20000 train_loss: 2.5100 train_time: 4.7m tok/s: 7609024 +2706/20000 train_loss: 2.5183 train_time: 4.7m tok/s: 7608052 +2707/20000 train_loss: 2.6204 train_time: 4.7m tok/s: 7607019 +2708/20000 train_loss: 2.6550 train_time: 4.7m tok/s: 7605994 +2709/20000 train_loss: 2.5740 train_time: 4.7m tok/s: 7605028 +2710/20000 train_loss: 2.6279 train_time: 4.7m tok/s: 7604004 +2711/20000 train_loss: 2.7073 train_time: 4.7m tok/s: 7602982 +2712/20000 train_loss: 2.4950 train_time: 4.7m tok/s: 7602032 +2713/20000 train_loss: 2.5724 train_time: 4.7m tok/s: 7601005 +2714/20000 train_loss: 2.6281 train_time: 4.7m tok/s: 7600028 +2715/20000 train_loss: 2.4579 train_time: 4.7m tok/s: 7599075 +2716/20000 train_loss: 2.4199 train_time: 4.7m tok/s: 7598088 +2717/20000 train_loss: 2.5248 train_time: 4.7m tok/s: 7597157 +2718/20000 train_loss: 2.4495 train_time: 4.7m tok/s: 7596147 +2719/20000 train_loss: 2.4432 train_time: 4.7m tok/s: 7595173 +2720/20000 train_loss: 2.5684 train_time: 4.7m tok/s: 7594102 +2721/20000 train_loss: 2.4111 train_time: 4.7m tok/s: 7593026 +2722/20000 train_loss: 2.4857 train_time: 4.7m tok/s: 7592046 +2723/20000 train_loss: 2.4645 train_time: 4.7m tok/s: 7591109 +2724/20000 train_loss: 2.5260 train_time: 4.7m tok/s: 7590108 +2725/20000 train_loss: 2.6417 train_time: 4.7m tok/s: 7589169 +2726/20000 train_loss: 2.5065 train_time: 4.7m tok/s: 7588223 +2727/20000 train_loss: 2.5663 train_time: 4.7m tok/s: 7587260 +2728/20000 train_loss: 2.8968 train_time: 4.7m tok/s: 7586315 +2729/20000 train_loss: 2.6997 train_time: 4.7m tok/s: 7585364 +2730/20000 train_loss: 2.5374 
train_time: 4.7m tok/s: 7584379 +2731/20000 train_loss: 2.6094 train_time: 4.7m tok/s: 7583390 +2732/20000 train_loss: 2.6347 train_time: 4.7m tok/s: 7582411 +2733/20000 train_loss: 2.5490 train_time: 4.7m tok/s: 7581444 +2734/20000 train_loss: 2.6283 train_time: 4.7m tok/s: 7580485 +2735/20000 train_loss: 2.4311 train_time: 4.7m tok/s: 7579535 +2736/20000 train_loss: 2.5532 train_time: 4.7m tok/s: 7578619 +2737/20000 train_loss: 2.4799 train_time: 4.7m tok/s: 7577592 +2738/20000 train_loss: 2.4094 train_time: 4.7m tok/s: 7576607 +2739/20000 train_loss: 2.5408 train_time: 4.7m tok/s: 7575654 +2740/20000 train_loss: 2.5572 train_time: 4.7m tok/s: 7574678 +2741/20000 train_loss: 2.5185 train_time: 4.7m tok/s: 7573711 +2742/20000 train_loss: 2.4823 train_time: 4.7m tok/s: 7572778 +2743/20000 train_loss: 2.5805 train_time: 4.7m tok/s: 7571791 +2744/20000 train_loss: 2.5895 train_time: 4.8m tok/s: 7570846 +2745/20000 train_loss: 2.6531 train_time: 4.8m tok/s: 7569920 +2746/20000 train_loss: 2.6218 train_time: 4.8m tok/s: 7568959 +2747/20000 train_loss: 2.4441 train_time: 4.8m tok/s: 7567990 +2748/20000 train_loss: 2.5154 train_time: 4.8m tok/s: 7567013 +2749/20000 train_loss: 2.5820 train_time: 4.8m tok/s: 7566082 +2750/20000 train_loss: 2.6452 train_time: 4.8m tok/s: 7565117 +2751/20000 train_loss: 2.6306 train_time: 4.8m tok/s: 7564165 +2752/20000 train_loss: 2.5047 train_time: 4.8m tok/s: 7563214 +2753/20000 train_loss: 2.4866 train_time: 4.8m tok/s: 7562313 +2754/20000 train_loss: 2.4567 train_time: 4.8m tok/s: 7561381 +2755/20000 train_loss: 2.5016 train_time: 4.8m tok/s: 7560402 +2756/20000 train_loss: 2.4830 train_time: 4.8m tok/s: 7559475 +2757/20000 train_loss: 2.4496 train_time: 4.8m tok/s: 7558489 +2758/20000 train_loss: 2.5931 train_time: 4.8m tok/s: 7557529 +2759/20000 train_loss: 2.4757 train_time: 4.8m tok/s: 7556575 +2760/20000 train_loss: 2.4130 train_time: 4.8m tok/s: 7555632 +2761/20000 train_loss: 2.6437 train_time: 4.8m tok/s: 7554640 +2762/20000 
train_loss: 2.5414 train_time: 4.8m tok/s: 7553714 +2763/20000 train_loss: 2.5951 train_time: 4.8m tok/s: 7552780 +2764/20000 train_loss: 2.5721 train_time: 4.8m tok/s: 7551856 +2765/20000 train_loss: 2.5085 train_time: 4.8m tok/s: 7550940 +2766/20000 train_loss: 2.4660 train_time: 4.8m tok/s: 7550007 +2767/20000 train_loss: 2.5102 train_time: 4.8m tok/s: 7549060 +2768/20000 train_loss: 2.6474 train_time: 4.8m tok/s: 7548125 +2769/20000 train_loss: 2.5592 train_time: 4.8m tok/s: 7547141 +2770/20000 train_loss: 2.7017 train_time: 4.8m tok/s: 7546196 +2771/20000 train_loss: 2.5115 train_time: 4.8m tok/s: 7545247 +2772/20000 train_loss: 2.5359 train_time: 4.8m tok/s: 7544319 +2773/20000 train_loss: 2.5178 train_time: 4.8m tok/s: 7543383 +2774/20000 train_loss: 2.4714 train_time: 4.8m tok/s: 7542452 +2775/20000 train_loss: 2.4707 train_time: 4.8m tok/s: 7541530 +2776/20000 train_loss: 2.4153 train_time: 4.8m tok/s: 7540610 +2777/20000 train_loss: 2.4803 train_time: 4.8m tok/s: 7539684 +2778/20000 train_loss: 2.5345 train_time: 4.8m tok/s: 7538772 +2779/20000 train_loss: 2.4037 train_time: 4.8m tok/s: 7537784 +2780/20000 train_loss: 2.6623 train_time: 4.8m tok/s: 7536883 +2781/20000 train_loss: 2.6360 train_time: 4.8m tok/s: 7535976 +2782/20000 train_loss: 2.4615 train_time: 4.8m tok/s: 7535080 +2783/20000 train_loss: 2.6230 train_time: 4.8m tok/s: 7534145 +2784/20000 train_loss: 2.6529 train_time: 4.8m tok/s: 7533248 +2785/20000 train_loss: 2.5229 train_time: 4.8m tok/s: 7532340 +2786/20000 train_loss: 2.5237 train_time: 4.8m tok/s: 7531411 +2787/20000 train_loss: 2.5830 train_time: 4.9m tok/s: 7530475 +2788/20000 train_loss: 2.4289 train_time: 4.9m tok/s: 7529556 +2789/20000 train_loss: 2.5456 train_time: 4.9m tok/s: 7528657 +2790/20000 train_loss: 2.6072 train_time: 4.9m tok/s: 7527724 +2791/20000 train_loss: 2.3670 train_time: 4.9m tok/s: 7526715 +2792/20000 train_loss: 2.5233 train_time: 4.9m tok/s: 7525801 +2793/20000 train_loss: 2.5464 train_time: 4.9m tok/s: 
7524874 +2794/20000 train_loss: 2.4984 train_time: 4.9m tok/s: 7523981 +2795/20000 train_loss: 2.3798 train_time: 4.9m tok/s: 7523120 +2796/20000 train_loss: 2.5178 train_time: 4.9m tok/s: 7522191 +2797/20000 train_loss: 2.5439 train_time: 4.9m tok/s: 7521301 +2798/20000 train_loss: 2.5204 train_time: 4.9m tok/s: 7520391 +2799/20000 train_loss: 2.7806 train_time: 4.9m tok/s: 7519501 +2800/20000 train_loss: 2.6309 train_time: 4.9m tok/s: 7518591 +2801/20000 train_loss: 2.5026 train_time: 4.9m tok/s: 7517674 +2802/20000 train_loss: 2.5197 train_time: 4.9m tok/s: 7516810 +2803/20000 train_loss: 2.5935 train_time: 4.9m tok/s: 7515844 +2804/20000 train_loss: 2.6355 train_time: 4.9m tok/s: 7514939 +2805/20000 train_loss: 2.4744 train_time: 4.9m tok/s: 7514035 +2806/20000 train_loss: 2.5062 train_time: 4.9m tok/s: 7513117 +2807/20000 train_loss: 2.6584 train_time: 4.9m tok/s: 7512225 +2808/20000 train_loss: 2.5859 train_time: 4.9m tok/s: 7511214 +2809/20000 train_loss: 2.4399 train_time: 4.9m tok/s: 7510301 +2810/20000 train_loss: 2.5304 train_time: 4.9m tok/s: 7509413 +2811/20000 train_loss: 2.5916 train_time: 4.9m tok/s: 7508507 +2812/20000 train_loss: 2.5973 train_time: 4.9m tok/s: 7507622 +2813/20000 train_loss: 2.3938 train_time: 4.9m tok/s: 7506751 +2814/20000 train_loss: 2.5393 train_time: 4.9m tok/s: 7505862 +2815/20000 train_loss: 2.6721 train_time: 4.9m tok/s: 7504978 +2816/20000 train_loss: 2.6399 train_time: 4.9m tok/s: 7504078 +2817/20000 train_loss: 2.5448 train_time: 4.9m tok/s: 7503199 +2818/20000 train_loss: 2.5661 train_time: 4.9m tok/s: 7502345 +2819/20000 train_loss: 2.4743 train_time: 4.9m tok/s: 7501498 +2820/20000 train_loss: 2.4989 train_time: 4.9m tok/s: 7500615 +2821/20000 train_loss: 2.4400 train_time: 4.9m tok/s: 7499655 +2822/20000 train_loss: 2.7934 train_time: 4.9m tok/s: 7498679 +2823/20000 train_loss: 2.6177 train_time: 4.9m tok/s: 7497799 +2824/20000 train_loss: 2.6749 train_time: 4.9m tok/s: 7496897 +2825/20000 train_loss: 2.4820 
train_time: 4.9m tok/s: 7496015 +2826/20000 train_loss: 2.5645 train_time: 4.9m tok/s: 7495153 +2827/20000 train_loss: 2.4507 train_time: 4.9m tok/s: 7494305 +2828/20000 train_loss: 2.6260 train_time: 4.9m tok/s: 7493423 +2829/20000 train_loss: 2.4131 train_time: 4.9m tok/s: 7492558 +2830/20000 train_loss: 2.5082 train_time: 5.0m tok/s: 7491654 +2831/20000 train_loss: 2.7162 train_time: 5.0m tok/s: 7490777 +2832/20000 train_loss: 2.5276 train_time: 5.0m tok/s: 7489914 +2833/20000 train_loss: 2.6925 train_time: 5.0m tok/s: 7489040 +2834/20000 train_loss: 2.6388 train_time: 5.0m tok/s: 7488164 +2835/20000 train_loss: 2.5931 train_time: 5.0m tok/s: 7487297 +2836/20000 train_loss: 2.4994 train_time: 5.0m tok/s: 7486413 +2837/20000 train_loss: 2.5779 train_time: 5.0m tok/s: 7485536 +2838/20000 train_loss: 2.5442 train_time: 5.0m tok/s: 7484633 +2839/20000 train_loss: 2.5285 train_time: 5.0m tok/s: 7483759 +2840/20000 train_loss: 2.6097 train_time: 5.0m tok/s: 7482890 +2841/20000 train_loss: 2.5508 train_time: 5.0m tok/s: 7482004 +2842/20000 train_loss: 2.6262 train_time: 5.0m tok/s: 7481139 +2843/20000 train_loss: 2.4449 train_time: 5.0m tok/s: 7480233 +2844/20000 train_loss: 2.4384 train_time: 5.0m tok/s: 7479361 +2845/20000 train_loss: 2.5239 train_time: 5.0m tok/s: 7478509 +2846/20000 train_loss: 2.3911 train_time: 5.0m tok/s: 7477650 +2847/20000 train_loss: 2.4322 train_time: 5.0m tok/s: 7476768 +2848/20000 train_loss: 2.5625 train_time: 5.0m tok/s: 7475922 +2849/20000 train_loss: 2.6316 train_time: 5.0m tok/s: 7475030 +2850/20000 train_loss: 2.5749 train_time: 5.0m tok/s: 7474146 +2851/20000 train_loss: 2.7799 train_time: 5.0m tok/s: 7473255 +2852/20000 train_loss: 2.4624 train_time: 5.0m tok/s: 7472383 +2853/20000 train_loss: 2.5365 train_time: 5.0m tok/s: 7471521 +2854/20000 train_loss: 2.4569 train_time: 5.0m tok/s: 7470659 +2855/20000 train_loss: 2.6423 train_time: 5.0m tok/s: 7469797 +2856/20000 train_loss: 2.4677 train_time: 5.0m tok/s: 7468929 +2857/20000 
train_loss: 2.5327 train_time: 5.0m tok/s: 7468084 +2858/20000 train_loss: 2.5123 train_time: 5.0m tok/s: 7467202 +2859/20000 train_loss: 3.1528 train_time: 5.0m tok/s: 7466254 +2860/20000 train_loss: 2.4918 train_time: 5.0m tok/s: 7465375 +2861/20000 train_loss: 2.5030 train_time: 5.0m tok/s: 7464548 +2862/20000 train_loss: 2.5150 train_time: 5.0m tok/s: 7463715 +2863/20000 train_loss: 2.3516 train_time: 5.0m tok/s: 7462884 +2864/20000 train_loss: 2.3826 train_time: 5.0m tok/s: 7462037 +2865/20000 train_loss: 2.6168 train_time: 5.0m tok/s: 7461178 +2866/20000 train_loss: 2.5269 train_time: 5.0m tok/s: 7460355 +2867/20000 train_loss: 2.3950 train_time: 5.0m tok/s: 7459494 +2868/20000 train_loss: 2.4288 train_time: 5.0m tok/s: 7458660 +2869/20000 train_loss: 2.5630 train_time: 5.0m tok/s: 7457843 +2870/20000 train_loss: 2.6530 train_time: 5.0m tok/s: 7457002 +2871/20000 train_loss: 2.4594 train_time: 5.0m tok/s: 7456165 +2872/20000 train_loss: 3.0441 train_time: 5.0m tok/s: 7455283 +2873/20000 train_loss: 2.4747 train_time: 5.1m tok/s: 7454311 +2874/20000 train_loss: 2.6147 train_time: 5.1m tok/s: 7453433 +2875/20000 train_loss: 2.5346 train_time: 5.1m tok/s: 7452612 +2876/20000 train_loss: 2.5927 train_time: 5.1m tok/s: 7451797 +2877/20000 train_loss: 2.5201 train_time: 5.1m tok/s: 7450965 +2878/20000 train_loss: 2.4987 train_time: 5.1m tok/s: 7450113 +2879/20000 train_loss: 2.5352 train_time: 5.1m tok/s: 7449275 +2880/20000 train_loss: 2.5453 train_time: 5.1m tok/s: 7448453 +2881/20000 train_loss: 2.6013 train_time: 5.1m tok/s: 7447634 +2882/20000 train_loss: 2.6852 train_time: 5.1m tok/s: 7446821 +2883/20000 train_loss: 2.6457 train_time: 5.1m tok/s: 7445941 +2884/20000 train_loss: 2.6037 train_time: 5.1m tok/s: 7445141 +2885/20000 train_loss: 2.5567 train_time: 5.1m tok/s: 7444308 +2886/20000 train_loss: 2.5517 train_time: 5.1m tok/s: 7443504 +2887/20000 train_loss: 2.5273 train_time: 5.1m tok/s: 7442633 +2888/20000 train_loss: 2.6248 train_time: 5.1m tok/s: 
7441752 +2889/20000 train_loss: 2.5761 train_time: 5.1m tok/s: 7440932 +2890/20000 train_loss: 2.5906 train_time: 5.1m tok/s: 7440097 +2891/20000 train_loss: 2.5308 train_time: 5.1m tok/s: 7439285 +2892/20000 train_loss: 2.4684 train_time: 5.1m tok/s: 7438463 +2893/20000 train_loss: 2.3144 train_time: 5.1m tok/s: 7437626 +2894/20000 train_loss: 2.5563 train_time: 5.1m tok/s: 7436797 +2895/20000 train_loss: 2.5231 train_time: 5.1m tok/s: 7435954 +2896/20000 train_loss: 2.4963 train_time: 5.1m tok/s: 7435147 +2897/20000 train_loss: 2.5760 train_time: 5.1m tok/s: 7434315 +2898/20000 train_loss: 2.5603 train_time: 5.1m tok/s: 7433512 +2899/20000 train_loss: 2.5608 train_time: 5.1m tok/s: 7432718 +2900/20000 train_loss: 2.6062 train_time: 5.1m tok/s: 7431854 +2901/20000 train_loss: 2.4184 train_time: 5.1m tok/s: 7430990 +2902/20000 train_loss: 2.5358 train_time: 5.1m tok/s: 7430115 +2903/20000 train_loss: 2.4659 train_time: 5.1m tok/s: 7429267 +2904/20000 train_loss: 2.4623 train_time: 5.1m tok/s: 7428451 +2905/20000 train_loss: 2.6116 train_time: 5.1m tok/s: 7427645 +2906/20000 train_loss: 2.4171 train_time: 5.1m tok/s: 7426851 +2907/20000 train_loss: 2.4593 train_time: 5.1m tok/s: 7426063 +2908/20000 train_loss: 2.5220 train_time: 5.1m tok/s: 7425249 +2909/20000 train_loss: 2.5184 train_time: 5.1m tok/s: 7424417 +2910/20000 train_loss: 2.3961 train_time: 5.1m tok/s: 7423598 +2911/20000 train_loss: 2.5949 train_time: 5.1m tok/s: 7422750 +2912/20000 train_loss: 2.5438 train_time: 5.1m tok/s: 7421888 +2913/20000 train_loss: 2.5726 train_time: 5.1m tok/s: 7421041 +2914/20000 train_loss: 2.6261 train_time: 5.1m tok/s: 7420237 +2915/20000 train_loss: 2.5355 train_time: 5.1m tok/s: 7419473 +2916/20000 train_loss: 2.4732 train_time: 5.2m tok/s: 7418603 +2917/20000 train_loss: 2.5245 train_time: 5.2m tok/s: 7417802 +2918/20000 train_loss: 2.8647 train_time: 5.2m tok/s: 7417009 +2919/20000 train_loss: 2.6991 train_time: 5.2m tok/s: 7416163 +2920/20000 train_loss: 2.4767 
train_time: 5.2m tok/s: 7415367 +2921/20000 train_loss: 2.4263 train_time: 5.2m tok/s: 7414571 +2922/20000 train_loss: 2.4648 train_time: 5.2m tok/s: 7413783 +2923/20000 train_loss: 2.4600 train_time: 5.2m tok/s: 7412990 +2924/20000 train_loss: 2.5753 train_time: 5.2m tok/s: 7412185 +2925/20000 train_loss: 2.6242 train_time: 5.2m tok/s: 7411360 +2926/20000 train_loss: 2.4980 train_time: 5.2m tok/s: 7410516 +2927/20000 train_loss: 2.5172 train_time: 5.2m tok/s: 7409737 +2928/20000 train_loss: 2.3561 train_time: 5.2m tok/s: 7408927 +2929/20000 train_loss: 2.5576 train_time: 5.2m tok/s: 7408137 +2930/20000 train_loss: 2.4514 train_time: 5.2m tok/s: 7407332 +2931/20000 train_loss: 2.4456 train_time: 5.2m tok/s: 7406551 +2932/20000 train_loss: 2.4277 train_time: 5.2m tok/s: 7405755 +2933/20000 train_loss: 2.5200 train_time: 5.2m tok/s: 7404956 +2934/20000 train_loss: 2.5360 train_time: 5.2m tok/s: 7404146 +2935/20000 train_loss: 2.5263 train_time: 5.2m tok/s: 7403343 +2936/20000 train_loss: 2.5245 train_time: 5.2m tok/s: 7402548 +2937/20000 train_loss: 2.6517 train_time: 5.2m tok/s: 7401770 +2938/20000 train_loss: 2.5971 train_time: 5.2m tok/s: 7400950 +2939/20000 train_loss: 2.5770 train_time: 5.2m tok/s: 7400171 +2940/20000 train_loss: 2.3900 train_time: 5.2m tok/s: 7399364 +2941/20000 train_loss: 2.6217 train_time: 5.2m tok/s: 7398532 +2942/20000 train_loss: 2.5345 train_time: 5.2m tok/s: 7397735 +2943/20000 train_loss: 2.4248 train_time: 5.2m tok/s: 7396938 +2944/20000 train_loss: 2.4162 train_time: 5.2m tok/s: 7396134 +2945/20000 train_loss: 2.5160 train_time: 5.2m tok/s: 7395311 +2946/20000 train_loss: 2.4909 train_time: 5.2m tok/s: 7394511 +2947/20000 train_loss: 2.4964 train_time: 5.2m tok/s: 7393760 +2948/20000 train_loss: 2.4783 train_time: 5.2m tok/s: 7392968 +2949/20000 train_loss: 2.5432 train_time: 5.2m tok/s: 7392189 +2950/20000 train_loss: 2.6427 train_time: 5.2m tok/s: 7391369 +2951/20000 train_loss: 2.7275 train_time: 5.2m tok/s: 7390570 +2952/20000 
train_loss: 2.5588 train_time: 5.2m tok/s: 7389773 +2953/20000 train_loss: 2.5044 train_time: 5.2m tok/s: 7388962 +2954/20000 train_loss: 2.4811 train_time: 5.2m tok/s: 7388193 +2955/20000 train_loss: 2.4678 train_time: 5.2m tok/s: 7387392 +2956/20000 train_loss: 2.4613 train_time: 5.2m tok/s: 7386589 +2957/20000 train_loss: 2.7206 train_time: 5.2m tok/s: 7385746 +2958/20000 train_loss: 2.4343 train_time: 5.2m tok/s: 7384983 +2959/20000 train_loss: 2.4326 train_time: 5.3m tok/s: 7384242 +2960/20000 train_loss: 2.4329 train_time: 5.3m tok/s: 7383438 +2961/20000 train_loss: 2.5396 train_time: 5.3m tok/s: 7382666 +2962/20000 train_loss: 2.4772 train_time: 5.3m tok/s: 7381934 +2963/20000 train_loss: 2.6926 train_time: 5.3m tok/s: 7381146 +2964/20000 train_loss: 2.5856 train_time: 5.3m tok/s: 7380377 +2965/20000 train_loss: 2.7039 train_time: 5.3m tok/s: 7379599 +2966/20000 train_loss: 2.5839 train_time: 5.3m tok/s: 7378820 +2967/20000 train_loss: 2.5507 train_time: 5.3m tok/s: 7378037 +2968/20000 train_loss: 2.4838 train_time: 5.3m tok/s: 7377281 +2969/20000 train_loss: 2.6632 train_time: 5.3m tok/s: 7376502 +2970/20000 train_loss: 2.4759 train_time: 5.3m tok/s: 7375745 +2971/20000 train_loss: 2.4893 train_time: 5.3m tok/s: 7374976 +2972/20000 train_loss: 2.5255 train_time: 5.3m tok/s: 7374207 +2973/20000 train_loss: 2.4442 train_time: 5.3m tok/s: 7373402 +2974/20000 train_loss: 2.5844 train_time: 5.3m tok/s: 7372598 +2975/20000 train_loss: 2.5875 train_time: 5.3m tok/s: 7371829 +2976/20000 train_loss: 2.5492 train_time: 5.3m tok/s: 7371062 +2977/20000 train_loss: 2.4883 train_time: 5.3m tok/s: 7370289 +2978/20000 train_loss: 2.5807 train_time: 5.3m tok/s: 7369515 +2979/20000 train_loss: 2.4757 train_time: 5.3m tok/s: 7368738 +2980/20000 train_loss: 2.5893 train_time: 5.3m tok/s: 7367954 +2981/20000 train_loss: 2.3945 train_time: 5.3m tok/s: 7367170 +2982/20000 train_loss: 2.6209 train_time: 5.3m tok/s: 7366419 +2983/20000 train_loss: 2.4142 train_time: 5.3m tok/s: 
7365649 +2984/20000 train_loss: 2.4483 train_time: 5.3m tok/s: 7364879 +2985/20000 train_loss: 2.4481 train_time: 5.3m tok/s: 7364107 +2986/20000 train_loss: 2.5753 train_time: 5.3m tok/s: 7363347 +2987/20000 train_loss: 2.4943 train_time: 5.3m tok/s: 7362518 +2988/20000 train_loss: 2.5556 train_time: 5.3m tok/s: 7361774 +2989/20000 train_loss: 2.4368 train_time: 5.3m tok/s: 7361001 +2990/20000 train_loss: 2.7006 train_time: 5.3m tok/s: 7360249 +2991/20000 train_loss: 2.4850 train_time: 5.3m tok/s: 7359474 +2992/20000 train_loss: 2.5980 train_time: 5.3m tok/s: 7358720 +2993/20000 train_loss: 2.6530 train_time: 5.3m tok/s: 7357932 +2994/20000 train_loss: 2.4379 train_time: 5.3m tok/s: 7357171 +2995/20000 train_loss: 2.6419 train_time: 5.3m tok/s: 7356411 +2996/20000 train_loss: 2.4306 train_time: 5.3m tok/s: 7355607 +2997/20000 train_loss: 2.5860 train_time: 5.3m tok/s: 7354846 +2998/20000 train_loss: 2.4835 train_time: 5.3m tok/s: 7354104 +2999/20000 train_loss: 2.4688 train_time: 5.3m tok/s: 7353316 +3000/20000 train_loss: 2.4910 train_time: 5.3m tok/s: 7352563 +3001/20000 train_loss: 2.5481 train_time: 5.4m tok/s: 7351792 +3002/20000 train_loss: 2.5073 train_time: 5.4m tok/s: 7351044 +3003/20000 train_loss: 2.5382 train_time: 5.4m tok/s: 7350319 +3004/20000 train_loss: 2.4663 train_time: 5.4m tok/s: 7349602 +3005/20000 train_loss: 2.5445 train_time: 5.4m tok/s: 7348826 +3006/20000 train_loss: 2.7572 train_time: 5.4m tok/s: 7348059 +3007/20000 train_loss: 2.5085 train_time: 5.4m tok/s: 7347315 +3008/20000 train_loss: 2.4961 train_time: 5.4m tok/s: 7346578 +3009/20000 train_loss: 2.4721 train_time: 5.4m tok/s: 7345851 +3010/20000 train_loss: 2.4112 train_time: 5.4m tok/s: 7345053 +3011/20000 train_loss: 2.5774 train_time: 5.4m tok/s: 7344268 +3012/20000 train_loss: 2.5540 train_time: 5.4m tok/s: 7343542 +3013/20000 train_loss: 2.5201 train_time: 5.4m tok/s: 7342813 +3014/20000 train_loss: 2.5624 train_time: 5.4m tok/s: 7342104 +3015/20000 train_loss: 2.6600 
train_time: 5.4m tok/s: 7341363 +3016/20000 train_loss: 2.6838 train_time: 5.4m tok/s: 7340591 +3017/20000 train_loss: 2.4709 train_time: 5.4m tok/s: 7339870 +3018/20000 train_loss: 2.6229 train_time: 5.4m tok/s: 7339126 +3019/20000 train_loss: 2.4574 train_time: 5.4m tok/s: 7338430 +3020/20000 train_loss: 3.1341 train_time: 5.4m tok/s: 7337576 +3021/20000 train_loss: 2.4725 train_time: 5.4m tok/s: 7336810 +3022/20000 train_loss: 2.4524 train_time: 5.4m tok/s: 7336083 +3023/20000 train_loss: 2.5751 train_time: 5.4m tok/s: 7335332 +3024/20000 train_loss: 3.4321 train_time: 5.4m tok/s: 7334504 +3025/20000 train_loss: 2.4420 train_time: 5.4m tok/s: 7333726 +3026/20000 train_loss: 2.5321 train_time: 5.4m tok/s: 7332912 +3027/20000 train_loss: 2.5684 train_time: 5.4m tok/s: 7332191 +3028/20000 train_loss: 2.6509 train_time: 5.4m tok/s: 7331419 +3029/20000 train_loss: 2.7250 train_time: 5.4m tok/s: 7330696 +3030/20000 train_loss: 2.5691 train_time: 5.4m tok/s: 7329974 +3031/20000 train_loss: 2.5147 train_time: 5.4m tok/s: 7329258 +3032/20000 train_loss: 2.5441 train_time: 5.4m tok/s: 7328519 +3033/20000 train_loss: 2.5689 train_time: 5.4m tok/s: 7327776 +3034/20000 train_loss: 2.4821 train_time: 5.4m tok/s: 7327054 +3035/20000 train_loss: 2.3035 train_time: 5.4m tok/s: 7326333 +3036/20000 train_loss: 2.5735 train_time: 5.4m tok/s: 7325615 +3037/20000 train_loss: 2.4892 train_time: 5.4m tok/s: 7324893 +3038/20000 train_loss: 2.5277 train_time: 5.4m tok/s: 7324190 +3039/20000 train_loss: 2.5216 train_time: 5.4m tok/s: 7323467 +3040/20000 train_loss: 2.4157 train_time: 5.4m tok/s: 7322724 +3041/20000 train_loss: 2.6344 train_time: 5.4m tok/s: 7322025 +3042/20000 train_loss: 2.5782 train_time: 5.4m tok/s: 7321264 +3043/20000 train_loss: 2.6451 train_time: 5.4m tok/s: 7320505 +3044/20000 train_loss: 2.5921 train_time: 5.5m tok/s: 7319825 +3045/20000 train_loss: 2.5755 train_time: 5.5m tok/s: 7319140 +3046/20000 train_loss: 2.5585 train_time: 5.5m tok/s: 7318439 +3047/20000 
train_loss: 2.3909 train_time: 5.5m tok/s: 7317709 +3048/20000 train_loss: 2.3599 train_time: 5.5m tok/s: 7316998 +3049/20000 train_loss: 2.5108 train_time: 5.5m tok/s: 7316264 +3050/20000 train_loss: 2.5964 train_time: 5.5m tok/s: 7315567 +3051/20000 train_loss: 2.3615 train_time: 5.5m tok/s: 7314879 +3052/20000 train_loss: 2.4196 train_time: 5.5m tok/s: 7314188 +3053/20000 train_loss: 2.6077 train_time: 5.5m tok/s: 7313421 +3054/20000 train_loss: 2.4234 train_time: 5.5m tok/s: 7312701 +3055/20000 train_loss: 2.5414 train_time: 5.5m tok/s: 7312006 +3056/20000 train_loss: 2.5463 train_time: 5.5m tok/s: 7311296 +3057/20000 train_loss: 2.5077 train_time: 5.5m tok/s: 7310552 +3058/20000 train_loss: 2.5489 train_time: 5.5m tok/s: 7309851 +3059/20000 train_loss: 2.4642 train_time: 5.5m tok/s: 7309139 +3060/20000 train_loss: 2.5681 train_time: 5.5m tok/s: 7308443 +3061/20000 train_loss: 2.4408 train_time: 5.5m tok/s: 7307704 +3062/20000 train_loss: 2.5951 train_time: 5.5m tok/s: 7306979 +3063/20000 train_loss: 2.5443 train_time: 5.5m tok/s: 7306269 +3064/20000 train_loss: 2.4400 train_time: 5.5m tok/s: 7305527 +3065/20000 train_loss: 2.4089 train_time: 5.5m tok/s: 7304831 +3066/20000 train_loss: 2.6754 train_time: 5.5m tok/s: 7304083 +3067/20000 train_loss: 2.4550 train_time: 5.5m tok/s: 7303389 +3068/20000 train_loss: 2.5991 train_time: 5.5m tok/s: 7302653 +3069/20000 train_loss: 2.3979 train_time: 5.5m tok/s: 7301963 +3070/20000 train_loss: 2.5563 train_time: 5.5m tok/s: 7301228 +3071/20000 train_loss: 2.5080 train_time: 5.5m tok/s: 7300511 +3072/20000 train_loss: 2.5430 train_time: 5.5m tok/s: 7299817 +3073/20000 train_loss: 2.6080 train_time: 5.5m tok/s: 7299120 +3074/20000 train_loss: 2.4285 train_time: 5.5m tok/s: 7298413 +3075/20000 train_loss: 2.4121 train_time: 5.5m tok/s: 7297678 +3076/20000 train_loss: 2.4750 train_time: 5.5m tok/s: 7296962 +3077/20000 train_loss: 2.5299 train_time: 5.5m tok/s: 7296299 +3078/20000 train_loss: 2.3929 train_time: 5.5m tok/s: 
7295579 +3079/20000 train_loss: 2.4371 train_time: 5.5m tok/s: 7294869 +3080/20000 train_loss: 3.2236 train_time: 5.5m tok/s: 7294127 +3081/20000 train_loss: 2.3849 train_time: 5.5m tok/s: 7293397 +3082/20000 train_loss: 2.4461 train_time: 5.5m tok/s: 7292693 +3083/20000 train_loss: 2.4673 train_time: 5.5m tok/s: 7292014 +3084/20000 train_loss: 2.4699 train_time: 5.5m tok/s: 7291309 +3085/20000 train_loss: 2.5320 train_time: 5.5m tok/s: 7290616 +3086/20000 train_loss: 2.5731 train_time: 5.5m tok/s: 7289920 +3087/20000 train_loss: 2.6157 train_time: 5.6m tok/s: 7289231 +3088/20000 train_loss: 2.5752 train_time: 5.6m tok/s: 7288538 +3089/20000 train_loss: 2.4487 train_time: 5.6m tok/s: 7287860 +3090/20000 train_loss: 2.6988 train_time: 5.6m tok/s: 7287142 +3091/20000 train_loss: 2.4483 train_time: 5.6m tok/s: 7286433 +3092/20000 train_loss: 2.4843 train_time: 5.6m tok/s: 7285720 +3093/20000 train_loss: 2.5263 train_time: 5.6m tok/s: 7285012 +3094/20000 train_loss: 2.4780 train_time: 5.6m tok/s: 7284305 +3095/20000 train_loss: 2.3340 train_time: 5.6m tok/s: 7283599 +3096/20000 train_loss: 2.5115 train_time: 5.6m tok/s: 7282904 +3097/20000 train_loss: 2.5503 train_time: 5.6m tok/s: 7282202 +3098/20000 train_loss: 2.4662 train_time: 5.6m tok/s: 7281499 +3099/20000 train_loss: 2.3239 train_time: 5.6m tok/s: 7280809 +3100/20000 train_loss: 2.4148 train_time: 5.6m tok/s: 7280113 +3101/20000 train_loss: 2.6671 train_time: 5.6m tok/s: 7279421 +3102/20000 train_loss: 2.6549 train_time: 5.6m tok/s: 7278724 +3103/20000 train_loss: 2.4874 train_time: 5.6m tok/s: 7278065 +3104/20000 train_loss: 2.5857 train_time: 5.6m tok/s: 7277382 +3105/20000 train_loss: 2.3868 train_time: 5.6m tok/s: 7276684 +3106/20000 train_loss: 2.5920 train_time: 5.6m tok/s: 7275999 +3107/20000 train_loss: 2.3398 train_time: 5.6m tok/s: 7275282 +3108/20000 train_loss: 2.4710 train_time: 5.6m tok/s: 7274591 +3109/20000 train_loss: 2.5622 train_time: 5.6m tok/s: 7273933 +3110/20000 train_loss: 2.4228 
train_time: 5.6m tok/s: 7273249 +3111/20000 train_loss: 2.4309 train_time: 5.6m tok/s: 7272552 +3112/20000 train_loss: 2.3898 train_time: 5.6m tok/s: 7271859 +3113/20000 train_loss: 2.5276 train_time: 5.6m tok/s: 7271173 +3114/20000 train_loss: 2.5402 train_time: 5.6m tok/s: 7270471 +3115/20000 train_loss: 2.5592 train_time: 5.6m tok/s: 7269823 +3116/20000 train_loss: 2.5762 train_time: 5.6m tok/s: 7269104 +3117/20000 train_loss: 2.6043 train_time: 5.6m tok/s: 7268408 +3118/20000 train_loss: 2.5363 train_time: 5.6m tok/s: 7267721 +3119/20000 train_loss: 2.5745 train_time: 5.6m tok/s: 7267037 +3120/20000 train_loss: 2.5553 train_time: 5.6m tok/s: 7266384 +3121/20000 train_loss: 2.5326 train_time: 5.6m tok/s: 7265675 +3122/20000 train_loss: 2.5508 train_time: 5.6m tok/s: 7264972 +3123/20000 train_loss: 2.4311 train_time: 5.6m tok/s: 7264299 +3124/20000 train_loss: 2.5347 train_time: 5.6m tok/s: 7263610 +3125/20000 train_loss: 2.4327 train_time: 5.6m tok/s: 7262916 +3126/20000 train_loss: 2.1315 train_time: 5.6m tok/s: 7262216 +3127/20000 train_loss: 2.6029 train_time: 5.6m tok/s: 7261566 +3128/20000 train_loss: 2.5056 train_time: 5.6m tok/s: 7260858 +3129/20000 train_loss: 2.5332 train_time: 5.6m tok/s: 7260171 +3130/20000 train_loss: 2.4766 train_time: 5.7m tok/s: 7259491 +3131/20000 train_loss: 2.4446 train_time: 5.7m tok/s: 7258832 +3132/20000 train_loss: 2.5666 train_time: 5.7m tok/s: 7258143 +3133/20000 train_loss: 2.5172 train_time: 5.7m tok/s: 7257458 +3134/20000 train_loss: 2.5504 train_time: 5.7m tok/s: 7256808 +3135/20000 train_loss: 2.5078 train_time: 5.7m tok/s: 7256145 +3136/20000 train_loss: 2.5677 train_time: 5.7m tok/s: 7255469 +3137/20000 train_loss: 2.6063 train_time: 5.7m tok/s: 7254789 +3138/20000 train_loss: 2.4454 train_time: 5.7m tok/s: 7254099 +3139/20000 train_loss: 2.4518 train_time: 5.7m tok/s: 7253430 +3140/20000 train_loss: 2.4270 train_time: 5.7m tok/s: 7252754 +3141/20000 train_loss: 2.5722 train_time: 5.7m tok/s: 7252087 +3142/20000 
train_loss: 2.5458 train_time: 5.7m tok/s: 7251406 +3143/20000 train_loss: 2.5484 train_time: 5.7m tok/s: 7250757 +3144/20000 train_loss: 2.5466 train_time: 5.7m tok/s: 7250080 +3145/20000 train_loss: 2.0733 train_time: 5.7m tok/s: 7249352 +3146/20000 train_loss: 2.4543 train_time: 5.7m tok/s: 7248698 +3147/20000 train_loss: 2.5500 train_time: 5.7m tok/s: 7248043 +3148/20000 train_loss: 2.4830 train_time: 5.7m tok/s: 7247369 +3149/20000 train_loss: 2.5957 train_time: 5.7m tok/s: 7246683 +3150/20000 train_loss: 2.3853 train_time: 5.7m tok/s: 7246026 +3151/20000 train_loss: 2.4066 train_time: 5.7m tok/s: 7245364 +3152/20000 train_loss: 2.4724 train_time: 5.7m tok/s: 7244713 +3153/20000 train_loss: 2.5220 train_time: 5.7m tok/s: 7244042 +3154/20000 train_loss: 2.3784 train_time: 5.7m tok/s: 7243392 +3155/20000 train_loss: 2.4782 train_time: 5.7m tok/s: 7242730 +3156/20000 train_loss: 2.5891 train_time: 5.7m tok/s: 7242051 +3157/20000 train_loss: 2.5949 train_time: 5.7m tok/s: 7241441 +3158/20000 train_loss: 2.6398 train_time: 5.7m tok/s: 7240731 +3159/20000 train_loss: 2.5245 train_time: 5.7m tok/s: 7240080 +3160/20000 train_loss: 2.3664 train_time: 5.7m tok/s: 7239385 +3161/20000 train_loss: 2.5357 train_time: 5.7m tok/s: 7238741 +3162/20000 train_loss: 2.6225 train_time: 5.7m tok/s: 7238103 +3163/20000 train_loss: 2.5484 train_time: 5.7m tok/s: 7237464 +3164/20000 train_loss: 2.3755 train_time: 5.7m tok/s: 7236774 +3165/20000 train_loss: 2.5073 train_time: 5.7m tok/s: 7236141 +3166/20000 train_loss: 2.4204 train_time: 5.7m tok/s: 7235487 +3167/20000 train_loss: 2.5266 train_time: 5.7m tok/s: 7234852 +3168/20000 train_loss: 2.5155 train_time: 5.7m tok/s: 7234204 +3169/20000 train_loss: 2.4900 train_time: 5.7m tok/s: 7233552 +3170/20000 train_loss: 2.6200 train_time: 5.7m tok/s: 7232809 +3171/20000 train_loss: 2.6528 train_time: 5.7m tok/s: 7232179 +3172/20000 train_loss: 2.6575 train_time: 5.7m tok/s: 7231540 +3173/20000 train_loss: 2.7155 train_time: 5.8m tok/s: 
7230866 +3174/20000 train_loss: 2.4698 train_time: 5.8m tok/s: 7230210 +3175/20000 train_loss: 2.6154 train_time: 5.8m tok/s: 7229564 +3176/20000 train_loss: 2.4464 train_time: 5.8m tok/s: 7228911 +3177/20000 train_loss: 2.4533 train_time: 5.8m tok/s: 7228272 +3178/20000 train_loss: 2.5972 train_time: 5.8m tok/s: 7227560 +3179/20000 train_loss: 2.4327 train_time: 5.8m tok/s: 7226835 +3180/20000 train_loss: 2.5261 train_time: 5.8m tok/s: 7226177 +3181/20000 train_loss: 2.3762 train_time: 5.8m tok/s: 7225529 +3182/20000 train_loss: 2.8114 train_time: 5.8m tok/s: 7224864 +3183/20000 train_loss: 2.6582 train_time: 5.8m tok/s: 7224215 +3184/20000 train_loss: 2.5071 train_time: 5.8m tok/s: 7223519 +3185/20000 train_loss: 2.5786 train_time: 5.8m tok/s: 7222899 +3186/20000 train_loss: 2.5297 train_time: 5.8m tok/s: 7222250 +3187/20000 train_loss: 2.5748 train_time: 5.8m tok/s: 7221593 +3188/20000 train_loss: 2.3938 train_time: 5.8m tok/s: 7220947 +3189/20000 train_loss: 2.5958 train_time: 5.8m tok/s: 7220319 +3190/20000 train_loss: 2.5750 train_time: 5.8m tok/s: 7219707 +3191/20000 train_loss: 2.4806 train_time: 5.8m tok/s: 7219044 +3192/20000 train_loss: 2.3769 train_time: 5.8m tok/s: 7218417 +3193/20000 train_loss: 2.4639 train_time: 5.8m tok/s: 7217822 +3194/20000 train_loss: 2.4482 train_time: 5.8m tok/s: 7217211 +3195/20000 train_loss: 2.5032 train_time: 5.8m tok/s: 7216595 +3196/20000 train_loss: 2.3916 train_time: 5.8m tok/s: 7215944 +3197/20000 train_loss: 2.4719 train_time: 5.8m tok/s: 7215305 +3198/20000 train_loss: 2.4935 train_time: 5.8m tok/s: 7214696 +3199/20000 train_loss: 2.5080 train_time: 5.8m tok/s: 7214055 +3200/20000 train_loss: 2.6319 train_time: 5.8m tok/s: 7213394 +3201/20000 train_loss: 2.5733 train_time: 5.8m tok/s: 7212775 +3202/20000 train_loss: 2.2688 train_time: 5.8m tok/s: 7212077 +3203/20000 train_loss: 2.5471 train_time: 5.8m tok/s: 7211414 +3204/20000 train_loss: 2.4490 train_time: 5.8m tok/s: 7210790 +3205/20000 train_loss: 2.5025 
train_time: 5.8m tok/s: 7210142 +3206/20000 train_loss: 2.4460 train_time: 5.8m tok/s: 7209533 +3207/20000 train_loss: 2.5566 train_time: 5.8m tok/s: 7208896 +3208/20000 train_loss: 2.5849 train_time: 5.8m tok/s: 7208286 +3209/20000 train_loss: 2.4561 train_time: 5.8m tok/s: 7207655 +3210/20000 train_loss: 2.7169 train_time: 5.8m tok/s: 7207002 +3211/20000 train_loss: 2.5140 train_time: 5.8m tok/s: 7206399 +3212/20000 train_loss: 2.4080 train_time: 5.8m tok/s: 7205737 +3213/20000 train_loss: 2.5585 train_time: 5.8m tok/s: 7205100 +3214/20000 train_loss: 2.4162 train_time: 5.8m tok/s: 7204425 +3215/20000 train_loss: 2.4349 train_time: 5.8m tok/s: 7203791 +3216/20000 train_loss: 2.3248 train_time: 5.9m tok/s: 7203150 +3217/20000 train_loss: 2.4588 train_time: 5.9m tok/s: 7202517 +3218/20000 train_loss: 2.5018 train_time: 5.9m tok/s: 7201918 +3219/20000 train_loss: 2.5663 train_time: 5.9m tok/s: 7201334 +3220/20000 train_loss: 2.6261 train_time: 5.9m tok/s: 7200700 +3221/20000 train_loss: 2.4277 train_time: 5.9m tok/s: 7200076 +3222/20000 train_loss: 2.5270 train_time: 5.9m tok/s: 7199458 +3223/20000 train_loss: 2.6328 train_time: 5.9m tok/s: 7198812 +3224/20000 train_loss: 2.9696 train_time: 5.9m tok/s: 7198147 +3225/20000 train_loss: 2.5827 train_time: 5.9m tok/s: 7197498 +3226/20000 train_loss: 2.4072 train_time: 5.9m tok/s: 7196885 +3227/20000 train_loss: 2.4828 train_time: 5.9m tok/s: 7196257 +3228/20000 train_loss: 2.8144 train_time: 5.9m tok/s: 7195649 +3229/20000 train_loss: 2.4080 train_time: 5.9m tok/s: 7195048 +3230/20000 train_loss: 2.4469 train_time: 5.9m tok/s: 7194427 +3231/20000 train_loss: 2.5547 train_time: 5.9m tok/s: 7193829 +3232/20000 train_loss: 2.5359 train_time: 5.9m tok/s: 7193211 +3233/20000 train_loss: 2.5243 train_time: 5.9m tok/s: 7192577 +3234/20000 train_loss: 2.5957 train_time: 5.9m tok/s: 7191933 +3235/20000 train_loss: 2.5274 train_time: 5.9m tok/s: 7191276 +3236/20000 train_loss: 2.2806 train_time: 5.9m tok/s: 7190655 +3237/20000 
train_loss: 2.4702 train_time: 5.9m tok/s: 7190067 +3238/20000 train_loss: 2.3593 train_time: 5.9m tok/s: 7189441 +3239/20000 train_loss: 2.3982 train_time: 5.9m tok/s: 7188791 +3240/20000 train_loss: 2.4979 train_time: 5.9m tok/s: 7188214 +3241/20000 train_loss: 2.4965 train_time: 5.9m tok/s: 7187577 +3242/20000 train_loss: 2.5329 train_time: 5.9m tok/s: 7186969 +3243/20000 train_loss: 2.4478 train_time: 5.9m tok/s: 7186342 +3244/20000 train_loss: 2.5607 train_time: 5.9m tok/s: 7185723 +3245/20000 train_loss: 2.6326 train_time: 5.9m tok/s: 7185112 +3246/20000 train_loss: 2.5152 train_time: 5.9m tok/s: 7184498 +3247/20000 train_loss: 2.4113 train_time: 5.9m tok/s: 7183872 +3248/20000 train_loss: 2.4674 train_time: 5.9m tok/s: 7183254 +3249/20000 train_loss: 2.6019 train_time: 5.9m tok/s: 7182647 +3250/20000 train_loss: 2.5531 train_time: 5.9m tok/s: 7182019 +3251/20000 train_loss: 2.5894 train_time: 5.9m tok/s: 7181373 +3252/20000 train_loss: 2.4510 train_time: 5.9m tok/s: 7180737 +3253/20000 train_loss: 2.4436 train_time: 5.9m tok/s: 7180136 +3254/20000 train_loss: 2.9140 train_time: 5.9m tok/s: 7179480 +3255/20000 train_loss: 2.4613 train_time: 5.9m tok/s: 7178865 +3256/20000 train_loss: 2.4998 train_time: 5.9m tok/s: 7178272 +3257/20000 train_loss: 2.5791 train_time: 5.9m tok/s: 7177656 +3258/20000 train_loss: 2.5393 train_time: 5.9m tok/s: 7177069 +3259/20000 train_loss: 2.4912 train_time: 6.0m tok/s: 7176462 +3260/20000 train_loss: 2.4839 train_time: 6.0m tok/s: 7175872 +3261/20000 train_loss: 2.5252 train_time: 6.0m tok/s: 7175268 +3262/20000 train_loss: 2.4041 train_time: 6.0m tok/s: 7174646 +3263/20000 train_loss: 2.4686 train_time: 6.0m tok/s: 7174030 +3264/20000 train_loss: 2.5176 train_time: 6.0m tok/s: 7173418 +3265/20000 train_loss: 2.5348 train_time: 6.0m tok/s: 7172844 +3266/20000 train_loss: 2.5435 train_time: 6.0m tok/s: 7172221 +3267/20000 train_loss: 2.4903 train_time: 6.0m tok/s: 7171637 +3268/20000 train_loss: 2.5135 train_time: 6.0m tok/s: 
7171029 +3269/20000 train_loss: 2.6222 train_time: 6.0m tok/s: 7170416 +3270/20000 train_loss: 2.4711 train_time: 6.0m tok/s: 7169808 +3271/20000 train_loss: 2.5545 train_time: 6.0m tok/s: 7169164 +3272/20000 train_loss: 2.5210 train_time: 6.0m tok/s: 7168571 +3273/20000 train_loss: 2.6738 train_time: 6.0m tok/s: 7167973 +3274/20000 train_loss: 2.4765 train_time: 6.0m tok/s: 7167375 +3275/20000 train_loss: 2.5261 train_time: 6.0m tok/s: 7166769 +3276/20000 train_loss: 2.5031 train_time: 6.0m tok/s: 7166192 +3277/20000 train_loss: 2.5093 train_time: 6.0m tok/s: 7165513 +3278/20000 train_loss: 2.4019 train_time: 6.0m tok/s: 7164922 +3279/20000 train_loss: 2.4698 train_time: 6.0m tok/s: 7164344 +3280/20000 train_loss: 2.3864 train_time: 6.0m tok/s: 7163743 +3281/20000 train_loss: 2.4274 train_time: 6.0m tok/s: 7163090 +3282/20000 train_loss: 2.4951 train_time: 6.0m tok/s: 7162458 +3283/20000 train_loss: 2.4298 train_time: 6.0m tok/s: 7161869 +3284/20000 train_loss: 2.6399 train_time: 6.0m tok/s: 7161301 +3285/20000 train_loss: 2.4914 train_time: 6.0m tok/s: 7160687 +3286/20000 train_loss: 2.4456 train_time: 6.0m tok/s: 7160108 +3287/20000 train_loss: 2.5139 train_time: 6.0m tok/s: 7159540 +3288/20000 train_loss: 2.5090 train_time: 6.0m tok/s: 7158957 +3289/20000 train_loss: 2.4853 train_time: 6.0m tok/s: 7158381 +3290/20000 train_loss: 2.4588 train_time: 6.0m tok/s: 7157789 +3291/20000 train_loss: 2.4476 train_time: 6.0m tok/s: 7157177 +3292/20000 train_loss: 2.4681 train_time: 6.0m tok/s: 7156593 +3293/20000 train_loss: 2.4596 train_time: 6.0m tok/s: 7155978 +3294/20000 train_loss: 2.4311 train_time: 6.0m tok/s: 7155383 +3295/20000 train_loss: 2.5229 train_time: 6.0m tok/s: 7154786 +3296/20000 train_loss: 2.3949 train_time: 6.0m tok/s: 7154173 +3297/20000 train_loss: 2.4086 train_time: 6.0m tok/s: 7153584 +3298/20000 train_loss: 2.4851 train_time: 6.0m tok/s: 7152979 +3299/20000 train_loss: 2.3040 train_time: 6.0m tok/s: 7152326 +3300/20000 train_loss: 2.4643 
train_time: 6.0m tok/s: 7151739 +3301/20000 train_loss: 2.4312 train_time: 6.1m tok/s: 7151164 +3302/20000 train_loss: 2.2267 train_time: 6.1m tok/s: 7150502 +3303/20000 train_loss: 2.6561 train_time: 6.1m tok/s: 7149912 +3304/20000 train_loss: 2.4991 train_time: 6.1m tok/s: 7149353 +3305/20000 train_loss: 2.5643 train_time: 6.1m tok/s: 7148786 +3306/20000 train_loss: 2.6168 train_time: 6.1m tok/s: 7148218 +3307/20000 train_loss: 2.5239 train_time: 6.1m tok/s: 7147612 +3308/20000 train_loss: 2.2484 train_time: 6.1m tok/s: 7147013 +3309/20000 train_loss: 2.4725 train_time: 6.1m tok/s: 7146428 +3310/20000 train_loss: 2.5375 train_time: 6.1m tok/s: 7145887 +3311/20000 train_loss: 2.6626 train_time: 6.1m tok/s: 7145264 +3312/20000 train_loss: 2.5294 train_time: 6.1m tok/s: 7144690 +3313/20000 train_loss: 2.3781 train_time: 6.1m tok/s: 7144096 +3314/20000 train_loss: 2.4964 train_time: 6.1m tok/s: 7143535 +3315/20000 train_loss: 2.5214 train_time: 6.1m tok/s: 7142971 +3316/20000 train_loss: 2.5076 train_time: 6.1m tok/s: 7142407 +3317/20000 train_loss: 2.4246 train_time: 6.1m tok/s: 7141786 +3318/20000 train_loss: 2.3986 train_time: 6.1m tok/s: 7141234 +3319/20000 train_loss: 2.4014 train_time: 6.1m tok/s: 7140639 +3320/20000 train_loss: 2.4225 train_time: 6.1m tok/s: 7140038 +3321/20000 train_loss: 2.4471 train_time: 6.1m tok/s: 7139463 +3322/20000 train_loss: 2.4012 train_time: 6.1m tok/s: 7138914 +3323/20000 train_loss: 2.4101 train_time: 6.1m tok/s: 7138310 +3324/20000 train_loss: 2.4538 train_time: 6.1m tok/s: 7137720 +3325/20000 train_loss: 2.5599 train_time: 6.1m tok/s: 7137165 +3326/20000 train_loss: 2.5093 train_time: 6.1m tok/s: 7136611 +3327/20000 train_loss: 2.4268 train_time: 6.1m tok/s: 7136043 +3328/20000 train_loss: 2.5720 train_time: 6.1m tok/s: 7135404 +3329/20000 train_loss: 2.4641 train_time: 6.1m tok/s: 7134849 +3330/20000 train_loss: 2.8003 train_time: 6.1m tok/s: 7134277 +3331/20000 train_loss: 2.6203 train_time: 6.1m tok/s: 7133688 +3332/20000 
train_loss: 2.5303 train_time: 6.1m tok/s: 7133104 +3333/20000 train_loss: 2.3874 train_time: 6.1m tok/s: 7132560 +3334/20000 train_loss: 2.4707 train_time: 6.1m tok/s: 7131967 +3335/20000 train_loss: 2.5692 train_time: 6.1m tok/s: 7131360 +3336/20000 train_loss: 2.5179 train_time: 6.1m tok/s: 7130803 +3337/20000 train_loss: 2.2973 train_time: 6.1m tok/s: 7130255 +3338/20000 train_loss: 2.4692 train_time: 6.1m tok/s: 7129694 +3339/20000 train_loss: 2.4135 train_time: 6.1m tok/s: 7129108 +3340/20000 train_loss: 2.4346 train_time: 6.1m tok/s: 7128524 +3341/20000 train_loss: 2.4493 train_time: 6.1m tok/s: 7127943 +3342/20000 train_loss: 2.3855 train_time: 6.1m tok/s: 7127351 +3343/20000 train_loss: 2.3888 train_time: 6.1m tok/s: 7126775 +3344/20000 train_loss: 2.5460 train_time: 6.2m tok/s: 7126219 +3345/20000 train_loss: 2.4660 train_time: 6.2m tok/s: 7125671 +3346/20000 train_loss: 2.4869 train_time: 6.2m tok/s: 7125064 +3347/20000 train_loss: 2.5283 train_time: 6.2m tok/s: 7124500 +3348/20000 train_loss: 2.5826 train_time: 6.2m tok/s: 7123922 +3349/20000 train_loss: 2.4954 train_time: 6.2m tok/s: 7123354 +3350/20000 train_loss: 2.4553 train_time: 6.2m tok/s: 7122792 +3351/20000 train_loss: 2.4838 train_time: 6.2m tok/s: 7122250 +3352/20000 train_loss: 2.4949 train_time: 6.2m tok/s: 7121651 +3353/20000 train_loss: 2.4361 train_time: 6.2m tok/s: 7121051 +3354/20000 train_loss: 2.5244 train_time: 6.2m tok/s: 7120513 +3355/20000 train_loss: 2.5670 train_time: 6.2m tok/s: 7119936 +3356/20000 train_loss: 2.3559 train_time: 6.2m tok/s: 7119301 +3357/20000 train_loss: 2.3591 train_time: 6.2m tok/s: 7118738 +3358/20000 train_loss: 2.5033 train_time: 6.2m tok/s: 7118142 +3359/20000 train_loss: 2.4424 train_time: 6.2m tok/s: 7117567 +3360/20000 train_loss: 2.4578 train_time: 6.2m tok/s: 7117023 +3361/20000 train_loss: 2.5192 train_time: 6.2m tok/s: 7116488 +3362/20000 train_loss: 2.4917 train_time: 6.2m tok/s: 7115937 +3363/20000 train_loss: 2.4232 train_time: 6.2m tok/s: 
7115355 +3364/20000 train_loss: 2.5068 train_time: 6.2m tok/s: 7114776 +3365/20000 train_loss: 2.5160 train_time: 6.2m tok/s: 7114230 +3366/20000 train_loss: 2.3836 train_time: 6.2m tok/s: 7113659 +3367/20000 train_loss: 2.4243 train_time: 6.2m tok/s: 7113102 +3368/20000 train_loss: 2.3775 train_time: 6.2m tok/s: 7112560 +3369/20000 train_loss: 2.6125 train_time: 6.2m tok/s: 7111950 +3370/20000 train_loss: 2.5861 train_time: 6.2m tok/s: 7111382 +3371/20000 train_loss: 2.5039 train_time: 6.2m tok/s: 7110813 +3372/20000 train_loss: 2.5780 train_time: 6.2m tok/s: 7110286 +3373/20000 train_loss: 2.4672 train_time: 6.2m tok/s: 7109729 +3374/20000 train_loss: 2.4684 train_time: 6.2m tok/s: 7109198 +3375/20000 train_loss: 2.4781 train_time: 6.2m tok/s: 7108621 +3376/20000 train_loss: 2.4213 train_time: 6.2m tok/s: 7108063 +3377/20000 train_loss: 2.5551 train_time: 6.2m tok/s: 7107535 +3378/20000 train_loss: 2.3389 train_time: 6.2m tok/s: 7106949 +3379/20000 train_loss: 2.4134 train_time: 6.2m tok/s: 7106416 +3380/20000 train_loss: 2.3614 train_time: 6.2m tok/s: 7105823 +3381/20000 train_loss: 2.3320 train_time: 6.2m tok/s: 7105251 +3382/20000 train_loss: 2.5785 train_time: 6.2m tok/s: 7104731 +3383/20000 train_loss: 2.4970 train_time: 6.2m tok/s: 7104203 +3384/20000 train_loss: 2.4833 train_time: 6.2m tok/s: 7103617 +3385/20000 train_loss: 2.4777 train_time: 6.2m tok/s: 7103088 +3386/20000 train_loss: 2.4953 train_time: 6.2m tok/s: 7102556 +3387/20000 train_loss: 2.5231 train_time: 6.3m tok/s: 7101997 +3388/20000 train_loss: 2.2916 train_time: 6.3m tok/s: 7101352 +3389/20000 train_loss: 2.4252 train_time: 6.3m tok/s: 7100831 +3390/20000 train_loss: 2.5006 train_time: 6.3m tok/s: 7100304 +3391/20000 train_loss: 2.4990 train_time: 6.3m tok/s: 7099739 +3392/20000 train_loss: 2.4614 train_time: 6.3m tok/s: 7099229 +3393/20000 train_loss: 2.4516 train_time: 6.3m tok/s: 7098700 +3394/20000 train_loss: 2.4593 train_time: 6.3m tok/s: 7098162 +3395/20000 train_loss: 2.4620 
train_time: 6.3m tok/s: 7097598 +3396/20000 train_loss: 2.6108 train_time: 6.3m tok/s: 7097043 +3397/20000 train_loss: 2.5258 train_time: 6.3m tok/s: 7096456 +3398/20000 train_loss: 2.3687 train_time: 6.3m tok/s: 7095907 +3399/20000 train_loss: 2.4296 train_time: 6.3m tok/s: 7095333 +3400/20000 train_loss: 2.5297 train_time: 6.3m tok/s: 7094799 +3401/20000 train_loss: 2.4085 train_time: 6.3m tok/s: 7094255 +3402/20000 train_loss: 2.4478 train_time: 6.3m tok/s: 7093724 +3403/20000 train_loss: 2.4860 train_time: 6.3m tok/s: 7093185 +3404/20000 train_loss: 2.4490 train_time: 6.3m tok/s: 7092659 +3405/20000 train_loss: 2.6275 train_time: 6.3m tok/s: 7092111 +3406/20000 train_loss: 2.4747 train_time: 6.3m tok/s: 7091571 +3407/20000 train_loss: 2.5055 train_time: 6.3m tok/s: 7091019 +3408/20000 train_loss: 2.5730 train_time: 6.3m tok/s: 7090461 +3409/20000 train_loss: 2.4158 train_time: 6.3m tok/s: 7089914 +3410/20000 train_loss: 2.4194 train_time: 6.3m tok/s: 7089351 +3411/20000 train_loss: 2.3473 train_time: 6.3m tok/s: 7088811 +3412/20000 train_loss: 2.3464 train_time: 6.3m tok/s: 7088264 +3413/20000 train_loss: 2.3379 train_time: 6.3m tok/s: 7087713 +3414/20000 train_loss: 2.4425 train_time: 6.3m tok/s: 7087184 +3415/20000 train_loss: 2.5941 train_time: 6.3m tok/s: 7086640 +3416/20000 train_loss: 2.5030 train_time: 6.3m tok/s: 7086091 +3417/20000 train_loss: 2.5701 train_time: 6.3m tok/s: 7085559 +3418/20000 train_loss: 2.5053 train_time: 6.3m tok/s: 7085045 +3419/20000 train_loss: 2.5307 train_time: 6.3m tok/s: 7084484 +3420/20000 train_loss: 2.5385 train_time: 6.3m tok/s: 7083920 +3421/20000 train_loss: 2.3727 train_time: 6.3m tok/s: 7083383 +3422/20000 train_loss: 2.6885 train_time: 6.3m tok/s: 7082811 +3423/20000 train_loss: 2.4024 train_time: 6.3m tok/s: 7082286 +3424/20000 train_loss: 2.4562 train_time: 6.3m tok/s: 7081774 +3425/20000 train_loss: 2.4284 train_time: 6.3m tok/s: 7081225 +3426/20000 train_loss: 2.5062 train_time: 6.3m tok/s: 7080716 +3427/20000 
train_loss: 2.4589 train_time: 6.3m tok/s: 7080181 +3428/20000 train_loss: 2.4567 train_time: 6.3m tok/s: 7079633 +3429/20000 train_loss: 2.5972 train_time: 6.3m tok/s: 7079082 +3430/20000 train_loss: 2.4168 train_time: 6.4m tok/s: 7078544 +3431/20000 train_loss: 2.4671 train_time: 6.4m tok/s: 7078022 +3432/20000 train_loss: 2.5937 train_time: 6.4m tok/s: 7077475 +3433/20000 train_loss: 2.4587 train_time: 6.4m tok/s: 7076954 +3434/20000 train_loss: 2.5114 train_time: 6.4m tok/s: 7076424 +3435/20000 train_loss: 2.4381 train_time: 6.4m tok/s: 7075842 +3436/20000 train_loss: 2.4113 train_time: 6.4m tok/s: 7075291 +3437/20000 train_loss: 2.5533 train_time: 6.4m tok/s: 7074782 +3438/20000 train_loss: 2.4879 train_time: 6.4m tok/s: 7074260 +3439/20000 train_loss: 2.3297 train_time: 6.4m tok/s: 7073717 +3440/20000 train_loss: 2.4784 train_time: 6.4m tok/s: 7073175 +3441/20000 train_loss: 2.3797 train_time: 6.4m tok/s: 7072663 +3442/20000 train_loss: 2.4074 train_time: 6.4m tok/s: 7072127 +3443/20000 train_loss: 2.6583 train_time: 6.4m tok/s: 7071540 +3444/20000 train_loss: 2.3317 train_time: 6.4m tok/s: 7071019 +3445/20000 train_loss: 2.4636 train_time: 6.4m tok/s: 7070546 +3446/20000 train_loss: 2.5295 train_time: 6.4m tok/s: 7070023 +3447/20000 train_loss: 2.4870 train_time: 6.4m tok/s: 7069495 +3448/20000 train_loss: 2.6063 train_time: 6.4m tok/s: 7068987 +3449/20000 train_loss: 2.4730 train_time: 6.4m tok/s: 7068444 +3450/20000 train_loss: 2.4293 train_time: 6.4m tok/s: 7067907 +3451/20000 train_loss: 2.4922 train_time: 6.4m tok/s: 7067378 +3452/20000 train_loss: 2.5205 train_time: 6.4m tok/s: 7066868 +3453/20000 train_loss: 2.4038 train_time: 6.4m tok/s: 7066318 +3454/20000 train_loss: 2.5209 train_time: 6.4m tok/s: 7065796 +3455/20000 train_loss: 2.4832 train_time: 6.4m tok/s: 7065274 +3456/20000 train_loss: 2.4816 train_time: 6.4m tok/s: 7064727 +3457/20000 train_loss: 2.4435 train_time: 6.4m tok/s: 7064176 +3458/20000 train_loss: 2.4001 train_time: 6.4m tok/s: 
7063654 +3459/20000 train_loss: 2.3661 train_time: 6.4m tok/s: 7063121 +3460/20000 train_loss: 2.5216 train_time: 6.4m tok/s: 7062618 +3461/20000 train_loss: 2.6264 train_time: 6.4m tok/s: 7062092 +3462/20000 train_loss: 2.5513 train_time: 6.4m tok/s: 7061593 +3463/20000 train_loss: 2.5733 train_time: 6.4m tok/s: 7061064 +3464/20000 train_loss: 2.4782 train_time: 6.4m tok/s: 7060538 +3465/20000 train_loss: 2.5323 train_time: 6.4m tok/s: 7059999 +3466/20000 train_loss: 2.5888 train_time: 6.4m tok/s: 7059468 +3467/20000 train_loss: 2.4455 train_time: 6.4m tok/s: 7058953 +3468/20000 train_loss: 2.6316 train_time: 6.4m tok/s: 7058396 +3469/20000 train_loss: 2.3935 train_time: 6.4m tok/s: 7057856 +3470/20000 train_loss: 2.4013 train_time: 6.4m tok/s: 7057329 +3471/20000 train_loss: 2.3913 train_time: 6.4m tok/s: 7056805 +3472/20000 train_loss: 2.4370 train_time: 6.4m tok/s: 7056276 +3473/20000 train_loss: 2.3988 train_time: 6.5m tok/s: 7055747 +3474/20000 train_loss: 2.4715 train_time: 6.5m tok/s: 7055260 +3475/20000 train_loss: 2.5200 train_time: 6.5m tok/s: 7054755 +3476/20000 train_loss: 2.4559 train_time: 6.5m tok/s: 7054250 +3477/20000 train_loss: 2.5328 train_time: 6.5m tok/s: 7053734 +3478/20000 train_loss: 2.5566 train_time: 6.5m tok/s: 7053195 +3479/20000 train_loss: 2.4568 train_time: 6.5m tok/s: 7052700 +3480/20000 train_loss: 2.5240 train_time: 6.5m tok/s: 7052200 +3481/20000 train_loss: 2.4842 train_time: 6.5m tok/s: 7051673 +3482/20000 train_loss: 2.4284 train_time: 6.5m tok/s: 7051168 +3483/20000 train_loss: 2.4176 train_time: 6.5m tok/s: 7050637 +3484/20000 train_loss: 2.4657 train_time: 6.5m tok/s: 7050086 +3485/20000 train_loss: 2.3998 train_time: 6.5m tok/s: 7049549 +3486/20000 train_loss: 2.3827 train_time: 6.5m tok/s: 7049057 +3487/20000 train_loss: 2.4728 train_time: 6.5m tok/s: 7048542 +3488/20000 train_loss: 2.3670 train_time: 6.5m tok/s: 7048057 +3489/20000 train_loss: 2.3847 train_time: 6.5m tok/s: 7047537 +3490/20000 train_loss: 2.4831 
train_time: 6.5m tok/s: 7047030 +3491/20000 train_loss: 2.5851 train_time: 6.5m tok/s: 7046516 +3492/20000 train_loss: 2.4249 train_time: 6.5m tok/s: 7045986 +3493/20000 train_loss: 2.5518 train_time: 6.5m tok/s: 7045491 +3494/20000 train_loss: 2.5255 train_time: 6.5m tok/s: 7044975 +3495/20000 train_loss: 2.4897 train_time: 6.5m tok/s: 7044496 +3496/20000 train_loss: 2.5902 train_time: 6.5m tok/s: 7043916 +3497/20000 train_loss: 2.6833 train_time: 6.5m tok/s: 7043438 +3498/20000 train_loss: 2.3501 train_time: 6.5m tok/s: 7042879 +3499/20000 train_loss: 2.4567 train_time: 6.5m tok/s: 7042375 +3500/20000 train_loss: 2.3682 train_time: 6.5m tok/s: 7041903 +3501/20000 train_loss: 2.4153 train_time: 6.5m tok/s: 7041388 +3502/20000 train_loss: 3.2599 train_time: 6.5m tok/s: 7040836 +3503/20000 train_loss: 2.4594 train_time: 6.5m tok/s: 7040330 +3504/20000 train_loss: 2.5537 train_time: 6.5m tok/s: 7039846 +3505/20000 train_loss: 2.5006 train_time: 6.5m tok/s: 7039289 +3506/20000 train_loss: 2.5611 train_time: 6.5m tok/s: 7038771 +3507/20000 train_loss: 2.5285 train_time: 6.5m tok/s: 7038304 +3508/20000 train_loss: 2.6195 train_time: 6.5m tok/s: 7037787 +3509/20000 train_loss: 2.4367 train_time: 6.5m tok/s: 7037262 +3510/20000 train_loss: 2.4319 train_time: 6.5m tok/s: 7036765 +3511/20000 train_loss: 2.4160 train_time: 6.5m tok/s: 7036246 +3512/20000 train_loss: 2.4662 train_time: 6.5m tok/s: 7035743 +3513/20000 train_loss: 2.4145 train_time: 6.5m tok/s: 7035254 +3514/20000 train_loss: 2.5389 train_time: 6.5m tok/s: 7034763 +3515/20000 train_loss: 2.3609 train_time: 6.5m tok/s: 7034264 +3516/20000 train_loss: 2.2683 train_time: 6.6m tok/s: 7033734 +3517/20000 train_loss: 2.4475 train_time: 6.6m tok/s: 7033238 +3518/20000 train_loss: 2.4162 train_time: 6.6m tok/s: 7032767 +3519/20000 train_loss: 2.8153 train_time: 6.6m tok/s: 7032237 +3520/20000 train_loss: 2.6448 train_time: 6.6m tok/s: 7031710 +3521/20000 train_loss: 2.4962 train_time: 6.6m tok/s: 7031201 +3522/20000 
train_loss: 2.5119 train_time: 6.6m tok/s: 7030709 +3523/20000 train_loss: 2.3845 train_time: 6.6m tok/s: 7030199 +3524/20000 train_loss: 2.7599 train_time: 6.6m tok/s: 7029682 +3525/20000 train_loss: 2.5223 train_time: 6.6m tok/s: 7029211 +3526/20000 train_loss: 2.4859 train_time: 6.6m tok/s: 7028701 +3527/20000 train_loss: 2.4165 train_time: 6.6m tok/s: 7028202 +3528/20000 train_loss: 2.4586 train_time: 6.6m tok/s: 7027720 +3529/20000 train_loss: 2.4526 train_time: 6.6m tok/s: 7027232 +3530/20000 train_loss: 2.6725 train_time: 6.6m tok/s: 7026726 +3531/20000 train_loss: 2.2993 train_time: 6.6m tok/s: 7026221 +3532/20000 train_loss: 2.2768 train_time: 6.6m tok/s: 7025697 +3533/20000 train_loss: 2.4383 train_time: 6.6m tok/s: 7025002 +3534/20000 train_loss: 2.4678 train_time: 6.6m tok/s: 7024536 +3535/20000 train_loss: 2.4056 train_time: 6.6m tok/s: 7024014 +3536/20000 train_loss: 2.4690 train_time: 6.6m tok/s: 7023391 +3537/20000 train_loss: 2.6030 train_time: 6.6m tok/s: 7022864 +3538/20000 train_loss: 2.4436 train_time: 6.6m tok/s: 7022272 +3539/20000 train_loss: 2.4769 train_time: 6.6m tok/s: 7021738 +3540/20000 train_loss: 2.4926 train_time: 6.6m tok/s: 7021164 +3541/20000 train_loss: 2.2931 train_time: 6.6m tok/s: 7020608 +3542/20000 train_loss: 2.4074 train_time: 6.6m tok/s: 7020027 +3543/20000 train_loss: 2.5149 train_time: 6.6m tok/s: 7019562 +3544/20000 train_loss: 2.4237 train_time: 6.6m tok/s: 7018873 +3545/20000 train_loss: 2.4581 train_time: 6.6m tok/s: 7018412 +3546/20000 train_loss: 2.3633 train_time: 6.6m tok/s: 7017700 +3547/20000 train_loss: 2.3366 train_time: 6.6m tok/s: 7017244 +3548/20000 train_loss: 2.5778 train_time: 6.6m tok/s: 7016578 +3549/20000 train_loss: 2.5466 train_time: 6.6m tok/s: 7016138 +3550/20000 train_loss: 2.5509 train_time: 6.6m tok/s: 7015662 +3551/20000 train_loss: 2.5730 train_time: 6.6m tok/s: 7015190 +3552/20000 train_loss: 2.4716 train_time: 6.6m tok/s: 7014720 +3553/20000 train_loss: 2.5847 train_time: 6.6m tok/s: 
7014227 +3554/20000 train_loss: 2.5000 train_time: 6.6m tok/s: 7013750 +3555/20000 train_loss: 2.5148 train_time: 6.6m tok/s: 7013256 +3556/20000 train_loss: 2.5248 train_time: 6.6m tok/s: 7012807 +3557/20000 train_loss: 2.4403 train_time: 6.6m tok/s: 7012328 +3558/20000 train_loss: 2.6190 train_time: 6.7m tok/s: 7011850 +3559/20000 train_loss: 2.4920 train_time: 6.7m tok/s: 7011356 +3560/20000 train_loss: 2.4479 train_time: 6.7m tok/s: 7010888 +3561/20000 train_loss: 3.1562 train_time: 6.7m tok/s: 7010368 +3562/20000 train_loss: 2.3816 train_time: 6.7m tok/s: 7009896 +3563/20000 train_loss: 2.4772 train_time: 6.7m tok/s: 7009422 +3564/20000 train_loss: 2.4788 train_time: 6.7m tok/s: 7008941 +3565/20000 train_loss: 2.4716 train_time: 6.7m tok/s: 7008459 +3566/20000 train_loss: 2.4677 train_time: 6.7m tok/s: 7007963 +3567/20000 train_loss: 2.5263 train_time: 6.7m tok/s: 7007477 +3568/20000 train_loss: 2.5329 train_time: 6.7m tok/s: 7006972 +3569/20000 train_loss: 2.3260 train_time: 6.7m tok/s: 7006464 +3570/20000 train_loss: 2.2992 train_time: 6.7m tok/s: 7005985 +3571/20000 train_loss: 2.3864 train_time: 6.7m tok/s: 7005505 +3572/20000 train_loss: 2.3967 train_time: 6.7m tok/s: 7005031 +3573/20000 train_loss: 2.2703 train_time: 6.7m tok/s: 7004518 +3574/20000 train_loss: 2.4184 train_time: 6.7m tok/s: 7004011 +3575/20000 train_loss: 2.5051 train_time: 6.7m tok/s: 7003535 +3576/20000 train_loss: 2.5069 train_time: 6.7m tok/s: 7003076 +3577/20000 train_loss: 2.4878 train_time: 6.7m tok/s: 7002593 +3578/20000 train_loss: 2.5614 train_time: 6.7m tok/s: 7002136 +3579/20000 train_loss: 2.5191 train_time: 6.7m tok/s: 7001654 +3580/20000 train_loss: 2.5254 train_time: 6.7m tok/s: 7001178 +3581/20000 train_loss: 2.5076 train_time: 6.7m tok/s: 7000690 +3582/20000 train_loss: 2.3763 train_time: 6.7m tok/s: 7000217 +3583/20000 train_loss: 2.4122 train_time: 6.7m tok/s: 6999715 +3584/20000 train_loss: 2.3555 train_time: 6.7m tok/s: 6999209 +3585/20000 train_loss: 2.3406 
train_time: 6.7m tok/s: 6998711 +3586/20000 train_loss: 2.1784 train_time: 6.7m tok/s: 6998187 +3587/20000 train_loss: 2.3308 train_time: 6.7m tok/s: 6997721 +3588/20000 train_loss: 2.3832 train_time: 6.7m tok/s: 6997259 +3589/20000 train_loss: 2.6153 train_time: 6.7m tok/s: 6996764 +3590/20000 train_loss: 2.5114 train_time: 6.7m tok/s: 6996291 +3591/20000 train_loss: 2.5289 train_time: 6.7m tok/s: 6995814 +3592/20000 train_loss: 2.5170 train_time: 6.7m tok/s: 6995363 +3593/20000 train_loss: 2.5708 train_time: 6.7m tok/s: 6994895 +3594/20000 train_loss: 2.4897 train_time: 6.7m tok/s: 6994433 +3595/20000 train_loss: 2.5366 train_time: 6.7m tok/s: 6993964 +3596/20000 train_loss: 2.5132 train_time: 6.7m tok/s: 6993506 +3597/20000 train_loss: 2.5473 train_time: 6.7m tok/s: 6993024 +3598/20000 train_loss: 2.2728 train_time: 6.7m tok/s: 6992533 +3599/20000 train_loss: 2.4737 train_time: 6.7m tok/s: 6992074 +3600/20000 train_loss: 2.5595 train_time: 6.7m tok/s: 6991623 +3601/20000 train_loss: 2.4287 train_time: 6.8m tok/s: 6991145 +3602/20000 train_loss: 2.3174 train_time: 6.8m tok/s: 6990648 +3603/20000 train_loss: 2.5491 train_time: 6.8m tok/s: 6990192 +3604/20000 train_loss: 2.5779 train_time: 6.8m tok/s: 6989732 +3605/20000 train_loss: 2.4442 train_time: 6.8m tok/s: 6989263 +3606/20000 train_loss: 2.4501 train_time: 6.8m tok/s: 6988799 +3607/20000 train_loss: 2.5761 train_time: 6.8m tok/s: 6988315 +3608/20000 train_loss: 2.4193 train_time: 6.8m tok/s: 6987846 +3609/20000 train_loss: 2.4301 train_time: 6.8m tok/s: 6987365 +3610/20000 train_loss: 2.5454 train_time: 6.8m tok/s: 6986886 +3611/20000 train_loss: 2.5322 train_time: 6.8m tok/s: 6986417 +3612/20000 train_loss: 2.4075 train_time: 6.8m tok/s: 6985955 +3613/20000 train_loss: 2.4461 train_time: 6.8m tok/s: 6985501 +3614/20000 train_loss: 2.5438 train_time: 6.8m tok/s: 6985030 +3615/20000 train_loss: 2.3721 train_time: 6.8m tok/s: 6984570 +3616/20000 train_loss: 2.4741 train_time: 6.8m tok/s: 6984084 +3617/20000 
train_loss: 2.4307 train_time: 6.8m tok/s: 6983586 +3618/20000 train_loss: 2.3723 train_time: 6.8m tok/s: 6983121 +3619/20000 train_loss: 2.6418 train_time: 6.8m tok/s: 6982652 +3620/20000 train_loss: 2.3826 train_time: 6.8m tok/s: 6982213 +3621/20000 train_loss: 2.5042 train_time: 6.8m tok/s: 6981724 +3622/20000 train_loss: 2.5257 train_time: 6.8m tok/s: 6981222 +3623/20000 train_loss: 2.5347 train_time: 6.8m tok/s: 6980726 +3624/20000 train_loss: 2.5901 train_time: 6.8m tok/s: 6980260 +3625/20000 train_loss: 2.4405 train_time: 6.8m tok/s: 6979807 +3626/20000 train_loss: 2.4035 train_time: 6.8m tok/s: 6979365 +3627/20000 train_loss: 2.3802 train_time: 6.8m tok/s: 6978926 +3628/20000 train_loss: 2.3959 train_time: 6.8m tok/s: 6978443 +3629/20000 train_loss: 2.5293 train_time: 6.8m tok/s: 6977970 +3630/20000 train_loss: 2.5424 train_time: 6.8m tok/s: 6977517 +3631/20000 train_loss: 2.5378 train_time: 6.8m tok/s: 6977061 +3632/20000 train_loss: 2.5817 train_time: 6.8m tok/s: 6976582 +3633/20000 train_loss: 2.3959 train_time: 6.8m tok/s: 6976115 +3634/20000 train_loss: 2.5141 train_time: 6.8m tok/s: 6975672 +3635/20000 train_loss: 2.4993 train_time: 6.8m tok/s: 6975215 +3636/20000 train_loss: 2.4864 train_time: 6.8m tok/s: 6974735 +3637/20000 train_loss: 2.4451 train_time: 6.8m tok/s: 6974255 +3638/20000 train_loss: 2.4271 train_time: 6.8m tok/s: 6973815 +3639/20000 train_loss: 2.4557 train_time: 6.8m tok/s: 6973341 +3640/20000 train_loss: 2.3966 train_time: 6.8m tok/s: 6972872 +3641/20000 train_loss: 2.3847 train_time: 6.8m tok/s: 6972415 +3642/20000 train_loss: 2.2956 train_time: 6.8m tok/s: 6971934 +3643/20000 train_loss: 2.3865 train_time: 6.8m tok/s: 6971495 +3644/20000 train_loss: 2.1038 train_time: 6.9m tok/s: 6971007 +3645/20000 train_loss: 2.5196 train_time: 6.9m tok/s: 6970553 +3646/20000 train_loss: 2.5336 train_time: 6.9m tok/s: 6970130 +3647/20000 train_loss: 2.5356 train_time: 6.9m tok/s: 6969670 +3648/20000 train_loss: 2.4820 train_time: 6.9m tok/s: 
6969193 +3649/20000 train_loss: 2.4504 train_time: 6.9m tok/s: 6968763 +3650/20000 train_loss: 2.6565 train_time: 6.9m tok/s: 6968270 +3651/20000 train_loss: 2.5580 train_time: 6.9m tok/s: 6967809 +3652/20000 train_loss: 2.4339 train_time: 6.9m tok/s: 6967354 +3653/20000 train_loss: 2.4448 train_time: 6.9m tok/s: 6966920 +3654/20000 train_loss: 2.3828 train_time: 6.9m tok/s: 6966464 +3655/20000 train_loss: 2.4094 train_time: 6.9m tok/s: 6965980 +3656/20000 train_loss: 2.3837 train_time: 6.9m tok/s: 6965469 +3657/20000 train_loss: 2.4286 train_time: 6.9m tok/s: 6965030 +3658/20000 train_loss: 2.5608 train_time: 6.9m tok/s: 6964598 +3659/20000 train_loss: 2.4329 train_time: 6.9m tok/s: 6964143 +3660/20000 train_loss: 2.3413 train_time: 6.9m tok/s: 6963697 +3661/20000 train_loss: 2.4467 train_time: 6.9m tok/s: 6963233 +3662/20000 train_loss: 2.4647 train_time: 6.9m tok/s: 6962762 +3663/20000 train_loss: 2.0448 train_time: 6.9m tok/s: 6962290 +3664/20000 train_loss: 2.6475 train_time: 6.9m tok/s: 6961834 +3665/20000 train_loss: 2.4911 train_time: 6.9m tok/s: 6961399 +3666/20000 train_loss: 2.4420 train_time: 6.9m tok/s: 6960947 +3667/20000 train_loss: 2.3211 train_time: 6.9m tok/s: 6960485 +3668/20000 train_loss: 2.2289 train_time: 6.9m tok/s: 6960006 +3669/20000 train_loss: 2.3880 train_time: 6.9m tok/s: 6959508 +3670/20000 train_loss: 2.4242 train_time: 6.9m tok/s: 6959054 +3671/20000 train_loss: 2.4320 train_time: 6.9m tok/s: 6958629 +3672/20000 train_loss: 2.5434 train_time: 6.9m tok/s: 6958178 +3673/20000 train_loss: 2.4553 train_time: 6.9m tok/s: 6957764 +3674/20000 train_loss: 2.5423 train_time: 6.9m tok/s: 6957327 +3675/20000 train_loss: 2.5777 train_time: 6.9m tok/s: 6956882 +3676/20000 train_loss: 2.4721 train_time: 6.9m tok/s: 6956421 +3677/20000 train_loss: 2.5358 train_time: 6.9m tok/s: 6955983 +3678/20000 train_loss: 2.4734 train_time: 6.9m tok/s: 6955523 +3679/20000 train_loss: 2.5738 train_time: 6.9m tok/s: 6955042 +3680/20000 train_loss: 2.4719 
train_time: 6.9m tok/s: 6954615 +3681/20000 train_loss: 2.5368 train_time: 6.9m tok/s: 6954206 +3682/20000 train_loss: 2.3521 train_time: 6.9m tok/s: 6953735 +3683/20000 train_loss: 2.5740 train_time: 6.9m tok/s: 6953283 +3684/20000 train_loss: 2.4052 train_time: 6.9m tok/s: 6952837 +3685/20000 train_loss: 2.6505 train_time: 6.9m tok/s: 6952367 +3686/20000 train_loss: 2.4956 train_time: 6.9m tok/s: 6951912 +3687/20000 train_loss: 2.5683 train_time: 7.0m tok/s: 6951481 +3688/20000 train_loss: 2.4755 train_time: 7.0m tok/s: 6951036 +3689/20000 train_loss: 2.5449 train_time: 7.0m tok/s: 6950587 +3690/20000 train_loss: 2.6014 train_time: 7.0m tok/s: 6950138 +3691/20000 train_loss: 2.5556 train_time: 7.0m tok/s: 6949724 +3692/20000 train_loss: 2.4947 train_time: 7.0m tok/s: 6949256 +3693/20000 train_loss: 2.5070 train_time: 7.0m tok/s: 6948824 +3694/20000 train_loss: 2.5052 train_time: 7.0m tok/s: 6948369 +3695/20000 train_loss: 2.4278 train_time: 7.0m tok/s: 6947937 +3696/20000 train_loss: 2.4108 train_time: 7.0m tok/s: 6947501 +3697/20000 train_loss: 2.4250 train_time: 7.0m tok/s: 6947031 +3698/20000 train_loss: 2.4746 train_time: 7.0m tok/s: 6946600 +3699/20000 train_loss: 2.4925 train_time: 7.0m tok/s: 6946163 +3700/20000 train_loss: 2.6317 train_time: 7.0m tok/s: 6945694 +3701/20000 train_loss: 2.5312 train_time: 7.0m tok/s: 6945272 +3702/20000 train_loss: 2.6084 train_time: 7.0m tok/s: 6944822 +3703/20000 train_loss: 2.5411 train_time: 7.0m tok/s: 6944349 +3704/20000 train_loss: 2.8528 train_time: 7.0m tok/s: 6943885 +3705/20000 train_loss: 2.4411 train_time: 7.0m tok/s: 6943463 +3706/20000 train_loss: 2.4623 train_time: 7.0m tok/s: 6943026 +3707/20000 train_loss: 2.4565 train_time: 7.0m tok/s: 6942586 +3708/20000 train_loss: 2.4034 train_time: 7.0m tok/s: 6942150 +3709/20000 train_loss: 2.5407 train_time: 7.0m tok/s: 6941726 +3710/20000 train_loss: 2.4037 train_time: 7.0m tok/s: 6941273 +3711/20000 train_loss: 2.5435 train_time: 7.0m tok/s: 6940828 +3712/20000 
train_loss: 2.4546 train_time: 7.0m tok/s: 6940413 +3713/20000 train_loss: 2.4640 train_time: 7.0m tok/s: 6939979 +3714/20000 train_loss: 2.4312 train_time: 7.0m tok/s: 6939539 +3715/20000 train_loss: 2.4224 train_time: 7.0m tok/s: 6939092 +3716/20000 train_loss: 2.4183 train_time: 7.0m tok/s: 6938640 +3717/20000 train_loss: 2.3532 train_time: 7.0m tok/s: 6938178 +3718/20000 train_loss: 2.5454 train_time: 7.0m tok/s: 6937756 +3719/20000 train_loss: 2.4358 train_time: 7.0m tok/s: 6937321 +3720/20000 train_loss: 2.5067 train_time: 7.0m tok/s: 6936866 +3721/20000 train_loss: 2.5575 train_time: 7.0m tok/s: 6936426 +3722/20000 train_loss: 2.4460 train_time: 7.0m tok/s: 6935978 +3723/20000 train_loss: 2.6126 train_time: 7.0m tok/s: 6935572 +3724/20000 train_loss: 2.4237 train_time: 7.0m tok/s: 6935103 +3725/20000 train_loss: 2.4707 train_time: 7.0m tok/s: 6934697 +3726/20000 train_loss: 2.4343 train_time: 7.0m tok/s: 6934263 +3727/20000 train_loss: 2.4277 train_time: 7.0m tok/s: 6933816 +3728/20000 train_loss: 2.4447 train_time: 7.0m tok/s: 6933393 +3729/20000 train_loss: 2.5118 train_time: 7.0m tok/s: 6932972 +3730/20000 train_loss: 2.4277 train_time: 7.1m tok/s: 6932526 +3731/20000 train_loss: 2.5125 train_time: 7.1m tok/s: 6932085 +3732/20000 train_loss: 2.4335 train_time: 7.1m tok/s: 6931654 +3733/20000 train_loss: 2.5386 train_time: 7.1m tok/s: 6931227 +3734/20000 train_loss: 2.4519 train_time: 7.1m tok/s: 6930791 +3735/20000 train_loss: 2.5249 train_time: 7.1m tok/s: 6930341 +3736/20000 train_loss: 2.4268 train_time: 7.1m tok/s: 6929900 +3737/20000 train_loss: 2.4459 train_time: 7.1m tok/s: 6929460 +3738/20000 train_loss: 2.3910 train_time: 7.1m tok/s: 6929016 +3739/20000 train_loss: 2.4599 train_time: 7.1m tok/s: 6928602 +3740/20000 train_loss: 2.5322 train_time: 7.1m tok/s: 6928181 +3741/20000 train_loss: 2.3289 train_time: 7.1m tok/s: 6927708 +3742/20000 train_loss: 2.4609 train_time: 7.1m tok/s: 6927256 +3743/20000 train_loss: 2.4095 train_time: 7.1m tok/s: 
6926818 +3744/20000 train_loss: 2.4370 train_time: 7.1m tok/s: 6926400 +3745/20000 train_loss: 2.5130 train_time: 7.1m tok/s: 6925975 +3746/20000 train_loss: 2.3718 train_time: 7.1m tok/s: 6925546 +3747/20000 train_loss: 2.4353 train_time: 7.1m tok/s: 6925147 +3748/20000 train_loss: 2.5527 train_time: 7.1m tok/s: 6924716 +3749/20000 train_loss: 2.5432 train_time: 7.1m tok/s: 6924279 +3750/20000 train_loss: 2.4807 train_time: 7.1m tok/s: 6923833 +3751/20000 train_loss: 2.5811 train_time: 7.1m tok/s: 6923431 +3752/20000 train_loss: 2.4286 train_time: 7.1m tok/s: 6923021 +3753/20000 train_loss: 2.3575 train_time: 7.1m tok/s: 6922571 +3754/20000 train_loss: 2.4453 train_time: 7.1m tok/s: 6922138 +3755/20000 train_loss: 2.3945 train_time: 7.1m tok/s: 6921715 +3756/20000 train_loss: 2.4815 train_time: 7.1m tok/s: 6921303 +3757/20000 train_loss: 2.3703 train_time: 7.1m tok/s: 6920858 +3758/20000 train_loss: 2.7261 train_time: 7.1m tok/s: 6920420 +3759/20000 train_loss: 2.6071 train_time: 7.1m tok/s: 6919991 +3760/20000 train_loss: 2.5432 train_time: 7.1m tok/s: 6919573 +3761/20000 train_loss: 2.6229 train_time: 7.1m tok/s: 6919135 +3762/20000 train_loss: 2.4664 train_time: 7.1m tok/s: 6918701 +3763/20000 train_loss: 2.4939 train_time: 7.1m tok/s: 6918312 +3764/20000 train_loss: 2.4505 train_time: 7.1m tok/s: 6917888 +3765/20000 train_loss: 2.4309 train_time: 7.1m tok/s: 6917463 +3766/20000 train_loss: 2.4184 train_time: 7.1m tok/s: 6917065 +3767/20000 train_loss: 2.4756 train_time: 7.1m tok/s: 6916637 +3768/20000 train_loss: 2.4762 train_time: 7.1m tok/s: 6916194 +3769/20000 train_loss: 2.4118 train_time: 7.1m tok/s: 6915781 +3770/20000 train_loss: 2.5599 train_time: 7.1m tok/s: 6915340 +3771/20000 train_loss: 2.4483 train_time: 7.1m tok/s: 6914919 +3772/20000 train_loss: 2.5140 train_time: 7.2m tok/s: 6914503 +3773/20000 train_loss: 2.4858 train_time: 7.2m tok/s: 6914068 +3774/20000 train_loss: 2.3876 train_time: 7.2m tok/s: 6913661 +3775/20000 train_loss: 2.6272 
train_time: 7.2m tok/s: 6913254 +3776/20000 train_loss: 2.4997 train_time: 7.2m tok/s: 6912809 +3777/20000 train_loss: 2.4013 train_time: 7.2m tok/s: 6912362 +3778/20000 train_loss: 2.4642 train_time: 7.2m tok/s: 6911943 +3779/20000 train_loss: 2.4587 train_time: 7.2m tok/s: 6911531 +3780/20000 train_loss: 2.3815 train_time: 7.2m tok/s: 6911118 +3781/20000 train_loss: 2.4527 train_time: 7.2m tok/s: 6910676 +3782/20000 train_loss: 2.3815 train_time: 7.2m tok/s: 6910226 +3783/20000 train_loss: 2.4565 train_time: 7.2m tok/s: 6909810 +3784/20000 train_loss: 2.4634 train_time: 7.2m tok/s: 6909428 +3785/20000 train_loss: 2.4813 train_time: 7.2m tok/s: 6909031 +3786/20000 train_loss: 2.4997 train_time: 7.2m tok/s: 6908601 +3787/20000 train_loss: 2.5092 train_time: 7.2m tok/s: 6908188 +3788/20000 train_loss: 2.4226 train_time: 7.2m tok/s: 6907788 +3789/20000 train_loss: 2.3868 train_time: 7.2m tok/s: 6907377 +3790/20000 train_loss: 2.4455 train_time: 7.2m tok/s: 6906959 +3791/20000 train_loss: 2.3436 train_time: 7.2m tok/s: 6906525 +3792/20000 train_loss: 2.4596 train_time: 7.2m tok/s: 6906084 +3793/20000 train_loss: 2.3867 train_time: 7.2m tok/s: 6905640 +3794/20000 train_loss: 2.4807 train_time: 7.2m tok/s: 6905212 +3795/20000 train_loss: 2.4708 train_time: 7.2m tok/s: 6904825 +3796/20000 train_loss: 2.5393 train_time: 7.2m tok/s: 6904403 +3797/20000 train_loss: 2.5797 train_time: 7.2m tok/s: 6903998 +3798/20000 train_loss: 2.6871 train_time: 7.2m tok/s: 6903571 +3799/20000 train_loss: 2.4849 train_time: 7.2m tok/s: 6903143 +3800/20000 train_loss: 2.4904 train_time: 7.2m tok/s: 6902706 +3801/20000 train_loss: 2.2729 train_time: 7.2m tok/s: 6902262 +3802/20000 train_loss: 2.4865 train_time: 7.2m tok/s: 6901857 +3803/20000 train_loss: 2.3967 train_time: 7.2m tok/s: 6901460 +3804/20000 train_loss: 2.4698 train_time: 7.2m tok/s: 6901083 +3805/20000 train_loss: 2.3914 train_time: 7.2m tok/s: 6900687 +3806/20000 train_loss: 2.4776 train_time: 7.2m tok/s: 6900275 +3807/20000 
train_loss: 2.4707 train_time: 7.2m tok/s: 6899850 +3808/20000 train_loss: 2.6560 train_time: 7.2m tok/s: 6899432 +3809/20000 train_loss: 2.5676 train_time: 7.2m tok/s: 6899024 +3810/20000 train_loss: 2.4296 train_time: 7.2m tok/s: 6898590 +3811/20000 train_loss: 2.4407 train_time: 7.2m tok/s: 6898197 +3812/20000 train_loss: 2.4401 train_time: 7.2m tok/s: 6897781 +3813/20000 train_loss: 2.4568 train_time: 7.2m tok/s: 6897380 +3814/20000 train_loss: 4.3519 train_time: 7.2m tok/s: 6896911 +3815/20000 train_loss: 2.4409 train_time: 7.3m tok/s: 6896499 +3816/20000 train_loss: 2.5511 train_time: 7.3m tok/s: 6896053 +3817/20000 train_loss: 2.4939 train_time: 7.3m tok/s: 6895654 +3818/20000 train_loss: 2.5150 train_time: 7.3m tok/s: 6895265 +3819/20000 train_loss: 2.3863 train_time: 7.3m tok/s: 6894859 +3820/20000 train_loss: 2.5493 train_time: 7.3m tok/s: 6894441 +3821/20000 train_loss: 2.4242 train_time: 7.3m tok/s: 6894047 +3822/20000 train_loss: 2.4591 train_time: 7.3m tok/s: 6893643 +3823/20000 train_loss: 2.4813 train_time: 7.3m tok/s: 6893238 +3824/20000 train_loss: 2.5113 train_time: 7.3m tok/s: 6892840 +3825/20000 train_loss: 2.4801 train_time: 7.3m tok/s: 6892452 +3826/20000 train_loss: 2.5593 train_time: 7.3m tok/s: 6892029 +3827/20000 train_loss: 2.5688 train_time: 7.3m tok/s: 6891620 +3828/20000 train_loss: 2.5518 train_time: 7.3m tok/s: 6891203 +3829/20000 train_loss: 2.5110 train_time: 7.3m tok/s: 6890793 +3830/20000 train_loss: 2.5487 train_time: 7.3m tok/s: 6890390 +3831/20000 train_loss: 2.5090 train_time: 7.3m tok/s: 6889994 +3832/20000 train_loss: 2.4559 train_time: 7.3m tok/s: 6889581 +3833/20000 train_loss: 2.4874 train_time: 7.3m tok/s: 6889170 +3834/20000 train_loss: 2.4170 train_time: 7.3m tok/s: 6888768 +3835/20000 train_loss: 2.4582 train_time: 7.3m tok/s: 6888360 +3836/20000 train_loss: 2.4245 train_time: 7.3m tok/s: 6887945 +3837/20000 train_loss: 2.4431 train_time: 7.3m tok/s: 6887536 +3838/20000 train_loss: 2.4533 train_time: 7.3m tok/s: 
6887131 +3839/20000 train_loss: 2.4347 train_time: 7.3m tok/s: 6886728 +3840/20000 train_loss: 2.3931 train_time: 7.3m tok/s: 6886321 +3841/20000 train_loss: 2.5566 train_time: 7.3m tok/s: 6885923 +3842/20000 train_loss: 2.4215 train_time: 7.3m tok/s: 6885509 +3843/20000 train_loss: 2.4472 train_time: 7.3m tok/s: 6885100 +3844/20000 train_loss: 2.3906 train_time: 7.3m tok/s: 6884721 +3845/20000 train_loss: 2.5047 train_time: 7.3m tok/s: 6884321 +3846/20000 train_loss: 2.6002 train_time: 7.3m tok/s: 6883916 +3847/20000 train_loss: 2.4141 train_time: 7.3m tok/s: 6883508 +3848/20000 train_loss: 2.4408 train_time: 7.3m tok/s: 6883104 +3849/20000 train_loss: 2.5696 train_time: 7.3m tok/s: 6882693 +3850/20000 train_loss: 2.5470 train_time: 7.3m tok/s: 6882273 +3851/20000 train_loss: 2.4608 train_time: 7.3m tok/s: 6881876 +3852/20000 train_loss: 2.4417 train_time: 7.3m tok/s: 6881472 +3853/20000 train_loss: 2.3922 train_time: 7.3m tok/s: 6881073 +3854/20000 train_loss: 2.2862 train_time: 7.3m tok/s: 6880679 +3855/20000 train_loss: 2.4901 train_time: 7.3m tok/s: 6880303 +3856/20000 train_loss: 2.4145 train_time: 7.3m tok/s: 6879891 +3857/20000 train_loss: 2.2076 train_time: 7.3m tok/s: 6879441 +3858/20000 train_loss: 2.4866 train_time: 7.4m tok/s: 6879030 +3859/20000 train_loss: 2.3978 train_time: 7.4m tok/s: 6878643 +3860/20000 train_loss: 2.3328 train_time: 7.4m tok/s: 6878264 +3861/20000 train_loss: 2.4730 train_time: 7.4m tok/s: 6877881 +3862/20000 train_loss: 2.5986 train_time: 7.4m tok/s: 6877495 +3863/20000 train_loss: 2.5130 train_time: 7.4m tok/s: 6877089 +3864/20000 train_loss: 2.4652 train_time: 7.4m tok/s: 6876680 +3865/20000 train_loss: 2.4883 train_time: 7.4m tok/s: 6876312 +3866/20000 train_loss: 2.4088 train_time: 7.4m tok/s: 6875923 +3867/20000 train_loss: 2.9000 train_time: 7.4m tok/s: 6875489 +3868/20000 train_loss: 2.3200 train_time: 7.4m tok/s: 6875039 +3869/20000 train_loss: 2.4428 train_time: 7.4m tok/s: 6874641 +3870/20000 train_loss: 2.5604 
train_time: 7.4m tok/s: 6874268 +3871/20000 train_loss: 2.4081 train_time: 7.4m tok/s: 6873880 +3872/20000 train_loss: 2.4529 train_time: 7.4m tok/s: 6873498 +3873/20000 train_loss: 2.3845 train_time: 7.4m tok/s: 6873104 +3874/20000 train_loss: 2.4454 train_time: 7.4m tok/s: 6872730 +3875/20000 train_loss: 2.4432 train_time: 7.4m tok/s: 6872319 +3876/20000 train_loss: 2.3310 train_time: 7.4m tok/s: 6871920 +3877/20000 train_loss: 2.6133 train_time: 7.4m tok/s: 6871516 +3878/20000 train_loss: 2.9822 train_time: 7.4m tok/s: 6871078 +3879/20000 train_loss: 2.4192 train_time: 7.4m tok/s: 6870694 +3880/20000 train_loss: 2.4141 train_time: 7.4m tok/s: 6870308 +3881/20000 train_loss: 2.4094 train_time: 7.4m tok/s: 6869907 +3882/20000 train_loss: 2.3116 train_time: 7.4m tok/s: 6869501 +3883/20000 train_loss: 2.4688 train_time: 7.4m tok/s: 6869118 +3884/20000 train_loss: 2.4630 train_time: 7.4m tok/s: 6868722 +3885/20000 train_loss: 2.4130 train_time: 7.4m tok/s: 6868337 +3886/20000 train_loss: 2.3907 train_time: 7.4m tok/s: 6867977 +3887/20000 train_loss: 2.4180 train_time: 7.4m tok/s: 6867554 +3888/20000 train_loss: 2.4262 train_time: 7.4m tok/s: 6867187 +3889/20000 train_loss: 2.4126 train_time: 7.4m tok/s: 6866790 +3890/20000 train_loss: 2.3509 train_time: 7.4m tok/s: 6866399 +3891/20000 train_loss: 2.5906 train_time: 7.4m tok/s: 6866006 +3892/20000 train_loss: 2.3918 train_time: 7.4m tok/s: 6865602 +3893/20000 train_loss: 2.4996 train_time: 7.4m tok/s: 6865217 +3894/20000 train_loss: 2.2332 train_time: 7.4m tok/s: 6864811 +3895/20000 train_loss: 2.5625 train_time: 7.4m tok/s: 6864442 +3896/20000 train_loss: 2.4701 train_time: 7.4m tok/s: 6864042 +3897/20000 train_loss: 2.4656 train_time: 7.4m tok/s: 6863642 +3898/20000 train_loss: 2.3838 train_time: 7.4m tok/s: 6863260 +3899/20000 train_loss: 2.6190 train_time: 7.4m tok/s: 6862877 +3900/20000 train_loss: 2.5393 train_time: 7.4m tok/s: 6862453 +3901/20000 train_loss: 2.4487 train_time: 7.5m tok/s: 6862080 +3902/20000 
train_loss: 2.4846 train_time: 7.5m tok/s: 6861665 +3903/20000 train_loss: 2.4824 train_time: 7.5m tok/s: 6861264 +3904/20000 train_loss: 2.3099 train_time: 7.5m tok/s: 6860857 +3905/20000 train_loss: 2.4326 train_time: 7.5m tok/s: 6860489 +3906/20000 train_loss: 2.5464 train_time: 7.5m tok/s: 6860133 +3907/20000 train_loss: 2.4664 train_time: 7.5m tok/s: 6859738 +3908/20000 train_loss: 2.4653 train_time: 7.5m tok/s: 6859353 +3909/20000 train_loss: 2.4722 train_time: 7.5m tok/s: 6858966 +3910/20000 train_loss: 2.4512 train_time: 7.5m tok/s: 6858579 +3911/20000 train_loss: 2.5226 train_time: 7.5m tok/s: 6858188 +3912/20000 train_loss: 2.5801 train_time: 7.5m tok/s: 6857796 +3913/20000 train_loss: 2.4880 train_time: 7.5m tok/s: 6857444 +3914/20000 train_loss: 2.4592 train_time: 7.5m tok/s: 6857041 +3915/20000 train_loss: 2.3521 train_time: 7.5m tok/s: 6856677 +3916/20000 train_loss: 2.4117 train_time: 7.5m tok/s: 6856293 +3917/20000 train_loss: 2.5813 train_time: 7.5m tok/s: 6855884 +3918/20000 train_loss: 2.4463 train_time: 7.5m tok/s: 6855496 +3919/20000 train_loss: 2.4059 train_time: 7.5m tok/s: 6855121 +3920/20000 train_loss: 2.3556 train_time: 7.5m tok/s: 6854741 +3921/20000 train_loss: 2.4929 train_time: 7.5m tok/s: 6854339 +3922/20000 train_loss: 2.5822 train_time: 7.5m tok/s: 6853942 +3923/20000 train_loss: 2.4931 train_time: 7.5m tok/s: 6853590 +3924/20000 train_loss: 2.5566 train_time: 7.5m tok/s: 6853202 +3925/20000 train_loss: 2.4436 train_time: 7.5m tok/s: 6852817 +3926/20000 train_loss: 2.3616 train_time: 7.5m tok/s: 6852416 +3927/20000 train_loss: 2.3416 train_time: 7.5m tok/s: 6852035 +3928/20000 train_loss: 2.4441 train_time: 7.5m tok/s: 6851640 +3929/20000 train_loss: 2.4321 train_time: 7.5m tok/s: 6851279 +3930/20000 train_loss: 2.5243 train_time: 7.5m tok/s: 6850901 +3931/20000 train_loss: 2.4869 train_time: 7.5m tok/s: 6850503 +3932/20000 train_loss: 2.4706 train_time: 7.5m tok/s: 6850100 +3933/20000 train_loss: 2.5444 train_time: 7.5m tok/s: 
6849716 +3934/20000 train_loss: 2.4874 train_time: 7.5m tok/s: 6849344 +3935/20000 train_loss: 2.4046 train_time: 7.5m tok/s: 6848930 +3936/20000 train_loss: 2.3868 train_time: 7.5m tok/s: 6848552 +3937/20000 train_loss: 2.4047 train_time: 7.5m tok/s: 6848167 +3938/20000 train_loss: 2.3564 train_time: 7.5m tok/s: 6847813 +3939/20000 train_loss: 2.4955 train_time: 7.5m tok/s: 6847446 +3940/20000 train_loss: 2.4208 train_time: 7.5m tok/s: 6847019 +3941/20000 train_loss: 2.5829 train_time: 7.5m tok/s: 6846648 +3942/20000 train_loss: 2.3550 train_time: 7.5m tok/s: 6846255 +3943/20000 train_loss: 2.4204 train_time: 7.5m tok/s: 6845894 +3944/20000 train_loss: 2.4959 train_time: 7.6m tok/s: 6845524 +3945/20000 train_loss: 2.4134 train_time: 7.6m tok/s: 6845152 +3946/20000 train_loss: 2.4436 train_time: 7.6m tok/s: 6844765 +3947/20000 train_loss: 2.4690 train_time: 7.6m tok/s: 6844419 +3948/20000 train_loss: 2.3503 train_time: 7.6m tok/s: 6844025 +3949/20000 train_loss: 2.3712 train_time: 7.6m tok/s: 6843654 +3950/20000 train_loss: 2.3397 train_time: 7.6m tok/s: 6843291 +3951/20000 train_loss: 2.4713 train_time: 7.6m tok/s: 6842885 +3952/20000 train_loss: 2.4487 train_time: 7.6m tok/s: 6842509 +3953/20000 train_loss: 2.4707 train_time: 7.6m tok/s: 6842152 +3954/20000 train_loss: 2.4893 train_time: 7.6m tok/s: 6841761 +3955/20000 train_loss: 2.4770 train_time: 7.6m tok/s: 6841393 +3956/20000 train_loss: 2.4633 train_time: 7.6m tok/s: 6841005 +3957/20000 train_loss: 2.3417 train_time: 7.6m tok/s: 6840621 +3958/20000 train_loss: 2.5283 train_time: 7.6m tok/s: 6840248 +3959/20000 train_loss: 2.2671 train_time: 7.6m tok/s: 6839861 +3960/20000 train_loss: 2.4039 train_time: 7.6m tok/s: 6839479 +3961/20000 train_loss: 2.3840 train_time: 7.6m tok/s: 6839096 +3962/20000 train_loss: 2.3897 train_time: 7.6m tok/s: 6838714 +3963/20000 train_loss: 2.4836 train_time: 7.6m tok/s: 6838336 +3964/20000 train_loss: 2.2938 train_time: 7.6m tok/s: 6837946 +3965/20000 train_loss: 2.5968 
train_time: 7.6m tok/s: 6837591 +3966/20000 train_loss: 2.4592 train_time: 7.6m tok/s: 6837245 +3967/20000 train_loss: 2.4269 train_time: 7.6m tok/s: 6836890 +3968/20000 train_loss: 2.4661 train_time: 7.6m tok/s: 6836502 +3969/20000 train_loss: 2.4154 train_time: 7.6m tok/s: 6836136 +3970/20000 train_loss: 2.5132 train_time: 7.6m tok/s: 6835770 +3971/20000 train_loss: 2.4153 train_time: 7.6m tok/s: 6835386 +3972/20000 train_loss: 2.4334 train_time: 7.6m tok/s: 6834998 +3973/20000 train_loss: 2.3944 train_time: 7.6m tok/s: 6834640 +3974/20000 train_loss: 2.3546 train_time: 7.6m tok/s: 6834263 +3975/20000 train_loss: 2.5170 train_time: 7.6m tok/s: 6833882 +3976/20000 train_loss: 2.4917 train_time: 7.6m tok/s: 6833525 +3977/20000 train_loss: 2.7617 train_time: 7.6m tok/s: 6833157 +3978/20000 train_loss: 2.4891 train_time: 7.6m tok/s: 6832775 +3979/20000 train_loss: 3.1321 train_time: 7.6m tok/s: 6832355 +3980/20000 train_loss: 2.4538 train_time: 7.6m tok/s: 6831974 +3981/20000 train_loss: 2.3807 train_time: 7.6m tok/s: 6831620 +3982/20000 train_loss: 2.4763 train_time: 7.6m tok/s: 6831247 +3983/20000 train_loss: 2.3951 train_time: 7.6m tok/s: 6830876 +3984/20000 train_loss: 2.5177 train_time: 7.6m tok/s: 6830521 +3985/20000 train_loss: 2.4754 train_time: 7.6m tok/s: 6830127 +3986/20000 train_loss: 2.4067 train_time: 7.6m tok/s: 6829769 +3987/20000 train_loss: 2.5605 train_time: 7.7m tok/s: 6829415 +3988/20000 train_loss: 2.4915 train_time: 7.7m tok/s: 6829069 +3989/20000 train_loss: 2.4377 train_time: 7.7m tok/s: 6828702 +3990/20000 train_loss: 2.4293 train_time: 7.7m tok/s: 6828335 +3991/20000 train_loss: 2.4458 train_time: 7.7m tok/s: 6827967 +3992/20000 train_loss: 2.4558 train_time: 7.7m tok/s: 6827616 +3993/20000 train_loss: 2.3848 train_time: 7.7m tok/s: 6827245 +3994/20000 train_loss: 2.1462 train_time: 7.7m tok/s: 6826838 +3995/20000 train_loss: 2.4542 train_time: 7.7m tok/s: 6826467 +3996/20000 train_loss: 2.3648 train_time: 7.7m tok/s: 6826112 +3997/20000 
train_loss: 2.4724 train_time: 7.7m tok/s: 6825729 +3998/20000 train_loss: 2.4034 train_time: 7.7m tok/s: 6825328 +3999/20000 train_loss: 2.4134 train_time: 7.7m tok/s: 6824970 +4000/20000 train_loss: 2.5086 train_time: 7.7m tok/s: 6824630 +4001/20000 train_loss: 2.4248 train_time: 7.7m tok/s: 6824253 +4002/20000 train_loss: 2.4293 train_time: 7.7m tok/s: 6823898 +4003/20000 train_loss: 2.3754 train_time: 7.7m tok/s: 6823549 +4004/20000 train_loss: 2.4888 train_time: 7.7m tok/s: 6823178 +4005/20000 train_loss: 2.4371 train_time: 7.7m tok/s: 6822845 +4006/20000 train_loss: 2.4689 train_time: 7.7m tok/s: 6822467 +4007/20000 train_loss: 2.4741 train_time: 7.7m tok/s: 6822085 +4008/20000 train_loss: 2.3585 train_time: 7.7m tok/s: 6821717 +4009/20000 train_loss: 2.3883 train_time: 7.7m tok/s: 6821353 +4010/20000 train_loss: 2.4924 train_time: 7.7m tok/s: 6821004 +4011/20000 train_loss: 2.4562 train_time: 7.7m tok/s: 6820668 +4012/20000 train_loss: 2.4738 train_time: 7.7m tok/s: 6820305 +4013/20000 train_loss: 2.3530 train_time: 7.7m tok/s: 6819897 +4014/20000 train_loss: 2.3991 train_time: 7.7m tok/s: 6819534 +4015/20000 train_loss: 2.4246 train_time: 7.7m tok/s: 6819175 +4016/20000 train_loss: 2.4245 train_time: 7.7m tok/s: 6818793 +4017/20000 train_loss: 2.5373 train_time: 7.7m tok/s: 6818451 +4018/20000 train_loss: 2.3874 train_time: 7.7m tok/s: 6818077 +4019/20000 train_loss: 2.2990 train_time: 7.7m tok/s: 6817710 +4020/20000 train_loss: 2.3367 train_time: 7.7m tok/s: 6817353 +4021/20000 train_loss: 2.3800 train_time: 7.7m tok/s: 6816974 +4022/20000 train_loss: 2.4651 train_time: 7.7m tok/s: 6816608 +4023/20000 train_loss: 2.4449 train_time: 7.7m tok/s: 6816249 +4024/20000 train_loss: 2.5105 train_time: 7.7m tok/s: 6815898 +4025/20000 train_loss: 2.4879 train_time: 7.7m tok/s: 6815558 +4026/20000 train_loss: 2.4138 train_time: 7.7m tok/s: 6815198 +4027/20000 train_loss: 2.3432 train_time: 7.7m tok/s: 6814833 +4028/20000 train_loss: 2.2402 train_time: 7.7m tok/s: 
6814458 +4029/20000 train_loss: 2.3892 train_time: 7.7m tok/s: 6814096 +4030/20000 train_loss: 2.3981 train_time: 7.8m tok/s: 6813745 +4031/20000 train_loss: 2.4777 train_time: 7.8m tok/s: 6813405 +4032/20000 train_loss: 2.3730 train_time: 7.8m tok/s: 6813047 +4033/20000 train_loss: 2.4299 train_time: 7.8m tok/s: 6812682 +4034/20000 train_loss: 2.5988 train_time: 7.8m tok/s: 6812349 +4035/20000 train_loss: 2.5305 train_time: 7.8m tok/s: 6812002 +4036/20000 train_loss: 2.4943 train_time: 7.8m tok/s: 6811664 +4037/20000 train_loss: 2.4833 train_time: 7.8m tok/s: 6811307 +4038/20000 train_loss: 2.4900 train_time: 7.8m tok/s: 6810965 +4039/20000 train_loss: 2.4756 train_time: 7.8m tok/s: 6810637 +4040/20000 train_loss: 2.4218 train_time: 7.8m tok/s: 6810239 +4041/20000 train_loss: 2.3898 train_time: 7.8m tok/s: 6809883 +4042/20000 train_loss: 2.3638 train_time: 7.8m tok/s: 6809530 +4043/20000 train_loss: 2.3782 train_time: 7.8m tok/s: 6809175 +4044/20000 train_loss: 2.4241 train_time: 7.8m tok/s: 6808814 +4045/20000 train_loss: 2.4580 train_time: 7.8m tok/s: 6808431 +4046/20000 train_loss: 2.5261 train_time: 7.8m tok/s: 6808063 +4047/20000 train_loss: 2.5003 train_time: 7.8m tok/s: 6807691 +4048/20000 train_loss: 2.4269 train_time: 7.8m tok/s: 6807330 +4049/20000 train_loss: 2.5538 train_time: 7.8m tok/s: 6806967 +4050/20000 train_loss: 2.4431 train_time: 7.8m tok/s: 6806586 +4051/20000 train_loss: 2.4053 train_time: 7.8m tok/s: 6806250 +4052/20000 train_loss: 2.4874 train_time: 7.8m tok/s: 6805881 +4053/20000 train_loss: 2.4375 train_time: 7.8m tok/s: 6805521 +4054/20000 train_loss: 2.4157 train_time: 7.8m tok/s: 6805168 +4055/20000 train_loss: 2.4445 train_time: 7.8m tok/s: 6804811 +4056/20000 train_loss: 2.3251 train_time: 7.8m tok/s: 6804436 +4057/20000 train_loss: 2.4155 train_time: 7.8m tok/s: 6804084 +4058/20000 train_loss: 2.5582 train_time: 7.8m tok/s: 6803707 +4059/20000 train_loss: 2.2789 train_time: 7.8m tok/s: 6803364 +4060/20000 train_loss: 2.3945 
train_time: 7.8m tok/s: 6803038 +4061/20000 train_loss: 2.4996 train_time: 7.8m tok/s: 6802705 +4062/20000 train_loss: 2.4868 train_time: 7.8m tok/s: 6802355 +4063/20000 train_loss: 2.4501 train_time: 7.8m tok/s: 6801988 +4064/20000 train_loss: 2.4304 train_time: 7.8m tok/s: 6801634 +4065/20000 train_loss: 2.5011 train_time: 7.8m tok/s: 6801319 +4066/20000 train_loss: 2.4089 train_time: 7.8m tok/s: 6800941 +4067/20000 train_loss: 2.4782 train_time: 7.8m tok/s: 6800592 +4068/20000 train_loss: 2.4794 train_time: 7.8m tok/s: 6800239 +4069/20000 train_loss: 2.3655 train_time: 7.8m tok/s: 6799879 +4070/20000 train_loss: 2.3726 train_time: 7.8m tok/s: 6799546 +4071/20000 train_loss: 2.7212 train_time: 7.8m tok/s: 6799177 +4072/20000 train_loss: 2.4522 train_time: 7.9m tok/s: 6798822 +4073/20000 train_loss: 2.4496 train_time: 7.9m tok/s: 6798468 +4074/20000 train_loss: 2.4084 train_time: 7.9m tok/s: 6798123 +4075/20000 train_loss: 2.5591 train_time: 7.9m tok/s: 6797776 +4076/20000 train_loss: 2.5841 train_time: 7.9m tok/s: 6797420 +4077/20000 train_loss: 2.4936 train_time: 7.9m tok/s: 6797077 +4078/20000 train_loss: 2.3752 train_time: 7.9m tok/s: 6796721 +4079/20000 train_loss: 3.0162 train_time: 7.9m tok/s: 6796330 +4080/20000 train_loss: 2.3605 train_time: 7.9m tok/s: 6795975 +4081/20000 train_loss: 2.3883 train_time: 7.9m tok/s: 6795624 +4082/20000 train_loss: 2.4052 train_time: 7.9m tok/s: 6795298 +4083/20000 train_loss: 2.3182 train_time: 7.9m tok/s: 6794958 +4084/20000 train_loss: 2.2602 train_time: 7.9m tok/s: 6794602 +4085/20000 train_loss: 2.3571 train_time: 7.9m tok/s: 6794255 +4086/20000 train_loss: 2.4926 train_time: 7.9m tok/s: 6793907 +4087/20000 train_loss: 2.5340 train_time: 7.9m tok/s: 6793564 +4088/20000 train_loss: 2.5375 train_time: 7.9m tok/s: 6793227 +4089/20000 train_loss: 2.4422 train_time: 7.9m tok/s: 6792890 +4090/20000 train_loss: 2.3684 train_time: 7.9m tok/s: 6792556 +4091/20000 train_loss: 2.5207 train_time: 7.9m tok/s: 6792212 +4092/20000 
train_loss: 2.5266 train_time: 7.9m tok/s: 6791844 +4093/20000 train_loss: 2.4965 train_time: 7.9m tok/s: 6791482 +4094/20000 train_loss: 2.2972 train_time: 7.9m tok/s: 6791100 +4095/20000 train_loss: 2.4546 train_time: 7.9m tok/s: 6790771 +4096/20000 train_loss: 2.2624 train_time: 7.9m tok/s: 6790401 +4097/20000 train_loss: 2.3104 train_time: 7.9m tok/s: 6790057 +4098/20000 train_loss: 2.4172 train_time: 7.9m tok/s: 6789736 +4099/20000 train_loss: 2.1982 train_time: 7.9m tok/s: 6789370 +4100/20000 train_loss: 2.4212 train_time: 7.9m tok/s: 6789025 +4101/20000 train_loss: 2.5318 train_time: 7.9m tok/s: 6788689 +4102/20000 train_loss: 2.2881 train_time: 7.9m tok/s: 6788328 +4103/20000 train_loss: 2.4378 train_time: 7.9m tok/s: 6788011 +4104/20000 train_loss: 2.3829 train_time: 7.9m tok/s: 6787661 +4105/20000 train_loss: 2.4353 train_time: 7.9m tok/s: 6787287 +4106/20000 train_loss: 2.4633 train_time: 7.9m tok/s: 6786981 +4107/20000 train_loss: 2.4107 train_time: 7.9m tok/s: 6786630 +4108/20000 train_loss: 2.4178 train_time: 7.9m tok/s: 6786289 +4109/20000 train_loss: 2.3910 train_time: 7.9m tok/s: 6785968 +4110/20000 train_loss: 2.3480 train_time: 7.9m tok/s: 6785620 +4111/20000 train_loss: 2.3982 train_time: 7.9m tok/s: 6785288 +4112/20000 train_loss: 2.4445 train_time: 7.9m tok/s: 6784938 +4113/20000 train_loss: 2.4217 train_time: 7.9m tok/s: 6784604 +4114/20000 train_loss: 2.3916 train_time: 7.9m tok/s: 6784287 +4115/20000 train_loss: 2.5291 train_time: 8.0m tok/s: 6783948 +4116/20000 train_loss: 2.3682 train_time: 8.0m tok/s: 6783615 +4117/20000 train_loss: 2.3827 train_time: 8.0m tok/s: 6783267 +4118/20000 train_loss: 2.4309 train_time: 8.0m tok/s: 6782914 +4119/20000 train_loss: 2.4676 train_time: 8.0m tok/s: 6782568 +4120/20000 train_loss: 2.4346 train_time: 8.0m tok/s: 6782203 +4121/20000 train_loss: 2.4029 train_time: 8.0m tok/s: 6781842 +4122/20000 train_loss: 2.2872 train_time: 8.0m tok/s: 6781480 +4123/20000 train_loss: 2.3867 train_time: 8.0m tok/s: 
6781144 +4124/20000 train_loss: 2.2565 train_time: 8.0m tok/s: 6780797 +4125/20000 train_loss: 2.4529 train_time: 8.0m tok/s: 6780473 +4126/20000 train_loss: 2.3724 train_time: 8.0m tok/s: 6780149 +4127/20000 train_loss: 2.3765 train_time: 8.0m tok/s: 6779811 +4128/20000 train_loss: 2.4317 train_time: 8.0m tok/s: 6779476 +4129/20000 train_loss: 2.4646 train_time: 8.0m tok/s: 6779146 +4130/20000 train_loss: 2.4017 train_time: 8.0m tok/s: 6778801 +4131/20000 train_loss: 2.4511 train_time: 8.0m tok/s: 6778454 +4132/20000 train_loss: 2.4239 train_time: 8.0m tok/s: 6778113 +4133/20000 train_loss: 2.4400 train_time: 8.0m tok/s: 6777748 +4134/20000 train_loss: 2.4769 train_time: 8.0m tok/s: 6777410 +4135/20000 train_loss: 2.4648 train_time: 8.0m tok/s: 6777077 +4136/20000 train_loss: 2.2946 train_time: 8.0m tok/s: 6776752 +4137/20000 train_loss: 2.4986 train_time: 8.0m tok/s: 6776420 +4138/20000 train_loss: 2.3036 train_time: 8.0m tok/s: 6776081 +4139/20000 train_loss: 2.4589 train_time: 8.0m tok/s: 6775735 +4140/20000 train_loss: 2.5240 train_time: 8.0m tok/s: 6775402 +4141/20000 train_loss: 2.3323 train_time: 8.0m tok/s: 6775065 +4142/20000 train_loss: 2.5065 train_time: 8.0m tok/s: 6774748 +4143/20000 train_loss: 2.4067 train_time: 8.0m tok/s: 6774437 +4144/20000 train_loss: 2.5090 train_time: 8.0m tok/s: 6774092 +4145/20000 train_loss: 2.2740 train_time: 8.0m tok/s: 6773742 +4146/20000 train_loss: 2.4508 train_time: 8.0m tok/s: 6773404 +4147/20000 train_loss: 2.5448 train_time: 8.0m tok/s: 6773088 +4148/20000 train_loss: 2.4373 train_time: 8.0m tok/s: 6772730 +4149/20000 train_loss: 2.2973 train_time: 8.0m tok/s: 6772384 +4150/20000 train_loss: 2.3496 train_time: 8.0m tok/s: 6772060 +4151/20000 train_loss: 2.4387 train_time: 8.0m tok/s: 6771732 +4152/20000 train_loss: 2.4169 train_time: 8.0m tok/s: 6771408 +4153/20000 train_loss: 2.5086 train_time: 8.0m tok/s: 6771075 +4154/20000 train_loss: 2.3613 train_time: 8.0m tok/s: 6770753 +4155/20000 train_loss: 2.5002 
train_time: 8.0m tok/s: 6770439 +4156/20000 train_loss: 2.4295 train_time: 8.0m tok/s: 6770128 +4157/20000 train_loss: 2.4865 train_time: 8.0m tok/s: 6769799 +4158/20000 train_loss: 2.4446 train_time: 8.1m tok/s: 6769481 +4159/20000 train_loss: 2.3494 train_time: 8.1m tok/s: 6769138 +4160/20000 train_loss: 2.3496 train_time: 8.1m tok/s: 6768828 +4161/20000 train_loss: 2.4090 train_time: 8.1m tok/s: 6768477 +4162/20000 train_loss: 2.3205 train_time: 8.1m tok/s: 6768128 +4163/20000 train_loss: 2.5203 train_time: 8.1m tok/s: 6767795 +4164/20000 train_loss: 2.4421 train_time: 8.1m tok/s: 6767465 +4165/20000 train_loss: 2.4440 train_time: 8.1m tok/s: 6767135 +4166/20000 train_loss: 2.4453 train_time: 8.1m tok/s: 6766790 +4167/20000 train_loss: 2.5254 train_time: 8.1m tok/s: 6766449 +4168/20000 train_loss: 2.4335 train_time: 8.1m tok/s: 6766136 +4169/20000 train_loss: 2.4216 train_time: 8.1m tok/s: 6765786 +4170/20000 train_loss: 2.3402 train_time: 8.1m tok/s: 6765460 +4171/20000 train_loss: 2.4381 train_time: 8.1m tok/s: 6765113 +4172/20000 train_loss: 2.3941 train_time: 8.1m tok/s: 6764770 +4173/20000 train_loss: 2.5120 train_time: 8.1m tok/s: 6764444 +4174/20000 train_loss: 2.3082 train_time: 8.1m tok/s: 6764114 +4175/20000 train_loss: 2.4047 train_time: 8.1m tok/s: 6763789 +4176/20000 train_loss: 2.5040 train_time: 8.1m tok/s: 6763464 +4177/20000 train_loss: 2.3790 train_time: 8.1m tok/s: 6763154 +4178/20000 train_loss: 2.4510 train_time: 8.1m tok/s: 6762847 +4179/20000 train_loss: 2.3983 train_time: 8.1m tok/s: 6762508 +4180/20000 train_loss: 2.3898 train_time: 8.1m tok/s: 6762198 +4181/20000 train_loss: 2.5457 train_time: 8.1m tok/s: 6761880 +4182/20000 train_loss: 2.4741 train_time: 8.1m tok/s: 6761554 +4183/20000 train_loss: 2.4599 train_time: 8.1m tok/s: 6761246 +4184/20000 train_loss: 2.4768 train_time: 8.1m tok/s: 6760917 +4185/20000 train_loss: 2.5708 train_time: 8.1m tok/s: 6760595 +4186/20000 train_loss: 2.4733 train_time: 8.1m tok/s: 6760290 +4187/20000 
train_loss: 2.2928 train_time: 8.1m tok/s: 6759951 +4188/20000 train_loss: 2.4576 train_time: 8.1m tok/s: 6759596 +4189/20000 train_loss: 2.4503 train_time: 8.1m tok/s: 6759282 +4190/20000 train_loss: 2.5355 train_time: 8.1m tok/s: 6758958 +4191/20000 train_loss: 2.4213 train_time: 8.1m tok/s: 6758670 +4192/20000 train_loss: 2.3713 train_time: 8.1m tok/s: 6758348 +4193/20000 train_loss: 2.4034 train_time: 8.1m tok/s: 6758045 +4194/20000 train_loss: 2.4716 train_time: 8.1m tok/s: 6757714 +4195/20000 train_loss: 2.4858 train_time: 8.1m tok/s: 6757386 +4196/20000 train_loss: 2.3112 train_time: 8.1m tok/s: 6757083 +4197/20000 train_loss: 2.3249 train_time: 8.1m tok/s: 6756762 +4198/20000 train_loss: 2.4306 train_time: 8.1m tok/s: 6756445 +4199/20000 train_loss: 2.2418 train_time: 8.1m tok/s: 6756139 +4200/20000 train_loss: 2.4311 train_time: 8.1m tok/s: 6755770 +4201/20000 train_loss: 2.3674 train_time: 8.2m tok/s: 6755428 +4202/20000 train_loss: 2.4532 train_time: 8.2m tok/s: 6755107 +4203/20000 train_loss: 2.5882 train_time: 8.2m tok/s: 6754804 +4204/20000 train_loss: 2.4920 train_time: 8.2m tok/s: 6754477 +4205/20000 train_loss: 2.4699 train_time: 8.2m tok/s: 6754142 +4206/20000 train_loss: 2.3964 train_time: 8.2m tok/s: 6753812 +4207/20000 train_loss: 2.4046 train_time: 8.2m tok/s: 6753490 +4208/20000 train_loss: 2.3843 train_time: 8.2m tok/s: 6753153 +4209/20000 train_loss: 2.3872 train_time: 8.2m tok/s: 6752824 +4210/20000 train_loss: 2.3831 train_time: 8.2m tok/s: 6752528 +4211/20000 train_loss: 2.5570 train_time: 8.2m tok/s: 6752208 +4212/20000 train_loss: 2.3708 train_time: 8.2m tok/s: 6751876 +4213/20000 train_loss: 2.3994 train_time: 8.2m tok/s: 6751557 +4214/20000 train_loss: 2.4226 train_time: 8.2m tok/s: 6751237 +4215/20000 train_loss: 2.4551 train_time: 8.2m tok/s: 6750924 +4216/20000 train_loss: 2.5508 train_time: 8.2m tok/s: 6750585 +4217/20000 train_loss: 2.3023 train_time: 8.2m tok/s: 6750268 +4218/20000 train_loss: 2.4560 train_time: 8.2m tok/s: 
6749944 +4219/20000 train_loss: 2.3809 train_time: 8.2m tok/s: 6749620 +4220/20000 train_loss: 2.0869 train_time: 8.2m tok/s: 6749290 +4221/20000 train_loss: 2.4712 train_time: 8.2m tok/s: 6748985 +4222/20000 train_loss: 2.4673 train_time: 8.2m tok/s: 6748650 +4223/20000 train_loss: 2.5657 train_time: 8.2m tok/s: 6748328 +4224/20000 train_loss: 2.4521 train_time: 8.2m tok/s: 6748009 +4225/20000 train_loss: 2.4568 train_time: 8.2m tok/s: 6747686 +4226/20000 train_loss: 2.4251 train_time: 8.2m tok/s: 6747370 +4227/20000 train_loss: 2.4672 train_time: 8.2m tok/s: 6747049 +4228/20000 train_loss: 2.4428 train_time: 8.2m tok/s: 6746713 +4229/20000 train_loss: 2.4380 train_time: 8.2m tok/s: 6746377 +4230/20000 train_loss: 2.3752 train_time: 8.2m tok/s: 6746045 +4231/20000 train_loss: 2.3637 train_time: 8.2m tok/s: 6745742 +4232/20000 train_loss: 2.4025 train_time: 8.2m tok/s: 6745425 +4233/20000 train_loss: 2.5403 train_time: 8.2m tok/s: 6745114 +4234/20000 train_loss: 2.3450 train_time: 8.2m tok/s: 6744815 +4235/20000 train_loss: 2.5143 train_time: 8.2m tok/s: 6744472 +4236/20000 train_loss: 2.5647 train_time: 8.2m tok/s: 6744150 +4237/20000 train_loss: 2.4392 train_time: 8.2m tok/s: 6743822 +4238/20000 train_loss: 2.5937 train_time: 8.2m tok/s: 6743522 +4239/20000 train_loss: 2.4969 train_time: 8.2m tok/s: 6743220 +4240/20000 train_loss: 2.5684 train_time: 8.2m tok/s: 6742920 +4241/20000 train_loss: 2.3559 train_time: 8.2m tok/s: 6742616 +4242/20000 train_loss: 2.4619 train_time: 8.2m tok/s: 6742269 +4243/20000 train_loss: 2.4046 train_time: 8.2m tok/s: 6741953 +4244/20000 train_loss: 2.4581 train_time: 8.3m tok/s: 6741637 +4245/20000 train_loss: 2.4523 train_time: 8.3m tok/s: 6741292 +4246/20000 train_loss: 2.4917 train_time: 8.3m tok/s: 6740967 +4247/20000 train_loss: 2.3733 train_time: 8.3m tok/s: 6740645 +4248/20000 train_loss: 2.4122 train_time: 8.3m tok/s: 6740342 +4249/20000 train_loss: 2.4317 train_time: 8.3m tok/s: 6740022 +4250/20000 train_loss: 2.3883 
train_time: 8.3m tok/s: 6739707 +4251/20000 train_loss: 2.4064 train_time: 8.3m tok/s: 6739410 +4252/20000 train_loss: 2.4818 train_time: 8.3m tok/s: 6739105 +4253/20000 train_loss: 2.5399 train_time: 8.3m tok/s: 6738761 +4254/20000 train_loss: 2.6597 train_time: 8.3m tok/s: 6738440 +4255/20000 train_loss: 2.4498 train_time: 8.3m tok/s: 6738141 +4256/20000 train_loss: 2.5076 train_time: 8.3m tok/s: 6737820 +4257/20000 train_loss: 2.5249 train_time: 8.3m tok/s: 6737495 +4258/20000 train_loss: 2.3313 train_time: 8.3m tok/s: 6737182 +4259/20000 train_loss: 2.4224 train_time: 8.3m tok/s: 6736867 +4260/20000 train_loss: 2.3930 train_time: 8.3m tok/s: 6736549 +4261/20000 train_loss: 2.2937 train_time: 8.3m tok/s: 6736236 +4262/20000 train_loss: 2.3722 train_time: 8.3m tok/s: 6735933 +4263/20000 train_loss: 2.3381 train_time: 8.3m tok/s: 6735626 +4264/20000 train_loss: 2.3720 train_time: 8.3m tok/s: 6735305 +4265/20000 train_loss: 2.4368 train_time: 8.3m tok/s: 6734996 +4266/20000 train_loss: 2.5154 train_time: 8.3m tok/s: 6734689 +4267/20000 train_loss: 2.4061 train_time: 8.3m tok/s: 6734380 +4268/20000 train_loss: 2.4661 train_time: 8.3m tok/s: 6734051 +4269/20000 train_loss: 2.4080 train_time: 8.3m tok/s: 6733739 +4270/20000 train_loss: 2.3160 train_time: 8.3m tok/s: 6733430 +4271/20000 train_loss: 2.4296 train_time: 8.3m tok/s: 6733134 +4272/20000 train_loss: 2.3842 train_time: 8.3m tok/s: 6732802 +4273/20000 train_loss: 2.4125 train_time: 8.3m tok/s: 6732501 +4274/20000 train_loss: 2.4570 train_time: 8.3m tok/s: 6732154 +4275/20000 train_loss: 2.3029 train_time: 8.3m tok/s: 6731839 +4276/20000 train_loss: 2.3877 train_time: 8.3m tok/s: 6731523 +4277/20000 train_loss: 2.3578 train_time: 8.3m tok/s: 6731213 +4278/20000 train_loss: 2.3801 train_time: 8.3m tok/s: 6730909 +4279/20000 train_loss: 2.3704 train_time: 8.3m tok/s: 6730603 +4280/20000 train_loss: 2.5244 train_time: 8.3m tok/s: 6730281 +4281/20000 train_loss: 2.6060 train_time: 8.3m tok/s: 6729991 +4282/20000 
train_loss: 2.4336 train_time: 8.3m tok/s: 6729673 +4283/20000 train_loss: 2.6003 train_time: 8.3m tok/s: 6729366 +4284/20000 train_loss: 2.5172 train_time: 8.3m tok/s: 6729068 +4285/20000 train_loss: 2.3987 train_time: 8.3m tok/s: 6728756 +4286/20000 train_loss: 2.4837 train_time: 8.3m tok/s: 6728439 +4287/20000 train_loss: 2.3741 train_time: 8.4m tok/s: 6728120 +4288/20000 train_loss: 2.3980 train_time: 8.4m tok/s: 6727835 +4289/20000 train_loss: 2.5729 train_time: 8.4m tok/s: 6727472 +4290/20000 train_loss: 2.3629 train_time: 8.4m tok/s: 6727174 +4291/20000 train_loss: 2.3197 train_time: 8.4m tok/s: 6726876 +4292/20000 train_loss: 2.4616 train_time: 8.4m tok/s: 6726550 +4293/20000 train_loss: 2.4285 train_time: 8.4m tok/s: 6726254 +4294/20000 train_loss: 2.3789 train_time: 8.4m tok/s: 6725941 +4295/20000 train_loss: 2.4187 train_time: 8.4m tok/s: 6725640 +4296/20000 train_loss: 2.2147 train_time: 8.4m tok/s: 6725336 +4297/20000 train_loss: 2.4772 train_time: 8.4m tok/s: 6725036 +4298/20000 train_loss: 2.4033 train_time: 8.4m tok/s: 6724721 +4299/20000 train_loss: 2.3425 train_time: 8.4m tok/s: 6724416 +4300/20000 train_loss: 2.5179 train_time: 8.4m tok/s: 6724120 +4301/20000 train_loss: 2.1917 train_time: 8.4m tok/s: 6723769 +4302/20000 train_loss: 2.3421 train_time: 8.4m tok/s: 6723434 +4303/20000 train_loss: 2.3342 train_time: 8.4m tok/s: 6723145 +4304/20000 train_loss: 2.3477 train_time: 8.4m tok/s: 6722855 +4305/20000 train_loss: 2.3295 train_time: 8.4m tok/s: 6722547 +4306/20000 train_loss: 2.3932 train_time: 8.4m tok/s: 6722218 +4307/20000 train_loss: 2.3384 train_time: 8.4m tok/s: 6721927 +4308/20000 train_loss: 2.3776 train_time: 8.4m tok/s: 6721634 +4309/20000 train_loss: 2.5795 train_time: 8.4m tok/s: 6721327 +4310/20000 train_loss: 2.4539 train_time: 8.4m tok/s: 6721018 +4311/20000 train_loss: 2.5404 train_time: 8.4m tok/s: 6720720 +4312/20000 train_loss: 2.4182 train_time: 8.4m tok/s: 6720425 +4313/20000 train_loss: 2.2548 train_time: 8.4m tok/s: 
6720130 +4314/20000 train_loss: 2.4311 train_time: 8.4m tok/s: 6719824 +4315/20000 train_loss: 2.4057 train_time: 8.4m tok/s: 6719504 +4316/20000 train_loss: 2.4178 train_time: 8.4m tok/s: 6719202 +4317/20000 train_loss: 2.4920 train_time: 8.4m tok/s: 6718879 +4318/20000 train_loss: 2.4902 train_time: 8.4m tok/s: 6718569 +4319/20000 train_loss: 2.1803 train_time: 8.4m tok/s: 6718263 +4320/20000 train_loss: 2.2687 train_time: 8.4m tok/s: 6717964 +4321/20000 train_loss: 2.3737 train_time: 8.4m tok/s: 6717680 +4322/20000 train_loss: 2.4196 train_time: 8.4m tok/s: 6717352 +4323/20000 train_loss: 2.4183 train_time: 8.4m tok/s: 6717056 +4324/20000 train_loss: 2.4391 train_time: 8.4m tok/s: 6716772 +4325/20000 train_loss: 2.4631 train_time: 8.4m tok/s: 6716470 +4326/20000 train_loss: 2.3656 train_time: 8.4m tok/s: 6716152 +4327/20000 train_loss: 2.4104 train_time: 8.4m tok/s: 6715841 +4328/20000 train_loss: 2.3848 train_time: 8.4m tok/s: 6715556 +4329/20000 train_loss: 2.4337 train_time: 8.4m tok/s: 6715255 +4330/20000 train_loss: 2.3048 train_time: 8.5m tok/s: 6714945 +4331/20000 train_loss: 2.3909 train_time: 8.5m tok/s: 6714645 +4332/20000 train_loss: 2.9206 train_time: 8.5m tok/s: 6714297 +4333/20000 train_loss: 2.3804 train_time: 8.5m tok/s: 6714006 +4334/20000 train_loss: 2.4459 train_time: 8.5m tok/s: 6713705 +4335/20000 train_loss: 2.5386 train_time: 8.5m tok/s: 6713392 +4336/20000 train_loss: 2.4135 train_time: 8.5m tok/s: 6713089 +4337/20000 train_loss: 2.5094 train_time: 8.5m tok/s: 6712822 +4338/20000 train_loss: 2.5376 train_time: 8.5m tok/s: 6712520 +4339/20000 train_loss: 2.4445 train_time: 8.5m tok/s: 6712241 +4340/20000 train_loss: 2.4248 train_time: 8.5m tok/s: 6711944 +4341/20000 train_loss: 2.4110 train_time: 8.5m tok/s: 6711627 +4342/20000 train_loss: 2.5170 train_time: 8.5m tok/s: 6711318 +4343/20000 train_loss: 2.4869 train_time: 8.5m tok/s: 6711024 +4344/20000 train_loss: 2.4891 train_time: 8.5m tok/s: 6710725 +4345/20000 train_loss: 2.3232 
train_time: 8.5m tok/s: 6710397 +4346/20000 train_loss: 2.3951 train_time: 8.5m tok/s: 6710093 +4347/20000 train_loss: 2.4071 train_time: 8.5m tok/s: 6709800 +4348/20000 train_loss: 2.4008 train_time: 8.5m tok/s: 6709489 +4349/20000 train_loss: 1.8638 train_time: 8.5m tok/s: 6709134 +4350/20000 train_loss: 2.2322 train_time: 8.5m tok/s: 6708827 +4351/20000 train_loss: 2.4177 train_time: 8.5m tok/s: 6708548 +4352/20000 train_loss: 2.3678 train_time: 8.5m tok/s: 6708248 +4353/20000 train_loss: 2.4608 train_time: 8.5m tok/s: 6707961 +4354/20000 train_loss: 2.4817 train_time: 8.5m tok/s: 6707688 +4355/20000 train_loss: 2.4084 train_time: 8.5m tok/s: 6707399 +4356/20000 train_loss: 2.4721 train_time: 8.5m tok/s: 6707101 +4357/20000 train_loss: 2.5066 train_time: 8.5m tok/s: 6706787 +4358/20000 train_loss: 2.4083 train_time: 8.5m tok/s: 6706503 +4359/20000 train_loss: 2.4198 train_time: 8.5m tok/s: 6706202 +4360/20000 train_loss: 2.3205 train_time: 8.5m tok/s: 6705936 +4361/20000 train_loss: 2.2836 train_time: 8.5m tok/s: 6705639 +4362/20000 train_loss: 2.4564 train_time: 8.5m tok/s: 6705347 +4363/20000 train_loss: 2.2157 train_time: 8.5m tok/s: 6705030 +4364/20000 train_loss: 2.3510 train_time: 8.5m tok/s: 6704729 +4365/20000 train_loss: 2.5510 train_time: 8.5m tok/s: 6704437 +4366/20000 train_loss: 2.7785 train_time: 8.5m tok/s: 6704126 +4367/20000 train_loss: 2.5286 train_time: 8.5m tok/s: 6703837 +4368/20000 train_loss: 2.5046 train_time: 8.5m tok/s: 6703545 +4369/20000 train_loss: 2.3111 train_time: 8.5m tok/s: 6703248 +4370/20000 train_loss: 2.4952 train_time: 8.5m tok/s: 6702951 +4371/20000 train_loss: 2.3901 train_time: 8.5m tok/s: 6702636 +4372/20000 train_loss: 2.4821 train_time: 8.5m tok/s: 6702336 +4373/20000 train_loss: 2.2513 train_time: 8.6m tok/s: 6702044 +4374/20000 train_loss: 2.3145 train_time: 8.6m tok/s: 6701744 +4375/20000 train_loss: 2.4068 train_time: 8.6m tok/s: 6701452 +4376/20000 train_loss: 2.3847 train_time: 8.6m tok/s: 6701151 +4377/20000 
train_loss: 2.5562 train_time: 8.6m tok/s: 6700865 +4378/20000 train_loss: 2.2681 train_time: 8.6m tok/s: 6700553 +4379/20000 train_loss: 2.3312 train_time: 8.6m tok/s: 6700264 +4380/20000 train_loss: 2.4350 train_time: 8.6m tok/s: 6699983 +4381/20000 train_loss: 2.5042 train_time: 8.6m tok/s: 6699715 +4382/20000 train_loss: 2.4889 train_time: 8.6m tok/s: 6699401 +4383/20000 train_loss: 2.4346 train_time: 8.6m tok/s: 6699104 +4384/20000 train_loss: 2.4967 train_time: 8.6m tok/s: 6698818 +4385/20000 train_loss: 2.4229 train_time: 8.6m tok/s: 6698510 +4386/20000 train_loss: 2.4015 train_time: 8.6m tok/s: 6698217 +4387/20000 train_loss: 2.4702 train_time: 8.6m tok/s: 6697918 +4388/20000 train_loss: 2.3868 train_time: 8.6m tok/s: 6697618 +4389/20000 train_loss: 2.2641 train_time: 8.6m tok/s: 6697349 +4390/20000 train_loss: 2.4065 train_time: 8.6m tok/s: 6697053 +4391/20000 train_loss: 2.2857 train_time: 8.6m tok/s: 6696759 +4392/20000 train_loss: 2.3938 train_time: 8.6m tok/s: 6696459 +4393/20000 train_loss: 2.3941 train_time: 8.6m tok/s: 6696175 +4394/20000 train_loss: 2.3729 train_time: 8.6m tok/s: 6695889 +4395/20000 train_loss: 2.3396 train_time: 8.6m tok/s: 6695600 +4396/20000 train_loss: 2.5864 train_time: 8.6m tok/s: 6695294 +4397/20000 train_loss: 2.4150 train_time: 8.6m tok/s: 6694995 +4398/20000 train_loss: 2.3446 train_time: 8.6m tok/s: 6694711 +4399/20000 train_loss: 2.3617 train_time: 8.6m tok/s: 6694423 +4400/20000 train_loss: 2.5097 train_time: 8.6m tok/s: 6694125 +4401/20000 train_loss: 2.4340 train_time: 8.6m tok/s: 6693846 +4402/20000 train_loss: 2.3791 train_time: 8.6m tok/s: 6693550 +4403/20000 train_loss: 2.3714 train_time: 8.6m tok/s: 6693274 +4404/20000 train_loss: 2.3185 train_time: 8.6m tok/s: 6692975 +4405/20000 train_loss: 2.5384 train_time: 8.6m tok/s: 6692672 +4406/20000 train_loss: 2.4067 train_time: 8.6m tok/s: 6692376 +4407/20000 train_loss: 2.4543 train_time: 8.6m tok/s: 6692077 +4408/20000 train_loss: 2.4780 train_time: 8.6m tok/s: 
6691801 +4409/20000 train_loss: 2.4206 train_time: 8.6m tok/s: 6691532 +4410/20000 train_loss: 2.4486 train_time: 8.6m tok/s: 6691249 +4411/20000 train_loss: 2.4662 train_time: 8.6m tok/s: 6690973 +4412/20000 train_loss: 2.5253 train_time: 8.6m tok/s: 6690688 +4413/20000 train_loss: 2.3458 train_time: 8.6m tok/s: 6690401 +4414/20000 train_loss: 2.3558 train_time: 8.6m tok/s: 6690117 +4415/20000 train_loss: 2.5658 train_time: 8.7m tok/s: 6689799 +4416/20000 train_loss: 2.3855 train_time: 8.7m tok/s: 6689487 +4417/20000 train_loss: 2.2975 train_time: 8.7m tok/s: 6689198 +4418/20000 train_loss: 2.3765 train_time: 8.7m tok/s: 6688913 +4419/20000 train_loss: 2.3659 train_time: 8.7m tok/s: 6688626 +4420/20000 train_loss: 2.2860 train_time: 8.7m tok/s: 6688336 +4421/20000 train_loss: 2.2528 train_time: 8.7m tok/s: 6688042 +4422/20000 train_loss: 2.3869 train_time: 8.7m tok/s: 6687754 +4423/20000 train_loss: 2.5344 train_time: 8.7m tok/s: 6687448 +4424/20000 train_loss: 2.4663 train_time: 8.7m tok/s: 6687170 +4425/20000 train_loss: 2.5008 train_time: 8.7m tok/s: 6686893 +4426/20000 train_loss: 2.3054 train_time: 8.7m tok/s: 6686603 +4427/20000 train_loss: 2.3365 train_time: 8.7m tok/s: 6686310 +4428/20000 train_loss: 2.4030 train_time: 8.7m tok/s: 6686044 +4429/20000 train_loss: 2.4062 train_time: 8.7m tok/s: 6685751 +4430/20000 train_loss: 2.5672 train_time: 8.7m tok/s: 6685427 +4431/20000 train_loss: 2.4672 train_time: 8.7m tok/s: 6685108 +4432/20000 train_loss: 2.3617 train_time: 8.7m tok/s: 6684824 +4433/20000 train_loss: 2.2767 train_time: 8.7m tok/s: 6684551 +4434/20000 train_loss: 2.5710 train_time: 8.7m tok/s: 6684237 +4435/20000 train_loss: 2.4378 train_time: 8.7m tok/s: 6683966 +4436/20000 train_loss: 2.4098 train_time: 8.7m tok/s: 6683684 +4437/20000 train_loss: 2.2327 train_time: 8.7m tok/s: 6683389 +4438/20000 train_loss: 2.5216 train_time: 8.7m tok/s: 6683107 +4439/20000 train_loss: 2.4851 train_time: 8.7m tok/s: 6682801 +4440/20000 train_loss: 2.4778 
train_time: 8.7m tok/s: 6682532 +4441/20000 train_loss: 2.3551 train_time: 8.7m tok/s: 6682274 +4442/20000 train_loss: 2.4525 train_time: 8.7m tok/s: 6682006 +4443/20000 train_loss: 2.5247 train_time: 8.7m tok/s: 6681728 +4444/20000 train_loss: 2.3760 train_time: 8.7m tok/s: 6681448 +4445/20000 train_loss: 2.3886 train_time: 8.7m tok/s: 6681157 +4446/20000 train_loss: 2.3161 train_time: 8.7m tok/s: 6680875 +4447/20000 train_loss: 2.3658 train_time: 8.7m tok/s: 6680599 +4448/20000 train_loss: 2.3119 train_time: 8.7m tok/s: 6680313 +4449/20000 train_loss: 2.5146 train_time: 8.7m tok/s: 6680025 +4450/20000 train_loss: 2.3453 train_time: 8.7m tok/s: 6679756 +4451/20000 train_loss: 2.4477 train_time: 8.7m tok/s: 6679460 +4452/20000 train_loss: 2.3726 train_time: 8.7m tok/s: 6679171 +4453/20000 train_loss: 2.4806 train_time: 8.7m tok/s: 6678883 +4454/20000 train_loss: 2.3381 train_time: 8.7m tok/s: 6678599 +4455/20000 train_loss: 2.4081 train_time: 8.7m tok/s: 6678330 +4456/20000 train_loss: 2.4185 train_time: 8.7m tok/s: 6678040 +4457/20000 train_loss: 2.6178 train_time: 8.7m tok/s: 6677755 +4458/20000 train_loss: 2.3903 train_time: 8.8m tok/s: 6677481 +4459/20000 train_loss: 2.2668 train_time: 8.8m tok/s: 6677200 +4460/20000 train_loss: 2.3203 train_time: 8.8m tok/s: 6676903 +4461/20000 train_loss: 2.3534 train_time: 8.8m tok/s: 6676630 +4462/20000 train_loss: 2.4823 train_time: 8.8m tok/s: 6676355 +4463/20000 train_loss: 2.2405 train_time: 8.8m tok/s: 6676058 +4464/20000 train_loss: 2.4964 train_time: 8.8m tok/s: 6675779 +4465/20000 train_loss: 2.4620 train_time: 8.8m tok/s: 6675503 +4466/20000 train_loss: 2.5115 train_time: 8.8m tok/s: 6675216 +4467/20000 train_loss: 2.4556 train_time: 8.8m tok/s: 6674934 +4468/20000 train_loss: 2.2133 train_time: 8.8m tok/s: 6674635 +4469/20000 train_loss: 2.3627 train_time: 8.8m tok/s: 6674336 +4470/20000 train_loss: 2.3495 train_time: 8.8m tok/s: 6674080 +4471/20000 train_loss: 2.4358 train_time: 8.8m tok/s: 6673813 +4472/20000 
train_loss: 2.4008 train_time: 8.8m tok/s: 6673522 +4473/20000 train_loss: 2.5130 train_time: 8.8m tok/s: 6673244 +4474/20000 train_loss: 2.3601 train_time: 8.8m tok/s: 6672943 +4475/20000 train_loss: 2.4801 train_time: 8.8m tok/s: 6672668 +4476/20000 train_loss: 2.4198 train_time: 8.8m tok/s: 6672396 +4477/20000 train_loss: 2.3434 train_time: 8.8m tok/s: 6672101 +4478/20000 train_loss: 2.3656 train_time: 8.8m tok/s: 6671839 +4479/20000 train_loss: 2.4300 train_time: 8.8m tok/s: 6671565 +4480/20000 train_loss: 2.3815 train_time: 8.8m tok/s: 6671297 +4481/20000 train_loss: 2.4183 train_time: 8.8m tok/s: 6671012 +4482/20000 train_loss: 2.3793 train_time: 8.8m tok/s: 6670721 +4483/20000 train_loss: 2.4184 train_time: 8.8m tok/s: 6670441 +4484/20000 train_loss: 2.5013 train_time: 8.8m tok/s: 6670171 +4485/20000 train_loss: 2.3385 train_time: 8.8m tok/s: 6669904 +4486/20000 train_loss: 2.3924 train_time: 8.8m tok/s: 6669630 +4487/20000 train_loss: 2.3824 train_time: 8.8m tok/s: 6669347 +4488/20000 train_loss: 2.3792 train_time: 8.8m tok/s: 6669059 +4489/20000 train_loss: 2.2798 train_time: 8.8m tok/s: 6668777 +4490/20000 train_loss: 2.4654 train_time: 8.8m tok/s: 6668490 +4491/20000 train_loss: 2.3999 train_time: 8.8m tok/s: 6668217 +4492/20000 train_loss: 2.4507 train_time: 8.8m tok/s: 6667943 +4493/20000 train_loss: 2.4621 train_time: 8.8m tok/s: 6667675 +4494/20000 train_loss: 2.4904 train_time: 8.8m tok/s: 6667402 +4495/20000 train_loss: 2.4479 train_time: 8.8m tok/s: 6667125 +4496/20000 train_loss: 2.3325 train_time: 8.8m tok/s: 6666847 +4497/20000 train_loss: 2.4422 train_time: 8.8m tok/s: 6666556 +4498/20000 train_loss: 2.3722 train_time: 8.8m tok/s: 6666291 +4499/20000 train_loss: 2.3627 train_time: 8.8m tok/s: 6666016 +4500/20000 train_loss: 2.3893 train_time: 8.8m tok/s: 6665741 +4501/20000 train_loss: 2.0625 train_time: 8.9m tok/s: 6665423 +4502/20000 train_loss: 2.3541 train_time: 8.9m tok/s: 6665137 +4503/20000 train_loss: 2.2331 train_time: 8.9m tok/s: 
6664858 +4504/20000 train_loss: 2.3204 train_time: 8.9m tok/s: 6664571 +4505/20000 train_loss: 2.3234 train_time: 8.9m tok/s: 6664306 +4506/20000 train_loss: 2.2662 train_time: 8.9m tok/s: 6664033 +4507/20000 train_loss: 2.3547 train_time: 8.9m tok/s: 6663773 +4508/20000 train_loss: 2.4647 train_time: 8.9m tok/s: 6663512 +4509/20000 train_loss: 2.4255 train_time: 8.9m tok/s: 6663242 +4510/20000 train_loss: 2.4542 train_time: 8.9m tok/s: 6662969 +4511/20000 train_loss: 2.2619 train_time: 8.9m tok/s: 6662695 +4512/20000 train_loss: 2.3576 train_time: 8.9m tok/s: 6662426 +4513/20000 train_loss: 2.3776 train_time: 8.9m tok/s: 6662154 +4514/20000 train_loss: 2.3911 train_time: 8.9m tok/s: 6661878 +4515/20000 train_loss: 2.3164 train_time: 8.9m tok/s: 6661594 +4516/20000 train_loss: 2.5665 train_time: 8.9m tok/s: 6661300 +4517/20000 train_loss: 2.6633 train_time: 8.9m tok/s: 6661013 +4518/20000 train_loss: 2.4489 train_time: 8.9m tok/s: 6660748 +4519/20000 train_loss: 2.3679 train_time: 8.9m tok/s: 6660483 +4520/20000 train_loss: 2.3118 train_time: 8.9m tok/s: 6660208 +4521/20000 train_loss: 2.4242 train_time: 8.9m tok/s: 6659941 +4522/20000 train_loss: 2.3461 train_time: 8.9m tok/s: 6659679 +4523/20000 train_loss: 2.4175 train_time: 8.9m tok/s: 6659418 +4524/20000 train_loss: 2.3145 train_time: 8.9m tok/s: 6659142 +4525/20000 train_loss: 2.4896 train_time: 8.9m tok/s: 6658881 +4526/20000 train_loss: 2.4314 train_time: 8.9m tok/s: 6658622 +4527/20000 train_loss: 2.4175 train_time: 8.9m tok/s: 6658353 +4528/20000 train_loss: 2.3307 train_time: 8.9m tok/s: 6658081 +4529/20000 train_loss: 2.3807 train_time: 8.9m tok/s: 6657791 +4530/20000 train_loss: 2.2935 train_time: 8.9m tok/s: 6657508 +4531/20000 train_loss: 2.5323 train_time: 8.9m tok/s: 6657218 +4532/20000 train_loss: 2.3998 train_time: 8.9m tok/s: 6656946 +4533/20000 train_loss: 2.2842 train_time: 8.9m tok/s: 6656699 +4534/20000 train_loss: 2.4274 train_time: 8.9m tok/s: 6656434 +4535/20000 train_loss: 2.4902 
train_time: 8.9m tok/s: 6656156 +4536/20000 train_loss: 2.2850 train_time: 8.9m tok/s: 6655868 +4537/20000 train_loss: 2.3818 train_time: 8.9m tok/s: 6655596 +4538/20000 train_loss: 2.1585 train_time: 8.9m tok/s: 6655323 +4539/20000 train_loss: 2.4333 train_time: 8.9m tok/s: 6655046 +4540/20000 train_loss: 2.3692 train_time: 8.9m tok/s: 6654773 +4541/20000 train_loss: 2.2982 train_time: 8.9m tok/s: 6654519 +4542/20000 train_loss: 2.3913 train_time: 8.9m tok/s: 6654250 +4543/20000 train_loss: 2.2984 train_time: 8.9m tok/s: 6653974 +4544/20000 train_loss: 2.6393 train_time: 9.0m tok/s: 6653678 +4545/20000 train_loss: 2.3788 train_time: 9.0m tok/s: 6653429 +4546/20000 train_loss: 2.3848 train_time: 9.0m tok/s: 6653163 +4547/20000 train_loss: 2.3433 train_time: 9.0m tok/s: 6652902 +4548/20000 train_loss: 2.2340 train_time: 9.0m tok/s: 6652637 +4549/20000 train_loss: 2.4305 train_time: 9.0m tok/s: 6652367 +4550/20000 train_loss: 2.4547 train_time: 9.0m tok/s: 6652112 +4551/20000 train_loss: 2.3528 train_time: 9.0m tok/s: 6651843 +4552/20000 train_loss: 2.3424 train_time: 9.0m tok/s: 6651551 +4553/20000 train_loss: 2.2867 train_time: 9.0m tok/s: 6651299 +4554/20000 train_loss: 2.4562 train_time: 9.0m tok/s: 6651008 +4555/20000 train_loss: 2.4521 train_time: 9.0m tok/s: 6650723 +4556/20000 train_loss: 2.4146 train_time: 9.0m tok/s: 6650472 +4557/20000 train_loss: 2.4615 train_time: 9.0m tok/s: 6650210 +4558/20000 train_loss: 2.5737 train_time: 9.0m tok/s: 6649944 +4559/20000 train_loss: 2.4103 train_time: 9.0m tok/s: 6649683 +4560/20000 train_loss: 2.4290 train_time: 9.0m tok/s: 6649409 +4561/20000 train_loss: 2.4408 train_time: 9.0m tok/s: 6649148 +4562/20000 train_loss: 2.3892 train_time: 9.0m tok/s: 6648884 +4563/20000 train_loss: 2.5012 train_time: 9.0m tok/s: 6648617 +4564/20000 train_loss: 2.3222 train_time: 9.0m tok/s: 6648338 +4565/20000 train_loss: 2.3954 train_time: 9.0m tok/s: 6648081 +4566/20000 train_loss: 2.3654 train_time: 9.0m tok/s: 6647802 +4567/20000 
train_loss: 2.2291 train_time: 9.0m tok/s: 6647526 +4568/20000 train_loss: 2.2569 train_time: 9.0m tok/s: 6647270 +4569/20000 train_loss: 2.4761 train_time: 9.0m tok/s: 6647010 +4570/20000 train_loss: 2.3456 train_time: 9.0m tok/s: 6646745 +4571/20000 train_loss: 2.4166 train_time: 9.0m tok/s: 6646490 +4572/20000 train_loss: 2.3973 train_time: 9.0m tok/s: 6646233 +4573/20000 train_loss: 2.3936 train_time: 9.0m tok/s: 6645970 +4574/20000 train_loss: 2.4040 train_time: 9.0m tok/s: 6645669 +4575/20000 train_loss: 2.2820 train_time: 9.0m tok/s: 6645382 +4576/20000 train_loss: 2.4210 train_time: 9.0m tok/s: 6645126 +4577/20000 train_loss: 2.4132 train_time: 9.0m tok/s: 6644856 +4578/20000 train_loss: 2.3683 train_time: 9.0m tok/s: 6644602 +4579/20000 train_loss: 2.4723 train_time: 9.0m tok/s: 6644326 +4580/20000 train_loss: 2.2539 train_time: 9.0m tok/s: 6644045 +4581/20000 train_loss: 1.9360 train_time: 9.0m tok/s: 6643735 +4582/20000 train_loss: 2.3699 train_time: 9.0m tok/s: 6643467 +4583/20000 train_loss: 2.4077 train_time: 9.0m tok/s: 6643230 +4584/20000 train_loss: 2.4607 train_time: 9.0m tok/s: 6642978 +4585/20000 train_loss: 2.3160 train_time: 9.0m tok/s: 6642734 +4586/20000 train_loss: 2.3568 train_time: 9.0m tok/s: 6642490 +4587/20000 train_loss: 2.5044 train_time: 9.1m tok/s: 6642202 +4588/20000 train_loss: 2.3972 train_time: 9.1m tok/s: 6641940 +4589/20000 train_loss: 2.4570 train_time: 9.1m tok/s: 6641707 +4590/20000 train_loss: 2.3079 train_time: 9.1m tok/s: 6641435 +4591/20000 train_loss: 2.3678 train_time: 9.1m tok/s: 6641170 +4592/20000 train_loss: 2.2545 train_time: 9.1m tok/s: 6640903 +4593/20000 train_loss: 2.4548 train_time: 9.1m tok/s: 6640641 +4594/20000 train_loss: 2.3201 train_time: 9.1m tok/s: 6640383 +4595/20000 train_loss: 2.1861 train_time: 9.1m tok/s: 6640094 +4596/20000 train_loss: 2.4291 train_time: 9.1m tok/s: 6639854 +4597/20000 train_loss: 2.4121 train_time: 9.1m tok/s: 6639569 +4598/20000 train_loss: 2.4620 train_time: 9.1m tok/s: 
6639325 +4599/20000 train_loss: 2.3751 train_time: 9.1m tok/s: 6639077 +4600/20000 train_loss: 2.5605 train_time: 9.1m tok/s: 6638805 +4601/20000 train_loss: 2.4847 train_time: 9.1m tok/s: 6638538 +4602/20000 train_loss: 2.5111 train_time: 9.1m tok/s: 6638265 +4603/20000 train_loss: 2.3835 train_time: 9.1m tok/s: 6638012 +4604/20000 train_loss: 2.3461 train_time: 9.1m tok/s: 6637749 +4605/20000 train_loss: 2.3489 train_time: 9.1m tok/s: 6637475 +4606/20000 train_loss: 2.3481 train_time: 9.1m tok/s: 6637229 +4607/20000 train_loss: 2.3534 train_time: 9.1m tok/s: 6636980 +4608/20000 train_loss: 2.3675 train_time: 9.1m tok/s: 6636723 +4609/20000 train_loss: 2.3199 train_time: 9.1m tok/s: 6636449 +4610/20000 train_loss: 2.4209 train_time: 9.1m tok/s: 6636172 +4611/20000 train_loss: 2.5822 train_time: 9.1m tok/s: 6635896 +4612/20000 train_loss: 2.4527 train_time: 9.1m tok/s: 6635632 +4613/20000 train_loss: 2.5041 train_time: 9.1m tok/s: 6635393 +4614/20000 train_loss: 2.4581 train_time: 9.1m tok/s: 6635144 +4615/20000 train_loss: 2.3032 train_time: 9.1m tok/s: 6634842 +4616/20000 train_loss: 2.2758 train_time: 9.1m tok/s: 6634604 +4617/20000 train_loss: 2.3151 train_time: 9.1m tok/s: 6634330 +4618/20000 train_loss: 2.3739 train_time: 9.1m tok/s: 6634101 +4619/20000 train_loss: 2.2872 train_time: 9.1m tok/s: 6633847 +4620/20000 train_loss: 2.3496 train_time: 9.1m tok/s: 6633601 +4621/20000 train_loss: 2.3893 train_time: 9.1m tok/s: 6633329 +4622/20000 train_loss: 2.4201 train_time: 9.1m tok/s: 6633048 +4623/20000 train_loss: 2.3730 train_time: 9.1m tok/s: 6632795 +4624/20000 train_loss: 2.5461 train_time: 9.1m tok/s: 6632525 +4625/20000 train_loss: 2.5872 train_time: 9.1m tok/s: 6632255 +4626/20000 train_loss: 2.4973 train_time: 9.1m tok/s: 6631995 +4627/20000 train_loss: 2.3998 train_time: 9.1m tok/s: 6631763 +4628/20000 train_loss: 2.4532 train_time: 9.1m tok/s: 6631496 +4629/20000 train_loss: 2.3730 train_time: 9.1m tok/s: 6631238 +4630/20000 train_loss: 2.1497 
train_time: 9.2m tok/s: 6630992 +4631/20000 train_loss: 2.3869 train_time: 9.2m tok/s: 6630731 +4632/20000 train_loss: 2.3133 train_time: 9.2m tok/s: 6630472 +4633/20000 train_loss: 2.2332 train_time: 9.2m tok/s: 6630217 +4634/20000 train_loss: 2.3477 train_time: 9.2m tok/s: 6629970 +4635/20000 train_loss: 2.4743 train_time: 9.2m tok/s: 6629690 +4636/20000 train_loss: 2.3872 train_time: 9.2m tok/s: 6629437 +4637/20000 train_loss: 2.4627 train_time: 9.2m tok/s: 6629183 +4638/20000 train_loss: 2.4532 train_time: 9.2m tok/s: 6628944 +4639/20000 train_loss: 2.3911 train_time: 9.2m tok/s: 6628694 +4640/20000 train_loss: 2.4071 train_time: 9.2m tok/s: 6628431 +4641/20000 train_loss: 2.3469 train_time: 9.2m tok/s: 6628160 +4642/20000 train_loss: 2.3123 train_time: 9.2m tok/s: 6627909 +4643/20000 train_loss: 2.4233 train_time: 9.2m tok/s: 6627656 +4644/20000 train_loss: 2.2570 train_time: 9.2m tok/s: 6627383 +4645/20000 train_loss: 2.3501 train_time: 9.2m tok/s: 6627137 +4646/20000 train_loss: 2.2881 train_time: 9.2m tok/s: 6626877 +4647/20000 train_loss: 2.1976 train_time: 9.2m tok/s: 6626621 +4648/20000 train_loss: 2.3736 train_time: 9.2m tok/s: 6626370 +4649/20000 train_loss: 2.3928 train_time: 9.2m tok/s: 6626112 +4650/20000 train_loss: 2.4094 train_time: 9.2m tok/s: 6625867 +4651/20000 train_loss: 2.4866 train_time: 9.2m tok/s: 6625607 +4652/20000 train_loss: 2.4640 train_time: 9.2m tok/s: 6625341 +4653/20000 train_loss: 2.4327 train_time: 9.2m tok/s: 6625087 +4654/20000 train_loss: 2.2998 train_time: 9.2m tok/s: 6624831 +4655/20000 train_loss: 2.2761 train_time: 9.2m tok/s: 6624558 +4656/20000 train_loss: 2.4700 train_time: 9.2m tok/s: 6624307 +4657/20000 train_loss: 2.3573 train_time: 9.2m tok/s: 6624052 +4658/20000 train_loss: 2.2982 train_time: 9.2m tok/s: 6623802 +4659/20000 train_loss: 2.3307 train_time: 9.2m tok/s: 6623558 +4660/20000 train_loss: 2.3383 train_time: 9.2m tok/s: 6623295 +4661/20000 train_loss: 2.0318 train_time: 9.2m tok/s: 6623011 +4662/20000 
train_loss: 2.3587 train_time: 9.2m tok/s: 6622750 +4663/20000 train_loss: 2.3864 train_time: 9.2m tok/s: 6622528 +4664/20000 train_loss: 2.2969 train_time: 9.2m tok/s: 6622271 +4665/20000 train_loss: 2.4456 train_time: 9.2m tok/s: 6622014 +4666/20000 train_loss: 2.3418 train_time: 9.2m tok/s: 6621761 +4667/20000 train_loss: 2.4351 train_time: 9.2m tok/s: 6621489 +4668/20000 train_loss: 2.4018 train_time: 9.2m tok/s: 6621268 +4669/20000 train_loss: 2.6143 train_time: 9.2m tok/s: 6621009 +4670/20000 train_loss: 2.2812 train_time: 9.2m tok/s: 6620756 +4671/20000 train_loss: 2.4567 train_time: 9.2m tok/s: 6620515 +4672/20000 train_loss: 2.3482 train_time: 9.2m tok/s: 6620258 +4673/20000 train_loss: 2.3265 train_time: 9.3m tok/s: 6619998 +4674/20000 train_loss: 2.4360 train_time: 9.3m tok/s: 6619740 +4675/20000 train_loss: 2.3730 train_time: 9.3m tok/s: 6619487 +4676/20000 train_loss: 2.3639 train_time: 9.3m tok/s: 6619231 +4677/20000 train_loss: 2.4182 train_time: 9.3m tok/s: 6618976 +4678/20000 train_loss: 2.3871 train_time: 9.3m tok/s: 6618731 +4679/20000 train_loss: 2.2972 train_time: 9.3m tok/s: 6618489 +4680/20000 train_loss: 2.3032 train_time: 9.3m tok/s: 6618235 +4681/20000 train_loss: 2.3774 train_time: 9.3m tok/s: 6617979 +4682/20000 train_loss: 2.2883 train_time: 9.3m tok/s: 6617726 +4683/20000 train_loss: 2.3618 train_time: 9.3m tok/s: 6617466 +4684/20000 train_loss: 2.3504 train_time: 9.3m tok/s: 6617224 +4685/20000 train_loss: 2.3216 train_time: 9.3m tok/s: 6616957 +4686/20000 train_loss: 2.2538 train_time: 9.3m tok/s: 6616692 +4687/20000 train_loss: 2.3872 train_time: 9.3m tok/s: 6616464 +4688/20000 train_loss: 2.4494 train_time: 9.3m tok/s: 6616198 +4689/20000 train_loss: 2.4383 train_time: 9.3m tok/s: 6615951 +4690/20000 train_loss: 2.5013 train_time: 9.3m tok/s: 6615709 +4691/20000 train_loss: 2.4025 train_time: 9.3m tok/s: 6615465 +4692/20000 train_loss: 2.4222 train_time: 9.3m tok/s: 6615218 +4693/20000 train_loss: 2.3704 train_time: 9.3m tok/s: 
6614962 +4694/20000 train_loss: 2.4529 train_time: 9.3m tok/s: 6614699 +4695/20000 train_loss: 2.4441 train_time: 9.3m tok/s: 6614452 +4696/20000 train_loss: 2.3430 train_time: 9.3m tok/s: 6614224 +4697/20000 train_loss: 2.3407 train_time: 9.3m tok/s: 6613989 +4698/20000 train_loss: 2.3711 train_time: 9.3m tok/s: 6613747 +4699/20000 train_loss: 2.3680 train_time: 9.3m tok/s: 6613525 +4700/20000 train_loss: 2.1593 train_time: 9.3m tok/s: 6613285 +4701/20000 train_loss: 2.4174 train_time: 9.3m tok/s: 6613024 +4702/20000 train_loss: 2.5016 train_time: 9.3m tok/s: 6612738 +4703/20000 train_loss: 2.4465 train_time: 9.3m tok/s: 6612475 +4704/20000 train_loss: 2.3664 train_time: 9.3m tok/s: 6612230 +4705/20000 train_loss: 2.3343 train_time: 9.3m tok/s: 6611983 +4706/20000 train_loss: 2.3325 train_time: 9.3m tok/s: 6611732 +4707/20000 train_loss: 2.4488 train_time: 9.3m tok/s: 6611484 +4708/20000 train_loss: 2.3617 train_time: 9.3m tok/s: 6611214 +4709/20000 train_loss: 2.3512 train_time: 9.3m tok/s: 6610974 +4710/20000 train_loss: 2.3956 train_time: 9.3m tok/s: 6610719 +4711/20000 train_loss: 2.3652 train_time: 9.3m tok/s: 6610455 +4712/20000 train_loss: 2.3200 train_time: 9.3m tok/s: 6610212 +4713/20000 train_loss: 2.4440 train_time: 9.3m tok/s: 6609974 +4714/20000 train_loss: 2.3454 train_time: 9.3m tok/s: 6609693 +4715/20000 train_loss: 2.4642 train_time: 9.4m tok/s: 6609446 +4716/20000 train_loss: 2.4964 train_time: 9.4m tok/s: 6609227 +4717/20000 train_loss: 2.3299 train_time: 9.4m tok/s: 6608975 +4718/20000 train_loss: 2.3180 train_time: 9.4m tok/s: 6608728 +4719/20000 train_loss: 2.3508 train_time: 9.4m tok/s: 6608494 +4720/20000 train_loss: 2.3042 train_time: 9.4m tok/s: 6608243 +4721/20000 train_loss: 2.2802 train_time: 9.4m tok/s: 6607989 +4722/20000 train_loss: 2.4781 train_time: 9.4m tok/s: 6607745 +4723/20000 train_loss: 2.3232 train_time: 9.4m tok/s: 6607516 +4724/20000 train_loss: 2.3658 train_time: 9.4m tok/s: 6607285 +4725/20000 train_loss: 2.2568 
train_time: 9.4m tok/s: 6607010 +4726/20000 train_loss: 2.3997 train_time: 9.4m tok/s: 6606752 +4727/20000 train_loss: 2.3791 train_time: 9.4m tok/s: 6606512 +4728/20000 train_loss: 2.3625 train_time: 9.4m tok/s: 6606279 +4729/20000 train_loss: 2.3695 train_time: 9.4m tok/s: 6606029 +4730/20000 train_loss: 2.2011 train_time: 9.4m tok/s: 6605777 +4731/20000 train_loss: 2.3642 train_time: 9.4m tok/s: 6605526 +4732/20000 train_loss: 2.3179 train_time: 9.4m tok/s: 6605281 +4733/20000 train_loss: 2.4135 train_time: 9.4m tok/s: 6605048 +4734/20000 train_loss: 2.3972 train_time: 9.4m tok/s: 6604791 +4735/20000 train_loss: 2.1971 train_time: 9.4m tok/s: 6604528 +4736/20000 train_loss: 2.3380 train_time: 9.4m tok/s: 6604297 +4737/20000 train_loss: 2.4789 train_time: 9.4m tok/s: 6604066 +4738/20000 train_loss: 2.3485 train_time: 9.4m tok/s: 6603819 +4739/20000 train_loss: 2.4849 train_time: 9.4m tok/s: 6603513 +4740/20000 train_loss: 2.4546 train_time: 9.4m tok/s: 6603244 +4741/20000 train_loss: 2.3771 train_time: 9.4m tok/s: 6603001 +4742/20000 train_loss: 2.3396 train_time: 9.4m tok/s: 6602770 +4743/20000 train_loss: 2.2962 train_time: 9.4m tok/s: 6602504 +4744/20000 train_loss: 2.3683 train_time: 9.4m tok/s: 6602271 +4745/20000 train_loss: 2.2571 train_time: 9.4m tok/s: 6602037 +4746/20000 train_loss: 2.2258 train_time: 9.4m tok/s: 6601803 +4747/20000 train_loss: 2.4580 train_time: 9.4m tok/s: 6601578 +4748/20000 train_loss: 2.4003 train_time: 9.4m tok/s: 6601324 +4749/20000 train_loss: 2.3674 train_time: 9.4m tok/s: 6601058 +4750/20000 train_loss: 2.4057 train_time: 9.4m tok/s: 6600829 +4751/20000 train_loss: 2.3263 train_time: 9.4m tok/s: 6600612 +4752/20000 train_loss: 2.3807 train_time: 9.4m tok/s: 6600395 +4753/20000 train_loss: 2.2601 train_time: 9.4m tok/s: 6600143 +4754/20000 train_loss: 2.2218 train_time: 9.4m tok/s: 6599902 +4755/20000 train_loss: 2.4101 train_time: 9.4m tok/s: 6599659 +4756/20000 train_loss: 2.3789 train_time: 9.4m tok/s: 6599416 +4757/20000 
train_loss: 2.3288 train_time: 9.4m tok/s: 6599184 +4758/20000 train_loss: 2.5561 train_time: 9.5m tok/s: 6598935 +4759/20000 train_loss: 2.3883 train_time: 9.5m tok/s: 6598707 +4760/20000 train_loss: 2.2655 train_time: 9.5m tok/s: 6598458 +4761/20000 train_loss: 2.4424 train_time: 9.5m tok/s: 6598228 +4762/20000 train_loss: 2.3452 train_time: 9.5m tok/s: 6597984 +4763/20000 train_loss: 2.4821 train_time: 9.5m tok/s: 6597740 +4764/20000 train_loss: 2.4402 train_time: 9.5m tok/s: 6597499 +4765/20000 train_loss: 2.4231 train_time: 9.5m tok/s: 6597243 +4766/20000 train_loss: 2.3529 train_time: 9.5m tok/s: 6597009 +4767/20000 train_loss: 2.3098 train_time: 9.5m tok/s: 6596770 +4768/20000 train_loss: 2.3795 train_time: 9.5m tok/s: 6596542 +4769/20000 train_loss: 2.3511 train_time: 9.5m tok/s: 6596286 +4770/20000 train_loss: 2.4208 train_time: 9.5m tok/s: 6596048 +4771/20000 train_loss: 2.3472 train_time: 9.5m tok/s: 6595799 +4772/20000 train_loss: 2.4243 train_time: 9.5m tok/s: 6595553 +4773/20000 train_loss: 2.4387 train_time: 9.5m tok/s: 6595324 +4774/20000 train_loss: 2.5482 train_time: 9.5m tok/s: 6595072 +4775/20000 train_loss: 2.4249 train_time: 9.5m tok/s: 6594828 +4776/20000 train_loss: 2.5234 train_time: 9.5m tok/s: 6594607 +4777/20000 train_loss: 2.3981 train_time: 9.5m tok/s: 6594360 +4778/20000 train_loss: 2.3642 train_time: 9.5m tok/s: 6594125 +4779/20000 train_loss: 2.2030 train_time: 9.5m tok/s: 6593864 +4780/20000 train_loss: 2.3174 train_time: 9.5m tok/s: 6593648 +4781/20000 train_loss: 2.3086 train_time: 9.5m tok/s: 6593412 +4782/20000 train_loss: 2.7115 train_time: 9.5m tok/s: 6593145 +4783/20000 train_loss: 2.4247 train_time: 9.5m tok/s: 6592911 +4784/20000 train_loss: 2.3788 train_time: 9.5m tok/s: 6592670 +4785/20000 train_loss: 2.4254 train_time: 9.5m tok/s: 6592444 +4786/20000 train_loss: 2.4206 train_time: 9.5m tok/s: 6592204 +4787/20000 train_loss: 2.3607 train_time: 9.5m tok/s: 6591967 +4788/20000 train_loss: 2.3773 train_time: 9.5m tok/s: 
6591719 +4789/20000 train_loss: 2.1603 train_time: 9.5m tok/s: 6591468 +4790/20000 train_loss: 2.4110 train_time: 9.5m tok/s: 6591226 +4791/20000 train_loss: 2.3540 train_time: 9.5m tok/s: 6590996 +4792/20000 train_loss: 2.2519 train_time: 9.5m tok/s: 6590756 +4793/20000 train_loss: 2.3512 train_time: 9.5m tok/s: 6590528 +4794/20000 train_loss: 2.2290 train_time: 9.5m tok/s: 6590291 +4795/20000 train_loss: 2.3914 train_time: 9.5m tok/s: 6590044 +4796/20000 train_loss: 2.2587 train_time: 9.5m tok/s: 6589825 +4797/20000 train_loss: 2.4032 train_time: 9.5m tok/s: 6589593 +4798/20000 train_loss: 2.5524 train_time: 9.5m tok/s: 6589341 +4799/20000 train_loss: 2.4745 train_time: 9.5m tok/s: 6589099 +4800/20000 train_loss: 2.4482 train_time: 9.5m tok/s: 6588877 +4801/20000 train_loss: 2.4210 train_time: 9.6m tok/s: 6588627 +4802/20000 train_loss: 2.3271 train_time: 9.6m tok/s: 6588388 +4803/20000 train_loss: 2.5179 train_time: 9.6m tok/s: 6588151 +4804/20000 train_loss: 1.9512 train_time: 9.6m tok/s: 6587885 +4805/20000 train_loss: 2.3943 train_time: 9.6m tok/s: 6587630 +4806/20000 train_loss: 2.3767 train_time: 9.6m tok/s: 6587407 +4807/20000 train_loss: 2.3649 train_time: 9.6m tok/s: 6587186 +4808/20000 train_loss: 2.2943 train_time: 9.6m tok/s: 6586974 +4809/20000 train_loss: 2.4094 train_time: 9.6m tok/s: 6586733 +4810/20000 train_loss: 2.4587 train_time: 9.6m tok/s: 6586500 +4811/20000 train_loss: 2.4021 train_time: 9.6m tok/s: 6586270 +4812/20000 train_loss: 2.3557 train_time: 9.6m tok/s: 6586039 +4813/20000 train_loss: 2.3547 train_time: 9.6m tok/s: 6585813 +4814/20000 train_loss: 2.3850 train_time: 9.6m tok/s: 6585598 +4815/20000 train_loss: 2.4439 train_time: 9.6m tok/s: 6585356 +4816/20000 train_loss: 2.3621 train_time: 9.6m tok/s: 6585131 +4817/20000 train_loss: 2.4960 train_time: 9.6m tok/s: 6584880 +4818/20000 train_loss: 2.2846 train_time: 9.6m tok/s: 6584652 +4819/20000 train_loss: 2.4374 train_time: 9.6m tok/s: 6584412 +4820/20000 train_loss: 2.3061 
train_time: 9.6m tok/s: 6584171 +4821/20000 train_loss: 2.3938 train_time: 9.6m tok/s: 6583934 +4822/20000 train_loss: 2.4127 train_time: 9.6m tok/s: 6583698 +4823/20000 train_loss: 2.3806 train_time: 9.6m tok/s: 6583463 +4824/20000 train_loss: 2.4818 train_time: 9.6m tok/s: 6583212 +4825/20000 train_loss: 2.5654 train_time: 9.6m tok/s: 6582964 +4826/20000 train_loss: 2.3006 train_time: 9.6m tok/s: 6582731 +4827/20000 train_loss: 2.2837 train_time: 9.6m tok/s: 6582497 +4828/20000 train_loss: 2.3203 train_time: 9.6m tok/s: 6582256 +4829/20000 train_loss: 2.3589 train_time: 9.6m tok/s: 6582034 +4830/20000 train_loss: 2.3424 train_time: 9.6m tok/s: 6581802 +4831/20000 train_loss: 2.2999 train_time: 9.6m tok/s: 6581565 +4832/20000 train_loss: 2.2531 train_time: 9.6m tok/s: 6581330 +4833/20000 train_loss: 2.2698 train_time: 9.6m tok/s: 6581100 +4834/20000 train_loss: 2.3898 train_time: 9.6m tok/s: 6580869 +4835/20000 train_loss: 2.3357 train_time: 9.6m tok/s: 6580631 +4836/20000 train_loss: 2.3537 train_time: 9.6m tok/s: 6580392 +4837/20000 train_loss: 2.5246 train_time: 9.6m tok/s: 6580156 +4838/20000 train_loss: 2.2803 train_time: 9.6m tok/s: 6579933 +4839/20000 train_loss: 2.4344 train_time: 9.6m tok/s: 6579699 +4840/20000 train_loss: 2.2699 train_time: 9.6m tok/s: 6579471 +4841/20000 train_loss: 2.4284 train_time: 9.6m tok/s: 6579235 +4842/20000 train_loss: 2.6822 train_time: 9.6m tok/s: 6578995 +4843/20000 train_loss: 2.3030 train_time: 9.6m tok/s: 6578766 +4844/20000 train_loss: 2.2924 train_time: 9.7m tok/s: 6578534 +4845/20000 train_loss: 2.3260 train_time: 9.7m tok/s: 6578309 +4846/20000 train_loss: 2.3424 train_time: 9.7m tok/s: 6578058 +4847/20000 train_loss: 2.5513 train_time: 9.7m tok/s: 6577827 +4848/20000 train_loss: 2.3126 train_time: 9.7m tok/s: 6577612 +4849/20000 train_loss: 2.4517 train_time: 9.7m tok/s: 6577406 +4850/20000 train_loss: 2.2968 train_time: 9.7m tok/s: 6577176 +4851/20000 train_loss: 2.3395 train_time: 9.7m tok/s: 6576946 +4852/20000 
train_loss: 2.3566 train_time: 9.7m tok/s: 6576713 +4853/20000 train_loss: 2.2958 train_time: 9.7m tok/s: 6576461 +4854/20000 train_loss: 2.4586 train_time: 9.7m tok/s: 6576224 +4855/20000 train_loss: 2.3479 train_time: 9.7m tok/s: 6575994 +4856/20000 train_loss: 2.2798 train_time: 9.7m tok/s: 6575764 +4857/20000 train_loss: 2.2888 train_time: 9.7m tok/s: 6575528 +4858/20000 train_loss: 2.2703 train_time: 9.7m tok/s: 6575297 +4859/20000 train_loss: 2.3072 train_time: 9.7m tok/s: 6575054 +4860/20000 train_loss: 2.4742 train_time: 9.7m tok/s: 6574843 +4861/20000 train_loss: 2.4070 train_time: 9.7m tok/s: 6574606 +4862/20000 train_loss: 2.3170 train_time: 9.7m tok/s: 6574377 +4863/20000 train_loss: 2.5817 train_time: 9.7m tok/s: 6574152 +4864/20000 train_loss: 2.4021 train_time: 9.7m tok/s: 6573926 +4865/20000 train_loss: 2.5086 train_time: 9.7m tok/s: 6573684 +4866/20000 train_loss: 2.2684 train_time: 9.7m tok/s: 6573432 +4867/20000 train_loss: 2.2499 train_time: 9.7m tok/s: 6573229 +4868/20000 train_loss: 2.3115 train_time: 9.7m tok/s: 6572994 +4869/20000 train_loss: 2.4386 train_time: 9.7m tok/s: 6572734 +4870/20000 train_loss: 2.3615 train_time: 9.7m tok/s: 6572503 +4871/20000 train_loss: 2.3669 train_time: 9.7m tok/s: 6572260 +4872/20000 train_loss: 2.2766 train_time: 9.7m tok/s: 6572002 +4873/20000 train_loss: 2.5455 train_time: 9.7m tok/s: 6571771 +4874/20000 train_loss: 2.4211 train_time: 9.7m tok/s: 6571567 +4875/20000 train_loss: 2.4091 train_time: 9.7m tok/s: 6571338 +4876/20000 train_loss: 2.4026 train_time: 9.7m tok/s: 6571122 +4877/20000 train_loss: 2.3028 train_time: 9.7m tok/s: 6570906 +4878/20000 train_loss: 2.4061 train_time: 9.7m tok/s: 6570666 +4879/20000 train_loss: 2.3609 train_time: 9.7m tok/s: 6570447 +4880/20000 train_loss: 2.4363 train_time: 9.7m tok/s: 6570209 +4881/20000 train_loss: 2.3474 train_time: 9.7m tok/s: 6569989 +4882/20000 train_loss: 2.2730 train_time: 9.7m tok/s: 6569787 +4883/20000 train_loss: 2.2513 train_time: 9.7m tok/s: 
6569565 +4884/20000 train_loss: 2.6065 train_time: 9.7m tok/s: 6569336 +4885/20000 train_loss: 2.4676 train_time: 9.7m tok/s: 6569108 +4886/20000 train_loss: 2.4385 train_time: 9.7m tok/s: 6568877 +4887/20000 train_loss: 2.4054 train_time: 9.8m tok/s: 6568653 +4888/20000 train_loss: 2.4049 train_time: 9.8m tok/s: 6568428 +4889/20000 train_loss: 2.3510 train_time: 9.8m tok/s: 6568204 +4890/20000 train_loss: 2.4081 train_time: 9.8m tok/s: 6567963 +4891/20000 train_loss: 2.1723 train_time: 9.8m tok/s: 6567729 +4892/20000 train_loss: 1.9563 train_time: 9.8m tok/s: 6567455 +4893/20000 train_loss: 2.4398 train_time: 9.8m tok/s: 6567229 +4894/20000 train_loss: 2.3813 train_time: 9.8m tok/s: 6567014 +4895/20000 train_loss: 2.3754 train_time: 9.8m tok/s: 6566811 +4895/20000 val_loss: 2.3589 val_bpb: 1.0778 +stopping_early: wallclock_cap train_time: 586261ms step: 4895/20000 +peak memory allocated: 41707 MiB reserved: 47048 MiB +ema:applying EMA weights +diagnostic pre-quantization post-ema val_loss:2.33485027 val_bpb:1.06686742 eval_time:7879ms +Serialized model: 135418111 bytes +Code size (uncompressed): 182796 bytes +Code size (compressed): 45910 bytes +GPTQ:collecting Hessians from calibration data... +GPTQ:collected 67 Hessians in 4.1s +Quantized weights: + gate_int8_row: blocks.attn.attn_gate_w + gptq (int6): blocks.attn.c_k.weight, blocks.attn.c_q.weight, blocks.attn.c_v.weight, blocks.attn.proj.weight, blocks.mlp.fc.weight, blocks.mlp.proj.weight + gptq (int6)+lqer_asym: blocks.mlp.fc.weight + gptq (int7)+awqgrpint8+lqer_asym: tok_emb.weight + passthrough (float16): blocks.attn.q_gain, blocks.attn_scale, blocks.mlp_scale, blocks.resid_mix, parallel_post_lambdas, parallel_resid_lambdas, skip_gates, skip_weights, smear_gate.weight, smear_lambda, softcap_neg, softcap_pos +Serialize: per-group lrzip compression... 
+Serialize: per-group compression done in 122.7s +Serialized model quantized+pergroup: 15943979 bytes +Total submission size quantized+pergroup: 15989889 bytes +Deserialize: per-group lrzip decompression... +Deserialize: decompression done in 21.1s +diagnostic quantized val_loss:2.35208733 val_bpb:1.07474358 eval_time:11043ms +Deserialize: per-group lrzip decompression... +Deserialize: decompression done in 22.5s +ttt_lora:warming up compile (random tokens, no val data) +ttt_lora:compile warmup done (110.0s) +v5:precomputing ngram hints OUTSIDE eval timer +ngram_tilt:hints total=47851520 gated=13023303 token_gate=628130 within_gate=9866847 word_gate=2891588 agree2plus=303177 +ngram_tilt:precompute_outside_timer_done elapsed=164.53s total_targets=47851520 + +beginning TTT eval timer +ngram_tilt:using_precomputed_hints total_targets=47851520 (precompute time excluded from eval) +ttt_phased: total_docs:50000 prefix_docs:2500 suffix_docs:47500 num_phases:3 boundaries:[833, 1666, 2500] +ttp: b780/782 bl:2.2320 bb:1.0752 rl:2.2320 rb:1.0752 dl:13091-17244 gd:0 +ttp: b764/782 bl:2.2901 bb:1.0727 rl:2.2452 rb:1.0746 dl:4284-4392 gd:0 +ttpp: phase:1/3 pd:1296 gd:833 t:224.0s +tttg: c1/131 lr:0.001000 t:0.3s +tttg: c2/131 lr:0.001000 t:0.4s +tttg: c3/131 lr:0.000999 t:0.5s +tttg: c4/131 lr:0.000999 t:0.6s +tttg: c5/131 lr:0.000998 t:0.7s +tttg: c6/131 lr:0.000996 t:0.8s +tttg: c7/131 lr:0.000995 t:0.9s +tttg: c8/131 lr:0.000993 t:0.9s +tttg: c9/131 lr:0.000991 t:1.0s +tttg: c10/131 lr:0.000988 t:1.1s +tttg: c11/131 lr:0.000985 t:1.1s +tttg: c12/131 lr:0.000982 t:1.2s +tttg: c13/131 lr:0.000979 t:1.3s +tttg: c14/131 lr:0.000976 t:1.4s +tttg: c15/131 lr:0.000972 t:1.4s +tttg: c16/131 lr:0.000968 t:1.5s +tttg: c17/131 lr:0.000963 t:1.6s +tttg: c18/131 lr:0.000958 t:1.7s +tttg: c19/131 lr:0.000953 t:1.7s +tttg: c20/131 lr:0.000948 t:1.8s +tttg: c21/131 lr:0.000943 t:1.9s +tttg: c22/131 lr:0.000937 t:2.0s +tttg: c23/131 lr:0.000931 t:2.0s +tttg: c24/131 lr:0.000925 t:2.1s +tttg: 
c25/131 lr:0.000918 t:2.2s +tttg: c26/131 lr:0.000911 t:2.3s +tttg: c27/131 lr:0.000905 t:2.3s +tttg: c28/131 lr:0.000897 t:2.4s +tttg: c29/131 lr:0.000890 t:2.5s +tttg: c30/131 lr:0.000882 t:2.6s +tttg: c31/131 lr:0.000874 t:2.7s +tttg: c32/131 lr:0.000866 t:2.7s +tttg: c33/131 lr:0.000858 t:2.8s +tttg: c34/131 lr:0.000849 t:2.9s +tttg: c35/131 lr:0.000841 t:2.9s +tttg: c36/131 lr:0.000832 t:3.0s +tttg: c37/131 lr:0.000822 t:3.1s +tttg: c38/131 lr:0.000813 t:3.2s +tttg: c39/131 lr:0.000804 t:3.2s +tttg: c40/131 lr:0.000794 t:3.3s +tttg: c41/131 lr:0.000784 t:3.4s +tttg: c42/131 lr:0.000774 t:3.5s +tttg: c43/131 lr:0.000764 t:3.6s +tttg: c44/131 lr:0.000753 t:3.6s +tttg: c45/131 lr:0.000743 t:3.7s +tttg: c46/131 lr:0.000732 t:3.8s +tttg: c47/131 lr:0.000722 t:3.8s +tttg: c48/131 lr:0.000711 t:3.9s +tttg: c49/131 lr:0.000700 t:4.0s +tttg: c50/131 lr:0.000689 t:4.1s +tttg: c51/131 lr:0.000677 t:4.1s +tttg: c52/131 lr:0.000666 t:4.2s +tttg: c53/131 lr:0.000655 t:4.3s +tttg: c54/131 lr:0.000643 t:4.4s +tttg: c55/131 lr:0.000631 t:4.4s +tttg: c56/131 lr:0.000620 t:4.5s +tttg: c57/131 lr:0.000608 t:4.6s +tttg: c58/131 lr:0.000596 t:4.7s +tttg: c59/131 lr:0.000584 t:4.7s +tttg: c60/131 lr:0.000572 t:4.8s +tttg: c61/131 lr:0.000560 t:4.9s +tttg: c62/131 lr:0.000548 t:5.0s +tttg: c63/131 lr:0.000536 t:5.0s +tttg: c64/131 lr:0.000524 t:5.1s +tttg: c65/131 lr:0.000512 t:5.2s +tttg: c66/131 lr:0.000500 t:5.3s +tttg: c67/131 lr:0.000488 t:5.3s +tttg: c68/131 lr:0.000476 t:5.4s +tttg: c69/131 lr:0.000464 t:5.5s +tttg: c70/131 lr:0.000452 t:5.6s +tttg: c71/131 lr:0.000440 t:5.6s +tttg: c72/131 lr:0.000428 t:5.7s +tttg: c73/131 lr:0.000416 t:5.8s +tttg: c74/131 lr:0.000404 t:5.9s +tttg: c75/131 lr:0.000392 t:6.0s +tttg: c76/131 lr:0.000380 t:6.0s +tttg: c77/131 lr:0.000369 t:6.1s +tttg: c78/131 lr:0.000357 t:6.2s +tttg: c79/131 lr:0.000345 t:6.2s +tttg: c80/131 lr:0.000334 t:6.3s +tttg: c81/131 lr:0.000323 t:6.4s +tttg: c82/131 lr:0.000311 t:6.5s +tttg: c83/131 lr:0.000300 t:6.5s 
+tttg: c84/131 lr:0.000289 t:6.6s +tttg: c85/131 lr:0.000278 t:6.7s +tttg: c86/131 lr:0.000268 t:6.8s +tttg: c87/131 lr:0.000257 t:6.8s +tttg: c88/131 lr:0.000247 t:6.9s +tttg: c89/131 lr:0.000236 t:7.0s +tttg: c90/131 lr:0.000226 t:7.1s +tttg: c91/131 lr:0.000216 t:7.1s +tttg: c92/131 lr:0.000206 t:7.2s +tttg: c93/131 lr:0.000196 t:7.3s +tttg: c94/131 lr:0.000187 t:7.4s +tttg: c95/131 lr:0.000178 t:7.4s +tttg: c96/131 lr:0.000168 t:7.5s +tttg: c97/131 lr:0.000159 t:7.6s +tttg: c98/131 lr:0.000151 t:7.7s +tttg: c99/131 lr:0.000142 t:7.7s +tttg: c100/131 lr:0.000134 t:7.8s +tttg: c101/131 lr:0.000126 t:7.9s +tttg: c102/131 lr:0.000118 t:8.0s +tttg: c103/131 lr:0.000110 t:8.0s +tttg: c104/131 lr:0.000103 t:8.1s +tttg: c105/131 lr:0.000095 t:8.2s +tttg: c106/131 lr:0.000089 t:8.3s +tttg: c107/131 lr:0.000082 t:8.3s +tttg: c108/131 lr:0.000075 t:8.4s +tttg: c109/131 lr:0.000069 t:8.5s +tttg: c110/131 lr:0.000063 t:8.6s +tttg: c111/131 lr:0.000057 t:8.6s +tttg: c112/131 lr:0.000052 t:8.7s +tttg: c113/131 lr:0.000047 t:8.8s +tttg: c114/131 lr:0.000042 t:8.8s +tttg: c115/131 lr:0.000037 t:8.9s +tttg: c116/131 lr:0.000032 t:9.0s +tttg: c117/131 lr:0.000028 t:9.1s +tttg: c118/131 lr:0.000024 t:9.1s +tttg: c119/131 lr:0.000021 t:9.2s +tttg: c120/131 lr:0.000018 t:9.3s +tttg: c121/131 lr:0.000015 t:9.4s +tttg: c122/131 lr:0.000012 t:9.4s +tttg: c123/131 lr:0.000009 t:9.5s +tttg: c124/131 lr:0.000007 t:9.6s +tttg: c125/131 lr:0.000005 t:9.7s +tttg: c126/131 lr:0.000004 t:9.7s +tttg: c127/131 lr:0.000002 t:9.8s +tttg: c128/131 lr:0.000001 t:9.9s +tttg: c129/131 lr:0.000001 t:10.0s +tttg: c130/131 lr:0.000000 t:10.0s +ttpr: phase:1/3 t:235.7s +ttp: b761/782 bl:2.4169 bb:1.1143 rl:2.2749 rb:1.0817 dl:3916-4032 gd:0 +ttpp: phase:2/3 pd:2128 gd:1666 t:394.9s +tttg: c1/219 lr:0.001000 t:0.1s +tttg: c2/219 lr:0.001000 t:0.2s +tttg: c3/219 lr:0.001000 t:0.3s +tttg: c4/219 lr:0.001000 t:0.3s +tttg: c5/219 lr:0.000999 t:0.4s +tttg: c6/219 lr:0.000999 t:0.5s +tttg: c7/219 lr:0.000998 
t:0.6s +tttg: c8/219 lr:0.000997 t:0.6s +tttg: c9/219 lr:0.000997 t:0.7s +tttg: c10/219 lr:0.000996 t:0.8s +tttg: c11/219 lr:0.000995 t:0.9s +tttg: c12/219 lr:0.000994 t:0.9s +tttg: c13/219 lr:0.000993 t:1.0s +tttg: c14/219 lr:0.000991 t:1.1s +tttg: c15/219 lr:0.000990 t:1.2s +tttg: c16/219 lr:0.000988 t:1.2s +tttg: c17/219 lr:0.000987 t:1.3s +tttg: c18/219 lr:0.000985 t:1.4s +tttg: c19/219 lr:0.000983 t:1.5s +tttg: c20/219 lr:0.000981 t:1.5s +tttg: c21/219 lr:0.000979 t:1.6s +tttg: c22/219 lr:0.000977 t:1.7s +tttg: c23/219 lr:0.000975 t:1.8s +tttg: c24/219 lr:0.000973 t:1.8s +tttg: c25/219 lr:0.000970 t:1.9s +tttg: c26/219 lr:0.000968 t:2.0s +tttg: c27/219 lr:0.000965 t:2.1s +tttg: c28/219 lr:0.000963 t:2.1s +tttg: c29/219 lr:0.000960 t:2.2s +tttg: c30/219 lr:0.000957 t:2.3s +tttg: c31/219 lr:0.000954 t:2.4s +tttg: c32/219 lr:0.000951 t:2.4s +tttg: c33/219 lr:0.000948 t:2.5s +tttg: c34/219 lr:0.000945 t:2.6s +tttg: c35/219 lr:0.000941 t:2.7s +tttg: c36/219 lr:0.000938 t:2.7s +tttg: c37/219 lr:0.000934 t:2.8s +tttg: c38/219 lr:0.000931 t:2.9s +tttg: c39/219 lr:0.000927 t:3.0s +tttg: c40/219 lr:0.000923 t:3.0s +tttg: c41/219 lr:0.000919 t:3.1s +tttg: c42/219 lr:0.000915 t:3.2s +tttg: c43/219 lr:0.000911 t:3.3s +tttg: c44/219 lr:0.000907 t:3.3s +tttg: c45/219 lr:0.000903 t:3.4s +tttg: c46/219 lr:0.000898 t:3.5s +tttg: c47/219 lr:0.000894 t:3.6s +tttg: c48/219 lr:0.000890 t:3.6s +tttg: c49/219 lr:0.000885 t:3.7s +tttg: c50/219 lr:0.000880 t:3.8s +tttg: c51/219 lr:0.000876 t:3.9s +tttg: c52/219 lr:0.000871 t:3.9s +tttg: c53/219 lr:0.000866 t:4.0s +tttg: c54/219 lr:0.000861 t:4.1s +tttg: c55/219 lr:0.000856 t:4.2s +tttg: c56/219 lr:0.000851 t:4.2s +tttg: c57/219 lr:0.000846 t:4.3s +tttg: c58/219 lr:0.000841 t:4.4s +tttg: c59/219 lr:0.000835 t:4.5s +tttg: c60/219 lr:0.000830 t:4.5s +tttg: c61/219 lr:0.000824 t:4.6s +tttg: c62/219 lr:0.000819 t:4.7s +tttg: c63/219 lr:0.000813 t:4.8s +tttg: c64/219 lr:0.000808 t:4.8s +tttg: c65/219 lr:0.000802 t:4.9s +tttg: c66/219 
lr:0.000796 t:5.0s +tttg: c67/219 lr:0.000790 t:5.1s +tttg: c68/219 lr:0.000784 t:5.1s +tttg: c69/219 lr:0.000779 t:5.2s +tttg: c70/219 lr:0.000773 t:5.3s +tttg: c71/219 lr:0.000766 t:5.4s +tttg: c72/219 lr:0.000760 t:5.4s +tttg: c73/219 lr:0.000754 t:5.5s +tttg: c74/219 lr:0.000748 t:5.6s +tttg: c75/219 lr:0.000742 t:5.7s +tttg: c76/219 lr:0.000735 t:5.8s +tttg: c77/219 lr:0.000729 t:5.8s +tttg: c78/219 lr:0.000722 t:5.9s +tttg: c79/219 lr:0.000716 t:6.0s +tttg: c80/219 lr:0.000709 t:6.1s +tttg: c81/219 lr:0.000703 t:6.1s +tttg: c82/219 lr:0.000696 t:6.2s +tttg: c83/219 lr:0.000690 t:6.3s +tttg: c84/219 lr:0.000683 t:6.4s +tttg: c85/219 lr:0.000676 t:6.4s +tttg: c86/219 lr:0.000670 t:6.5s +tttg: c87/219 lr:0.000663 t:6.6s +tttg: c88/219 lr:0.000656 t:6.6s +tttg: c89/219 lr:0.000649 t:6.7s +tttg: c90/219 lr:0.000642 t:6.8s +tttg: c91/219 lr:0.000635 t:6.9s +tttg: c92/219 lr:0.000628 t:7.0s +tttg: c93/219 lr:0.000621 t:7.0s +tttg: c94/219 lr:0.000614 t:7.1s +tttg: c95/219 lr:0.000607 t:7.2s +tttg: c96/219 lr:0.000600 t:7.3s +tttg: c97/219 lr:0.000593 t:7.3s +tttg: c98/219 lr:0.000586 t:7.4s +tttg: c99/219 lr:0.000579 t:7.5s +tttg: c100/219 lr:0.000572 t:7.6s +tttg: c101/219 lr:0.000565 t:7.6s +tttg: c102/219 lr:0.000558 t:7.7s +tttg: c103/219 lr:0.000550 t:7.8s +tttg: c104/219 lr:0.000543 t:7.9s +tttg: c105/219 lr:0.000536 t:7.9s +tttg: c106/219 lr:0.000529 t:8.0s +tttg: c107/219 lr:0.000522 t:8.1s +tttg: c108/219 lr:0.000514 t:8.2s +tttg: c109/219 lr:0.000507 t:8.2s +tttg: c110/219 lr:0.000500 t:8.3s +tttg: c111/219 lr:0.000493 t:8.4s +tttg: c112/219 lr:0.000486 t:8.5s +tttg: c113/219 lr:0.000478 t:8.5s +tttg: c114/219 lr:0.000471 t:8.6s +tttg: c115/219 lr:0.000464 t:8.7s +tttg: c116/219 lr:0.000457 t:8.8s +tttg: c117/219 lr:0.000450 t:8.8s +tttg: c118/219 lr:0.000442 t:8.9s +tttg: c119/219 lr:0.000435 t:9.0s +tttg: c120/219 lr:0.000428 t:9.1s +tttg: c121/219 lr:0.000421 t:9.1s +tttg: c122/219 lr:0.000414 t:9.2s +tttg: c123/219 lr:0.000407 t:9.3s +tttg: c124/219 
lr:0.000400 t:9.4s +tttg: c125/219 lr:0.000393 t:9.5s +tttg: c126/219 lr:0.000386 t:9.5s +tttg: c127/219 lr:0.000379 t:9.6s +tttg: c128/219 lr:0.000372 t:9.7s +tttg: c129/219 lr:0.000365 t:9.7s +tttg: c130/219 lr:0.000358 t:9.8s +tttg: c131/219 lr:0.000351 t:9.9s +tttg: c132/219 lr:0.000344 t:10.0s +tttg: c133/219 lr:0.000337 t:10.1s +tttg: c134/219 lr:0.000330 t:10.1s +tttg: c135/219 lr:0.000324 t:10.2s +tttg: c136/219 lr:0.000317 t:10.3s +tttg: c137/219 lr:0.000310 t:10.4s +tttg: c138/219 lr:0.000304 t:10.4s +tttg: c139/219 lr:0.000297 t:10.5s +tttg: c140/219 lr:0.000291 t:10.6s +tttg: c141/219 lr:0.000284 t:10.7s +tttg: c142/219 lr:0.000278 t:10.7s +tttg: c143/219 lr:0.000271 t:10.8s +tttg: c144/219 lr:0.000265 t:10.9s +tttg: c145/219 lr:0.000258 t:11.0s +tttg: c146/219 lr:0.000252 t:11.1s +tttg: c147/219 lr:0.000246 t:11.1s +tttg: c148/219 lr:0.000240 t:11.2s +tttg: c149/219 lr:0.000234 t:11.3s +tttg: c150/219 lr:0.000227 t:11.4s +tttg: c151/219 lr:0.000221 t:11.4s +tttg: c152/219 lr:0.000216 t:11.5s +tttg: c153/219 lr:0.000210 t:11.6s +tttg: c154/219 lr:0.000204 t:11.7s +tttg: c155/219 lr:0.000198 t:11.7s +tttg: c156/219 lr:0.000192 t:11.8s +tttg: c157/219 lr:0.000187 t:11.9s +tttg: c158/219 lr:0.000181 t:11.9s +tttg: c159/219 lr:0.000176 t:12.0s +tttg: c160/219 lr:0.000170 t:12.1s +tttg: c161/219 lr:0.000165 t:12.2s +tttg: c162/219 lr:0.000159 t:12.3s +tttg: c163/219 lr:0.000154 t:12.3s +tttg: c164/219 lr:0.000149 t:12.4s +tttg: c165/219 lr:0.000144 t:12.5s +tttg: c166/219 lr:0.000139 t:12.6s +tttg: c167/219 lr:0.000134 t:12.6s +tttg: c168/219 lr:0.000129 t:12.7s +tttg: c169/219 lr:0.000124 t:12.8s +tttg: c170/219 lr:0.000120 t:12.9s +tttg: c171/219 lr:0.000115 t:12.9s +tttg: c172/219 lr:0.000110 t:13.0s +tttg: c173/219 lr:0.000106 t:13.1s +tttg: c174/219 lr:0.000102 t:13.2s +tttg: c175/219 lr:0.000097 t:13.2s +tttg: c176/219 lr:0.000093 t:13.3s +tttg: c177/219 lr:0.000089 t:13.4s +tttg: c178/219 lr:0.000085 t:13.5s +tttg: c179/219 lr:0.000081 t:13.5s +tttg: 
c180/219 lr:0.000077 t:13.6s +tttg: c181/219 lr:0.000073 t:13.7s +tttg: c182/219 lr:0.000069 t:13.8s +tttg: c183/219 lr:0.000066 t:13.8s +tttg: c184/219 lr:0.000062 t:13.9s +tttg: c185/219 lr:0.000059 t:14.0s +tttg: c186/219 lr:0.000055 t:14.1s +tttg: c187/219 lr:0.000052 t:14.1s +tttg: c188/219 lr:0.000049 t:14.2s +tttg: c189/219 lr:0.000046 t:14.3s +tttg: c190/219 lr:0.000043 t:14.4s +tttg: c191/219 lr:0.000040 t:14.4s +tttg: c192/219 lr:0.000037 t:14.5s +tttg: c193/219 lr:0.000035 t:14.6s +tttg: c194/219 lr:0.000032 t:14.7s +tttg: c195/219 lr:0.000030 t:14.7s +tttg: c196/219 lr:0.000027 t:14.8s +tttg: c197/219 lr:0.000025 t:14.9s +tttg: c198/219 lr:0.000023 t:15.0s +tttg: c199/219 lr:0.000021 t:15.0s +tttg: c200/219 lr:0.000019 t:15.1s +tttg: c201/219 lr:0.000017 t:15.2s +tttg: c202/219 lr:0.000015 t:15.3s +tttg: c203/219 lr:0.000013 t:15.3s +tttg: c204/219 lr:0.000012 t:15.4s +tttg: c205/219 lr:0.000010 t:15.5s +tttg: c206/219 lr:0.000009 t:15.6s +tttg: c207/219 lr:0.000007 t:15.6s +tttg: c208/219 lr:0.000006 t:15.7s +tttg: c209/219 lr:0.000005 t:15.8s +tttg: c210/219 lr:0.000004 t:15.9s +tttg: c211/219 lr:0.000003 t:15.9s +tttg: c212/219 lr:0.000003 t:16.0s +tttg: c213/219 lr:0.000002 t:16.1s +tttg: c214/219 lr:0.000001 t:16.2s +tttg: c215/219 lr:0.000001 t:16.2s +tttg: c216/219 lr:0.000000 t:16.3s +tttg: c217/219 lr:0.000000 t:16.4s +tttg: c218/219 lr:0.000000 t:16.5s +ttpr: phase:2/3 t:413.0s +ttp: b743/782 bl:2.3379 bb:1.0652 rl:2.2817 rb:1.0799 dl:2762-2805 gd:0 +ttp: b738/782 bl:2.3088 bb:1.0455 rl:2.2842 rb:1.0766 dl:2583-2618 gd:0 +ttpp: phase:3/3 pd:2960 gd:2500 t:429.3s +tttg: c1/289 lr:0.001000 t:0.1s +tttg: c2/289 lr:0.001000 t:0.2s +tttg: c3/289 lr:0.001000 t:0.2s +tttg: c4/289 lr:0.001000 t:0.3s +tttg: c5/289 lr:0.001000 t:0.4s +tttg: c6/289 lr:0.000999 t:0.5s +tttg: c7/289 lr:0.000999 t:0.5s +tttg: c8/289 lr:0.000999 t:0.6s +tttg: c9/289 lr:0.000998 t:0.7s +tttg: c10/289 lr:0.000998 t:0.7s +tttg: c11/289 lr:0.000997 t:0.8s +tttg: c12/289 
lr:0.000996 t:0.9s +tttg: c13/289 lr:0.000996 t:1.0s +tttg: c14/289 lr:0.000995 t:1.1s +tttg: c15/289 lr:0.000994 t:1.1s +tttg: c16/289 lr:0.000993 t:1.2s +tttg: c17/289 lr:0.000992 t:1.3s +tttg: c18/289 lr:0.000991 t:1.4s +tttg: c19/289 lr:0.000990 t:1.4s +tttg: c20/289 lr:0.000989 t:1.5s +tttg: c21/289 lr:0.000988 t:1.6s +tttg: c22/289 lr:0.000987 t:1.6s +tttg: c23/289 lr:0.000986 t:1.7s +tttg: c24/289 lr:0.000984 t:1.8s +tttg: c25/289 lr:0.000983 t:1.9s +tttg: c26/289 lr:0.000982 t:1.9s +tttg: c27/289 lr:0.000980 t:2.0s +tttg: c28/289 lr:0.000978 t:2.1s +tttg: c29/289 lr:0.000977 t:2.2s +tttg: c30/289 lr:0.000975 t:2.3s +tttg: c31/289 lr:0.000973 t:2.3s +tttg: c32/289 lr:0.000972 t:2.4s +tttg: c33/289 lr:0.000970 t:2.5s +tttg: c34/289 lr:0.000968 t:2.6s +tttg: c35/289 lr:0.000966 t:2.6s +tttg: c36/289 lr:0.000964 t:2.7s +tttg: c37/289 lr:0.000962 t:2.8s +tttg: c38/289 lr:0.000960 t:2.9s +tttg: c39/289 lr:0.000958 t:2.9s +tttg: c40/289 lr:0.000955 t:3.0s +tttg: c41/289 lr:0.000953 t:3.1s +tttg: c42/289 lr:0.000951 t:3.2s +tttg: c43/289 lr:0.000948 t:3.2s +tttg: c44/289 lr:0.000946 t:3.3s +tttg: c45/289 lr:0.000944 t:3.4s +tttg: c46/289 lr:0.000941 t:3.5s +tttg: c47/289 lr:0.000938 t:3.5s +tttg: c48/289 lr:0.000936 t:3.6s +tttg: c49/289 lr:0.000933 t:3.7s +tttg: c50/289 lr:0.000930 t:3.8s +tttg: c51/289 lr:0.000927 t:3.8s +tttg: c52/289 lr:0.000925 t:3.9s +tttg: c53/289 lr:0.000922 t:4.0s +tttg: c54/289 lr:0.000919 t:4.1s +tttg: c55/289 lr:0.000916 t:4.1s +tttg: c56/289 lr:0.000913 t:4.2s +tttg: c57/289 lr:0.000910 t:4.3s +tttg: c58/289 lr:0.000906 t:4.4s +tttg: c59/289 lr:0.000903 t:4.4s +tttg: c60/289 lr:0.000900 t:4.5s +tttg: c61/289 lr:0.000897 t:4.6s +tttg: c62/289 lr:0.000893 t:4.7s +tttg: c63/289 lr:0.000890 t:4.7s +tttg: c64/289 lr:0.000887 t:4.8s +tttg: c65/289 lr:0.000883 t:4.9s +tttg: c66/289 lr:0.000879 t:5.0s +tttg: c67/289 lr:0.000876 t:5.0s +tttg: c68/289 lr:0.000872 t:5.1s +tttg: c69/289 lr:0.000869 t:5.2s +tttg: c70/289 lr:0.000865 t:5.3s +tttg: 
c71/289 lr:0.000861 t:5.3s +tttg: c72/289 lr:0.000857 t:5.4s +tttg: c73/289 lr:0.000854 t:5.5s +tttg: c74/289 lr:0.000850 t:5.6s +tttg: c75/289 lr:0.000846 t:5.6s +tttg: c76/289 lr:0.000842 t:5.7s +tttg: c77/289 lr:0.000838 t:5.8s +tttg: c78/289 lr:0.000834 t:5.9s +tttg: c79/289 lr:0.000830 t:5.9s +tttg: c80/289 lr:0.000826 t:6.0s +tttg: c81/289 lr:0.000821 t:6.1s +tttg: c82/289 lr:0.000817 t:6.2s +tttg: c83/289 lr:0.000813 t:6.2s +tttg: c84/289 lr:0.000809 t:6.3s +tttg: c85/289 lr:0.000804 t:6.4s +tttg: c86/289 lr:0.000800 t:6.5s +tttg: c87/289 lr:0.000796 t:6.5s +tttg: c88/289 lr:0.000791 t:6.6s +tttg: c89/289 lr:0.000787 t:6.7s +tttg: c90/289 lr:0.000782 t:6.8s +tttg: c91/289 lr:0.000778 t:6.8s +tttg: c92/289 lr:0.000773 t:6.9s +tttg: c93/289 lr:0.000769 t:7.0s +tttg: c94/289 lr:0.000764 t:7.1s +tttg: c95/289 lr:0.000759 t:7.1s +tttg: c96/289 lr:0.000755 t:7.2s +tttg: c97/289 lr:0.000750 t:7.3s +tttg: c98/289 lr:0.000745 t:7.4s +tttg: c99/289 lr:0.000740 t:7.4s +tttg: c100/289 lr:0.000736 t:7.5s +tttg: c101/289 lr:0.000731 t:7.6s +tttg: c102/289 lr:0.000726 t:7.7s +tttg: c103/289 lr:0.000721 t:7.7s +tttg: c104/289 lr:0.000716 t:7.8s +tttg: c105/289 lr:0.000711 t:7.9s +tttg: c106/289 lr:0.000706 t:8.0s +tttg: c107/289 lr:0.000701 t:8.0s +tttg: c108/289 lr:0.000696 t:8.1s +tttg: c109/289 lr:0.000691 t:8.2s +tttg: c110/289 lr:0.000686 t:8.3s +tttg: c111/289 lr:0.000681 t:8.3s +tttg: c112/289 lr:0.000676 t:8.4s +tttg: c113/289 lr:0.000671 t:8.5s +tttg: c114/289 lr:0.000666 t:8.6s +tttg: c115/289 lr:0.000661 t:8.6s +tttg: c116/289 lr:0.000656 t:8.7s +tttg: c117/289 lr:0.000650 t:8.8s +tttg: c118/289 lr:0.000645 t:8.9s +tttg: c119/289 lr:0.000640 t:8.9s +tttg: c120/289 lr:0.000635 t:9.0s +tttg: c121/289 lr:0.000629 t:9.1s +tttg: c122/289 lr:0.000624 t:9.2s +tttg: c123/289 lr:0.000619 t:9.2s +tttg: c124/289 lr:0.000614 t:9.3s +tttg: c125/289 lr:0.000608 t:9.4s +tttg: c126/289 lr:0.000603 t:9.5s +tttg: c127/289 lr:0.000598 t:9.5s +tttg: c128/289 lr:0.000592 t:9.6s 
+tttg: c129/289 lr:0.000587 t:9.7s +tttg: c130/289 lr:0.000581 t:9.8s +tttg: c131/289 lr:0.000576 t:9.8s +tttg: c132/289 lr:0.000571 t:9.9s +tttg: c133/289 lr:0.000565 t:10.0s +tttg: c134/289 lr:0.000560 t:10.1s +tttg: c135/289 lr:0.000554 t:10.1s +tttg: c136/289 lr:0.000549 t:10.2s +tttg: c137/289 lr:0.000544 t:10.3s +tttg: c138/289 lr:0.000538 t:10.3s +tttg: c139/289 lr:0.000533 t:10.4s +tttg: c140/289 lr:0.000527 t:10.5s +tttg: c141/289 lr:0.000522 t:10.6s +tttg: c142/289 lr:0.000516 t:10.6s +tttg: c143/289 lr:0.000511 t:10.7s +tttg: c144/289 lr:0.000505 t:10.8s +tttg: c145/289 lr:0.000500 t:10.9s +tttg: c146/289 lr:0.000495 t:11.0s +tttg: c147/289 lr:0.000489 t:11.0s +tttg: c148/289 lr:0.000484 t:11.1s +tttg: c149/289 lr:0.000478 t:11.3s +tttg: c150/289 lr:0.000473 t:11.3s +tttg: c151/289 lr:0.000467 t:11.5s +tttg: c152/289 lr:0.000462 t:11.5s +tttg: c153/289 lr:0.000456 t:11.6s +tttg: c154/289 lr:0.000451 t:11.7s +tttg: c155/289 lr:0.000446 t:11.8s +tttg: c156/289 lr:0.000440 t:11.8s +tttg: c157/289 lr:0.000435 t:11.9s +tttg: c158/289 lr:0.000429 t:12.0s +tttg: c159/289 lr:0.000424 t:12.1s +tttg: c160/289 lr:0.000419 t:12.1s +tttg: c161/289 lr:0.000413 t:12.2s +tttg: c162/289 lr:0.000408 t:12.3s +tttg: c163/289 lr:0.000402 t:12.4s +tttg: c164/289 lr:0.000397 t:12.4s +tttg: c165/289 lr:0.000392 t:12.5s +tttg: c166/289 lr:0.000386 t:12.6s +tttg: c167/289 lr:0.000381 t:12.7s +tttg: c168/289 lr:0.000376 t:12.8s +tttg: c169/289 lr:0.000371 t:12.8s +tttg: c170/289 lr:0.000365 t:12.9s +tttg: c171/289 lr:0.000360 t:13.0s +tttg: c172/289 lr:0.000355 t:13.1s +tttg: c173/289 lr:0.000350 t:13.1s +tttg: c174/289 lr:0.000344 t:13.2s +tttg: c175/289 lr:0.000339 t:13.3s +tttg: c176/289 lr:0.000334 t:13.4s +tttg: c177/289 lr:0.000329 t:13.4s +tttg: c178/289 lr:0.000324 t:13.5s +tttg: c179/289 lr:0.000319 t:13.6s +tttg: c180/289 lr:0.000314 t:13.6s +tttg: c181/289 lr:0.000309 t:13.7s +tttg: c182/289 lr:0.000304 t:13.8s +tttg: c183/289 lr:0.000299 t:13.9s +tttg: c184/289 
lr:0.000294 t:14.0s +tttg: c185/289 lr:0.000289 t:14.0s +tttg: c186/289 lr:0.000284 t:14.1s +tttg: c187/289 lr:0.000279 t:14.2s +tttg: c188/289 lr:0.000274 t:14.3s +tttg: c189/289 lr:0.000269 t:14.3s +tttg: c190/289 lr:0.000264 t:14.4s +tttg: c191/289 lr:0.000260 t:14.5s +tttg: c192/289 lr:0.000255 t:14.5s +tttg: c193/289 lr:0.000250 t:14.6s +tttg: c194/289 lr:0.000245 t:14.7s +tttg: c195/289 lr:0.000241 t:14.8s +tttg: c196/289 lr:0.000236 t:14.9s +tttg: c197/289 lr:0.000231 t:14.9s +tttg: c198/289 lr:0.000227 t:15.0s +tttg: c199/289 lr:0.000222 t:15.1s +tttg: c200/289 lr:0.000218 t:15.1s +tttg: c201/289 lr:0.000213 t:15.2s +tttg: c202/289 lr:0.000209 t:15.3s +tttg: c203/289 lr:0.000204 t:15.4s +tttg: c204/289 lr:0.000200 t:15.5s +tttg: c205/289 lr:0.000196 t:15.5s +tttg: c206/289 lr:0.000191 t:15.6s +tttg: c207/289 lr:0.000187 t:15.7s +tttg: c208/289 lr:0.000183 t:15.7s +tttg: c209/289 lr:0.000179 t:15.8s +tttg: c210/289 lr:0.000174 t:15.9s +tttg: c211/289 lr:0.000170 t:16.0s +tttg: c212/289 lr:0.000166 t:16.1s +tttg: c213/289 lr:0.000162 t:16.1s +tttg: c214/289 lr:0.000158 t:16.2s +tttg: c215/289 lr:0.000154 t:16.3s +tttg: c216/289 lr:0.000150 t:16.4s +tttg: c217/289 lr:0.000146 t:16.4s +tttg: c218/289 lr:0.000143 t:16.5s +tttg: c219/289 lr:0.000139 t:16.6s +tttg: c220/289 lr:0.000135 t:16.7s +tttg: c221/289 lr:0.000131 t:16.7s +tttg: c222/289 lr:0.000128 t:16.8s +tttg: c223/289 lr:0.000124 t:16.9s +tttg: c224/289 lr:0.000121 t:17.0s +tttg: c225/289 lr:0.000117 t:17.0s +tttg: c226/289 lr:0.000113 t:17.1s +tttg: c227/289 lr:0.000110 t:17.2s +tttg: c228/289 lr:0.000107 t:17.3s +tttg: c229/289 lr:0.000103 t:17.3s +tttg: c230/289 lr:0.000100 t:17.4s +tttg: c231/289 lr:0.000097 t:17.5s +tttg: c232/289 lr:0.000094 t:17.6s +tttg: c233/289 lr:0.000090 t:17.6s +tttg: c234/289 lr:0.000087 t:17.7s +tttg: c235/289 lr:0.000084 t:17.8s +tttg: c236/289 lr:0.000081 t:17.9s +tttg: c237/289 lr:0.000078 t:17.9s +tttg: c238/289 lr:0.000075 t:18.0s +tttg: c239/289 lr:0.000073 t:18.1s 
+tttg: c240/289 lr:0.000070 t:18.2s +tttg: c241/289 lr:0.000067 t:18.2s +tttg: c242/289 lr:0.000064 t:18.3s +tttg: c243/289 lr:0.000062 t:18.4s +tttg: c244/289 lr:0.000059 t:18.5s +tttg: c245/289 lr:0.000056 t:18.5s +tttg: c246/289 lr:0.000054 t:18.6s +tttg: c247/289 lr:0.000052 t:18.7s +tttg: c248/289 lr:0.000049 t:18.8s +tttg: c249/289 lr:0.000047 t:18.8s +tttg: c250/289 lr:0.000045 t:18.9s +tttg: c251/289 lr:0.000042 t:19.0s +tttg: c252/289 lr:0.000040 t:19.1s +tttg: c253/289 lr:0.000038 t:19.1s +tttg: c254/289 lr:0.000036 t:19.2s +tttg: c255/289 lr:0.000034 t:19.3s +tttg: c256/289 lr:0.000032 t:19.4s +tttg: c257/289 lr:0.000030 t:19.4s +tttg: c258/289 lr:0.000028 t:19.5s +tttg: c259/289 lr:0.000027 t:19.6s +tttg: c260/289 lr:0.000025 t:19.7s +tttg: c261/289 lr:0.000023 t:21.4s +tttg: c262/289 lr:0.000022 t:21.5s +tttg: c263/289 lr:0.000020 t:21.6s +tttg: c264/289 lr:0.000018 t:21.7s +tttg: c265/289 lr:0.000017 t:21.7s +tttg: c266/289 lr:0.000016 t:21.8s +tttg: c267/289 lr:0.000014 t:21.9s +tttg: c268/289 lr:0.000013 t:22.0s +tttg: c269/289 lr:0.000012 t:22.0s +tttg: c270/289 lr:0.000011 t:22.1s +tttg: c271/289 lr:0.000010 t:22.2s +tttg: c272/289 lr:0.000009 t:22.3s +tttg: c273/289 lr:0.000008 t:22.3s +tttg: c274/289 lr:0.000007 t:22.4s +tttg: c275/289 lr:0.000006 t:22.5s +tttg: c276/289 lr:0.000005 t:22.6s +tttg: c277/289 lr:0.000004 t:22.6s +tttg: c278/289 lr:0.000004 t:22.7s +tttg: c279/289 lr:0.000003 t:22.8s +tttg: c280/289 lr:0.000002 t:22.9s +tttg: c281/289 lr:0.000002 t:22.9s +tttg: c282/289 lr:0.000001 t:23.0s +tttg: c283/289 lr:0.000001 t:23.1s +tttg: c284/289 lr:0.000001 t:23.2s +tttg: c285/289 lr:0.000000 t:23.2s +tttg: c286/289 lr:0.000000 t:23.3s +tttg: c287/289 lr:0.000000 t:23.4s +tttg: c288/289 lr:0.000000 t:23.5s +ttpr: phase:3/3 t:454.4s +ttp: b731/782 bl:2.3405 bb:1.0438 rl:2.2886 rb:1.0739 dl:2377-2414 gd:1 +ttp: b723/782 bl:2.2948 bb:1.0303 rl:2.2890 rb:1.0709 dl:2185-2203 gd:1 +ttp: b716/782 bl:2.2489 bb:1.0392 rl:2.2866 rb:1.0690 
dl:2054-2069 gd:1 +ttp: b705/782 bl:2.3634 bb:1.0623 rl:2.2906 rb:1.0686 dl:1885-1898 gd:1 +ttp: b700/782 bl:2.2713 bb:1.0143 rl:2.2897 rb:1.0660 dl:1824-1834 gd:1 +ttp: b688/782 bl:2.3978 bb:1.0735 rl:2.2942 rb:1.0663 dl:1696-1706 gd:1 +ttp: b683/782 bl:2.2701 bb:1.0567 rl:2.2933 rb:1.0659 dl:1646-1657 gd:1 +ttp: b677/782 bl:2.3072 bb:1.0337 rl:2.2938 rb:1.0647 dl:1595-1601 gd:1 +ttp: b668/782 bl:2.3286 bb:1.0646 rl:2.2949 rb:1.0647 dl:1521-1530 gd:1 +ttp: b662/782 bl:2.2949 bb:1.0258 rl:2.2949 rb:1.0634 dl:1480-1486 gd:1 +ttp: b655/782 bl:2.3777 bb:1.0428 rl:2.2974 rb:1.0628 dl:1432-1439 gd:1 +ttp: b647/782 bl:2.2730 bb:1.0316 rl:2.2967 rb:1.0619 dl:1382-1387 gd:1 +ttp: b639/782 bl:2.3074 bb:1.0304 rl:2.2970 rb:1.0610 dl:1331-1337 gd:1 +ttp: b630/782 bl:2.3229 bb:1.0392 rl:2.2976 rb:1.0605 dl:1280-1285 gd:1 +ttp: b620/782 bl:2.3396 bb:1.0538 rl:2.2986 rb:1.0603 dl:1226-1231 gd:1 +ttp: b611/782 bl:2.2932 bb:1.0240 rl:2.2985 rb:1.0595 dl:1182-1186 gd:1 +ttp: b604/782 bl:2.3738 bb:1.0420 rl:2.3000 rb:1.0591 dl:1150-1154 gd:1 +ttp: b595/782 bl:2.3426 bb:1.0574 rl:2.3009 rb:1.0591 dl:1110-1115 gd:1 +ttp: b587/782 bl:2.4041 bb:1.0668 rl:2.3028 rb:1.0592 dl:1077-1081 gd:1 +ttp: b579/782 bl:2.3386 bb:1.0336 rl:2.3034 rb:1.0588 dl:1044-1048 gd:1 +ttp: b573/782 bl:2.3620 bb:1.0647 rl:2.3044 rb:1.0589 dl:1021-1025 gd:1 +ttp: b564/782 bl:2.2822 bb:1.0155 rl:2.3041 rb:1.0581 dl:990-993 gd:1 +ttp: b553/782 bl:2.2838 bb:1.0297 rl:2.3038 rb:1.0577 dl:952-955 gd:1 +ttp: b546/782 bl:2.3192 bb:1.0312 rl:2.3040 rb:1.0573 dl:930-934 gd:1 +ttp: b538/782 bl:2.3313 bb:1.0437 rl:2.3044 rb:1.0571 dl:905-909 gd:1 +ttp: b529/782 bl:2.3069 bb:1.0134 rl:2.3044 rb:1.0565 dl:878-882 gd:1 +ttp: b520/782 bl:2.3222 bb:1.0013 rl:2.3046 rb:1.0557 dl:852-854 gd:1 +ttp: b513/782 bl:2.3636 bb:1.0376 rl:2.3054 rb:1.0555 dl:832-835 gd:1 +ttp: b505/782 bl:2.3251 bb:1.0633 rl:2.3056 rb:1.0556 dl:809-812 gd:1 +ttp: b497/782 bl:2.3344 bb:1.0411 rl:2.3060 rb:1.0554 dl:788-791 gd:1 +ttp: b489/782 bl:2.3854 
bb:1.0732 rl:2.3068 rb:1.0556 dl:769-771 gd:1 +ttp: b478/782 bl:2.3345 bb:1.0750 rl:2.3071 rb:1.0558 dl:742-744 gd:1 +ttp: b470/782 bl:2.3440 bb:1.0549 rl:2.3075 rb:1.0558 dl:724-726 gd:1 +ttp: b462/782 bl:2.3329 bb:1.0354 rl:2.3078 rb:1.0556 dl:706-708 gd:1 +ttp: b454/782 bl:2.3804 bb:1.0811 rl:2.3085 rb:1.0558 dl:689-691 gd:1 +ttp: b446/782 bl:2.2870 bb:1.0750 rl:2.3083 rb:1.0560 dl:672-674 gd:1 +ttp: b437/782 bl:2.2865 bb:1.0521 rl:2.3081 rb:1.0560 dl:653-655 gd:1 +ttp: b429/782 bl:2.2442 bb:1.0237 rl:2.3075 rb:1.0557 dl:638-640 gd:1 +ttp: b421/782 bl:2.2878 bb:1.0016 rl:2.3074 rb:1.0552 dl:622-624 gd:1 +ttp: b413/782 bl:2.3637 bb:1.0594 rl:2.3078 rb:1.0552 dl:607-609 gd:1 +ttp: b406/782 bl:2.3111 bb:1.0643 rl:2.3078 rb:1.0553 dl:593-595 gd:1 +ttp: b397/782 bl:2.3518 bb:1.0430 rl:2.3082 rb:1.0552 dl:577-579 gd:1 +ttp: b389/782 bl:2.2863 bb:1.0829 rl:2.3080 rb:1.0554 dl:563-564 gd:1 +ttp: b381/782 bl:2.4201 bb:1.1001 rl:2.3088 rb:1.0557 dl:549-550 gd:1 +ttp: b373/782 bl:2.4044 bb:1.0972 rl:2.3095 rb:1.0560 dl:535-537 gd:1 +ttp: b365/782 bl:2.3237 bb:1.0325 rl:2.3096 rb:1.0559 dl:522-524 gd:1 +ttp: b357/782 bl:2.3211 bb:1.0641 rl:2.3096 rb:1.0559 dl:508-510 gd:1 +ttp: b351/782 bl:2.3621 bb:1.0813 rl:2.3100 rb:1.0561 dl:498-499 gd:1 +ttp: b343/782 bl:2.2167 bb:1.0432 rl:2.3094 rb:1.0560 dl:486-488 gd:1 +ttp: b335/782 bl:2.3538 bb:1.0663 rl:2.3097 rb:1.0561 dl:474-476 gd:1 +ttp: b326/782 bl:2.3045 bb:1.0553 rl:2.3096 rb:1.0561 dl:461-462 gd:1 +ttp: b317/782 bl:2.3028 bb:1.0463 rl:2.3096 rb:1.0560 dl:446-448 gd:1 +ttp: b309/782 bl:2.4085 bb:1.1052 rl:2.3101 rb:1.0563 dl:435-437 gd:1 +ttp: b301/782 bl:2.3392 bb:1.0859 rl:2.3103 rb:1.0564 dl:422-424 gd:1 +ttp: b293/782 bl:2.4197 bb:1.0910 rl:2.3108 rb:1.0566 dl:410-412 gd:1 +ttp: b285/782 bl:2.3700 bb:1.0797 rl:2.3111 rb:1.0567 dl:399-400 gd:1 +ttp: b277/782 bl:2.2571 bb:1.0629 rl:2.3109 rb:1.0567 dl:388-389 gd:1 +ttp: b270/782 bl:2.3085 bb:1.0563 rl:2.3108 rb:1.0567 dl:379-380 gd:1 +ttp: b264/782 bl:2.4211 bb:1.1033 
rl:2.3113 rb:1.0569 dl:371-372 gd:1 +ttp: b259/782 bl:2.3380 bb:1.0964 rl:2.3115 rb:1.0571 dl:365-366 gd:1 +ttp: b253/782 bl:2.3256 bb:1.1047 rl:2.3115 rb:1.0573 dl:357-358 gd:1 +ttp: b246/782 bl:2.3451 bb:1.0961 rl:2.3116 rb:1.0574 dl:349-350 gd:1 +ttp: b238/782 bl:2.3213 bb:1.1071 rl:2.3117 rb:1.0576 dl:338-340 gd:1 +ttp: b229/782 bl:2.3630 bb:1.0650 rl:2.3119 rb:1.0577 dl:328-329 gd:1 +ttp: b221/782 bl:2.4040 bb:1.1202 rl:2.3122 rb:1.0579 dl:318-320 gd:1 +ttp: b213/782 bl:2.2565 bb:1.0720 rl:2.3120 rb:1.0579 dl:309-310 gd:1 +ttp: b204/782 bl:2.4588 bb:1.1537 rl:2.3125 rb:1.0583 dl:300-301 gd:1 +ttp: b194/782 bl:2.4341 bb:1.1151 rl:2.3129 rb:1.0585 dl:289-290 gd:1 +ttp: b184/782 bl:2.3839 bb:1.1238 rl:2.3132 rb:1.0587 dl:278-279 gd:1 +ttp: b176/782 bl:2.3144 bb:1.1241 rl:2.3132 rb:1.0588 dl:270-271 gd:1 +ttp: b167/782 bl:2.5192 bb:1.1239 rl:2.3138 rb:1.0590 dl:262-263 gd:1 +ttp: b159/782 bl:2.4681 bb:1.1451 rl:2.3142 rb:1.0593 dl:254-255 gd:1 +ttp: b152/782 bl:2.3728 bb:1.1364 rl:2.3144 rb:1.0595 dl:247-248 gd:1 +ttp: b144/782 bl:2.3451 bb:1.1023 rl:2.3145 rb:1.0596 dl:239-240 gd:1 +ttp: b137/782 bl:2.4122 bb:1.1524 rl:2.3147 rb:1.0598 dl:233-233 gd:1 +ttp: b128/782 bl:2.3765 bb:1.1486 rl:2.3149 rb:1.0601 dl:224-225 gd:1 +ttp: b120/782 bl:2.3873 bb:1.1093 rl:2.3151 rb:1.0602 dl:217-218 gd:1 +ttp: b113/782 bl:2.5458 bb:1.1319 rl:2.3156 rb:1.0604 dl:210-211 gd:1 +ttp: b107/782 bl:2.4352 bb:1.1662 rl:2.3159 rb:1.0606 dl:205-206 gd:1 +ttp: b100/782 bl:2.4158 bb:1.1557 rl:2.3161 rb:1.0608 dl:199-200 gd:1 +ttp: b92/782 bl:2.4346 bb:1.1584 rl:2.3164 rb:1.0610 dl:191-192 gd:1 +ttp: b86/782 bl:2.4565 bb:1.1333 rl:2.3166 rb:1.0611 dl:186-187 gd:1 +ttp: b79/782 bl:2.3690 bb:1.1325 rl:2.3167 rb:1.0613 dl:180-181 gd:1 +ttp: b70/782 bl:2.5169 bb:1.2265 rl:2.3171 rb:1.0616 dl:172-173 gd:1 +ttp: b63/782 bl:2.5187 bb:1.2014 rl:2.3175 rb:1.0618 dl:166-166 gd:1 +ttp: b54/782 bl:2.4728 bb:1.2130 rl:2.3178 rb:1.0621 dl:157-158 gd:1 +ttp: b46/782 bl:2.5462 bb:1.2157 rl:2.3181 
rb:1.0623 dl:149-150 gd:1 +ttp: b37/782 bl:2.5731 bb:1.2128 rl:2.3185 rb:1.0625 dl:140-141 gd:1 +ttp: b30/782 bl:2.5759 bb:1.2560 rl:2.3189 rb:1.0628 dl:133-134 gd:1 +ttp: b22/782 bl:2.5484 bb:1.1929 rl:2.3192 rb:1.0630 dl:124-126 gd:1 +ttp: b16/782 bl:2.6183 bb:1.2546 rl:2.3196 rb:1.0632 dl:117-118 gd:1 +ttp: b8/782 bl:2.7678 bb:1.2849 rl:2.3201 rb:1.0634 dl:103-105 gd:1 +quantized_ttt_phased val_loss:2.31717973 val_bpb:1.05886205 eval_time:553279ms +total_eval_time:553.3s diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed1234.log b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed1234.log new file mode 100644 index 0000000000..4b1bc980bc --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed1234.log @@ -0,0 +1,5846 @@ +nohup: ignoring input +==================================================== + v5 PRIMARY noLC fulltilt + precompute outside timer: V21 + #1953 + #1948 + fulltilt-tilt SEED=1234 Thu Apr 30 07:02:04 UTC 2026 + LeakyReLU slope 0.3 (code patch + v5 hint-precompute-outside-timer), EVAL_SEQ_LEN 2048 (no long-ctx for cap), no_qv, fulltilt-tilt +==================================================== +W0430 07:02:05.947000 1130344 torch/distributed/run.py:803] +W0430 07:02:05.947000 1130344 torch/distributed/run.py:803] ***************************************** +W0430 07:02:05.947000 1130344 torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
+W0430 07:02:05.947000 1130344 torch/distributed/run.py:803] ***************************************** +Hyperparameters: + adam_eps: 1e-08 + adam_wd: 0.02 + agree_add_boost: 0.5 + artifact_dir: + attn_clip_sigmas: 13.0 + attn_out_gate_enabled: False + attn_out_gate_src: proj + awq_lite_bits: 8 + awq_lite_enabled: True + awq_lite_group_size: 64 + awq_lite_group_top_k: 1 + beta1: 0.9 + beta2: 0.99 + caseops_enabled: True + compressor: pergroup + data_dir: /runpod-volume/caseops_data/datasets + datasets_dir: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved + distributed: True + ema_decay: 0.9965 + embed_bits: 7 + embed_clip_sigmas: 14.0 + embed_lr: 0.6 + embed_wd: 0.085 + enable_looping_at: 0.35 + eval_seq_len: 2048 + eval_stride: 64 + fused_ce_enabled: True + gate_window: 12 + gated_attn_enabled: False + gated_attn_init_std: 0.01 + gated_attn_quant_gate: True + global_ttt_batch_seqs: 32 + global_ttt_chunk_tokens: 32768 + global_ttt_epochs: 1 + global_ttt_grad_clip: 1.0 + global_ttt_lr: 0.001 + global_ttt_momentum: 0.9 + global_ttt_respect_doc_boundaries: True + global_ttt_warmup_chunks: 0 + global_ttt_warmup_start_lr: 0.0 + gptq_calibration_batches: 16 + gptq_reserve_seconds: 0.5 + grad_accum_steps: 1 + grad_clip_norm: 0.3 + is_main_process: True + iterations: 20000 + ln_scale: True + local_rank: 0 + logfile: logs/2f461a67-fc1a-4567-9c23-d7dc2c178233.txt + logit_softcap: 30.0 + loop_end: 5 + loop_start: 3 + lqer_asym_enabled: True + lqer_asym_group: 64 + lqer_enabled: True + lqer_factor_bits: 4 + lqer_gain_select: False + lqer_rank: 4 + lqer_scope: all + lqer_top_k: 3 + matrix_bits: 6 + matrix_clip_sigmas: 12.85 + matrix_lr: 0.026 + max_wallclock_seconds: 600.0 + min_lr: 0.1 + mlp_clip_sigmas: 11.5 + mlp_mult: 4.0 + model_dim: 512 + model_path: final_model.pt + muon_backend_steps: 5 + muon_momentum: 0.97 + muon_momentum_warmup_start: 0.92 + muon_momentum_warmup_steps: 1500 + muon_row_normalize: True + muon_wd: 0.095 + 
ngram_hint_precompute_outside: True + ngram_tilt_enabled: True + num_heads: 8 + num_kv_heads: 4 + num_layers: 11 + num_loops: 2 + parallel_final_lane: mean + parallel_start_layer: 8 + phased_ttt_num_phases: 3 + phased_ttt_prefix_docs: 2500 + qk_gain_init: 5.25 + quantized_model_path: final_model.int6.ptz + rank: 0 + rope_base: 10000.0 + rope_dims: 16 + rope_train_seq_len: 2048 + rope_yarn: False + run_id: 2f461a67-fc1a-4567-9c23-d7dc2c178233 + scalar_lr: 0.02 + seed: 1234 + skip_gates_enabled: True + smear_gate_enabled: True + sparse_attn_gate_enabled: True + sparse_attn_gate_init_std: 0.0 + sparse_attn_gate_scale: 0.5 + temperature_scale: 1.0 + tie_embeddings: True + tied_embed_init_std: 0.005 + tied_embed_lr: 0.03 + token_boost: 2.625 + token_order: 16 + token_threshold: 0.8 + tokenizer_path: /runpod-volume/caseops_data/datasets/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model + train_batch_tokens: 786432 + train_files: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved/fineweb_train_*.bin + train_log_every: 500 + train_seq_len: 2048 + ttt_batch_size: 64 + ttt_beta1: 0.0 + ttt_beta2: 0.99 + ttt_chunk_size: 48 + ttt_enabled: True + ttt_eval_batches: + ttt_eval_seq_len: 2048 + ttt_grad_steps: 1 + ttt_k_lora: True + ttt_lora_lr: 0.0001 + ttt_lora_rank: 80 + ttt_mlp_lora: True + ttt_o_lora: True + ttt_optimizer: adam + ttt_weight_decay: 0.5 + val_batch_tokens: 524288 + val_bytes_files: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved/fineweb_val_bytes_*.bin + val_doc_fraction: 1.0 + val_files: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved/fineweb_val_*.bin + val_loss_every: 0 + vocab_size: 8192 + warmdown_frac: 0.85 + warmup_steps: 20 + within_boost: 0.75 + within_tau: 0.45 + word_boost: 0.75 + word_normalize: strip_punct_lower + word_order: 4 + word_tau: 0.65 + world_size: 8 + xsa_last_n: 11 
+train_shards: 1499 +val_tokens: 47851520 +model_params:35945673 +gptq:reserving 0s, effective=599500ms +warmup_cu_buckets:64,128,192,256 iters_each:3 +warmup_step: 1/20 +warmup_step: 2/20 +warmup_step: 3/20 +warmup_step: 4/20 +warmup_step: 5/20 +warmup_step: 6/20 +warmup_step: 10/20 +warmup_step: 20/20 +loop_warmup:enabled encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10] +loop_warmup_step: 1/20 +loop_warmup_step: 2/20 +loop_warmup_step: 3/20 +loop_warmup_step: 4/20 +loop_warmup_step: 5/20 +loop_warmup_step: 6/20 +loop_warmup_step: 10/20 +loop_warmup_step: 20/20 +1/20000 train_loss: 9.0017 train_time: 0.0m tok/s: 17596046 +2/20000 train_loss: 12.9266 train_time: 0.0m tok/s: 11224289 +3/20000 train_loss: 10.1998 train_time: 0.0m tok/s: 10130919 +4/20000 train_loss: 8.6931 train_time: 0.0m tok/s: 9647461 +5/20000 train_loss: 7.9435 train_time: 0.0m tok/s: 9394471 +6/20000 train_loss: 7.4819 train_time: 0.0m tok/s: 9223646 +7/20000 train_loss: 7.2064 train_time: 0.0m tok/s: 9095635 +8/20000 train_loss: 6.9632 train_time: 0.0m tok/s: 9005196 +9/20000 train_loss: 6.6449 train_time: 0.0m tok/s: 8947144 +10/20000 train_loss: 6.4811 train_time: 0.0m tok/s: 8874363 +11/20000 train_loss: 6.1261 train_time: 0.0m tok/s: 8758987 +12/20000 train_loss: 5.8024 train_time: 0.0m tok/s: 8696610 +13/20000 train_loss: 5.6493 train_time: 0.0m tok/s: 8659111 +14/20000 train_loss: 5.3563 train_time: 0.0m tok/s: 8629535 +15/20000 train_loss: 5.2770 train_time: 0.0m tok/s: 8609166 +16/20000 train_loss: 5.3134 train_time: 0.0m tok/s: 8587771 +17/20000 train_loss: 5.1277 train_time: 0.0m tok/s: 8577528 +18/20000 train_loss: 5.0573 train_time: 0.0m tok/s: 8573634 +19/20000 train_loss: 4.9799 train_time: 0.0m tok/s: 8568643 +20/20000 train_loss: 4.8948 train_time: 0.0m tok/s: 8563415 +21/20000 train_loss: 4.8154 train_time: 0.0m tok/s: 8550615 +22/20000 train_loss: 4.8250 train_time: 0.0m tok/s: 8534184 +23/20000 train_loss: 4.7700 train_time: 0.0m tok/s: 8523232 +24/20000 
train_loss: 4.8872 train_time: 0.0m tok/s: 8513073 +25/20000 train_loss: 4.6624 train_time: 0.0m tok/s: 8508927 +26/20000 train_loss: 4.6986 train_time: 0.0m tok/s: 8502891 +27/20000 train_loss: 4.5737 train_time: 0.0m tok/s: 8497773 +28/20000 train_loss: 4.6449 train_time: 0.0m tok/s: 8496644 +29/20000 train_loss: 4.5690 train_time: 0.0m tok/s: 8493660 +30/20000 train_loss: 4.5484 train_time: 0.0m tok/s: 8490716 +31/20000 train_loss: 4.5363 train_time: 0.0m tok/s: 8484776 +32/20000 train_loss: 4.5149 train_time: 0.0m tok/s: 8473103 +33/20000 train_loss: 4.4848 train_time: 0.1m tok/s: 8466051 +34/20000 train_loss: 4.4073 train_time: 0.1m tok/s: 8459139 +35/20000 train_loss: 4.3480 train_time: 0.1m tok/s: 8455679 +36/20000 train_loss: 4.4850 train_time: 0.1m tok/s: 8449538 +37/20000 train_loss: 4.4283 train_time: 0.1m tok/s: 8446850 +38/20000 train_loss: 4.3556 train_time: 0.1m tok/s: 8446063 +39/20000 train_loss: 4.4874 train_time: 0.1m tok/s: 8445613 +40/20000 train_loss: 4.4544 train_time: 0.1m tok/s: 8439972 +41/20000 train_loss: 4.3264 train_time: 0.1m tok/s: 8438834 +42/20000 train_loss: 4.2394 train_time: 0.1m tok/s: 8436344 +43/20000 train_loss: 4.2737 train_time: 0.1m tok/s: 8433392 +44/20000 train_loss: 4.2100 train_time: 0.1m tok/s: 8427377 +45/20000 train_loss: 4.3478 train_time: 0.1m tok/s: 8425611 +46/20000 train_loss: 4.2576 train_time: 0.1m tok/s: 8420942 +47/20000 train_loss: 4.1263 train_time: 0.1m tok/s: 8414430 +48/20000 train_loss: 4.1734 train_time: 0.1m tok/s: 8417757 +49/20000 train_loss: 4.1183 train_time: 0.1m tok/s: 8416079 +50/20000 train_loss: 4.0846 train_time: 0.1m tok/s: 8414980 +51/20000 train_loss: 4.2841 train_time: 0.1m tok/s: 8413899 +52/20000 train_loss: 4.2067 train_time: 0.1m tok/s: 8411307 +53/20000 train_loss: 4.1498 train_time: 0.1m tok/s: 8409478 +54/20000 train_loss: 4.1442 train_time: 0.1m tok/s: 8407681 +55/20000 train_loss: 4.1657 train_time: 0.1m tok/s: 8406581 +56/20000 train_loss: 4.0829 train_time: 0.1m tok/s: 
8403378 +57/20000 train_loss: 4.1253 train_time: 0.1m tok/s: 8401604 +58/20000 train_loss: 4.0528 train_time: 0.1m tok/s: 8399519 +59/20000 train_loss: 4.0168 train_time: 0.1m tok/s: 8395045 +60/20000 train_loss: 3.9349 train_time: 0.1m tok/s: 8398565 +61/20000 train_loss: 3.9410 train_time: 0.1m tok/s: 8398192 +62/20000 train_loss: 4.0526 train_time: 0.1m tok/s: 8397311 +63/20000 train_loss: 4.1311 train_time: 0.1m tok/s: 8396873 +64/20000 train_loss: 3.9277 train_time: 0.1m tok/s: 8396797 +65/20000 train_loss: 4.0415 train_time: 0.1m tok/s: 8395608 +66/20000 train_loss: 3.9907 train_time: 0.1m tok/s: 8393313 +67/20000 train_loss: 3.9136 train_time: 0.1m tok/s: 8391572 +68/20000 train_loss: 3.9503 train_time: 0.1m tok/s: 8390613 +69/20000 train_loss: 3.8631 train_time: 0.1m tok/s: 8388166 +70/20000 train_loss: 3.9591 train_time: 0.1m tok/s: 8387782 +71/20000 train_loss: 3.8844 train_time: 0.1m tok/s: 8384314 +72/20000 train_loss: 4.0635 train_time: 0.1m tok/s: 8386589 +73/20000 train_loss: 3.8777 train_time: 0.1m tok/s: 8386262 +74/20000 train_loss: 3.8834 train_time: 0.1m tok/s: 8385770 +75/20000 train_loss: 3.8775 train_time: 0.1m tok/s: 8384765 +76/20000 train_loss: 3.8369 train_time: 0.1m tok/s: 8384224 +77/20000 train_loss: 3.7920 train_time: 0.1m tok/s: 8383046 +78/20000 train_loss: 3.7242 train_time: 0.1m tok/s: 8382638 +79/20000 train_loss: 3.8491 train_time: 0.1m tok/s: 8381667 +80/20000 train_loss: 3.7660 train_time: 0.1m tok/s: 8379863 +81/20000 train_loss: 3.7007 train_time: 0.1m tok/s: 8378579 +82/20000 train_loss: 3.7293 train_time: 0.1m tok/s: 8377766 +83/20000 train_loss: 3.6029 train_time: 0.1m tok/s: 8377401 +84/20000 train_loss: 3.6624 train_time: 0.1m tok/s: 8376391 +85/20000 train_loss: 3.6209 train_time: 0.1m tok/s: 8375913 +86/20000 train_loss: 3.4084 train_time: 0.1m tok/s: 8375604 +87/20000 train_loss: 3.6481 train_time: 0.1m tok/s: 8375139 +88/20000 train_loss: 3.5255 train_time: 0.1m tok/s: 8374479 +89/20000 train_loss: 3.5654 
train_time: 0.1m tok/s: 8373441 +90/20000 train_loss: 3.5703 train_time: 0.1m tok/s: 8373195 +91/20000 train_loss: 3.6038 train_time: 0.1m tok/s: 8371944 +92/20000 train_loss: 3.6815 train_time: 0.1m tok/s: 8371849 +93/20000 train_loss: 3.5825 train_time: 0.1m tok/s: 8371596 +94/20000 train_loss: 3.6130 train_time: 0.1m tok/s: 8372385 +95/20000 train_loss: 3.5798 train_time: 0.1m tok/s: 8372287 +96/20000 train_loss: 3.5515 train_time: 0.2m tok/s: 8369452 +97/20000 train_loss: 3.4463 train_time: 0.2m tok/s: 8372153 +98/20000 train_loss: 3.5158 train_time: 0.2m tok/s: 8370254 +99/20000 train_loss: 3.4721 train_time: 0.2m tok/s: 8371103 +100/20000 train_loss: 3.3905 train_time: 0.2m tok/s: 8371058 +101/20000 train_loss: 3.4075 train_time: 0.2m tok/s: 8370635 +102/20000 train_loss: 3.4686 train_time: 0.2m tok/s: 8370574 +103/20000 train_loss: 3.3420 train_time: 0.2m tok/s: 8370118 +104/20000 train_loss: 3.4586 train_time: 0.2m tok/s: 8369914 +105/20000 train_loss: 3.3399 train_time: 0.2m tok/s: 8369289 +106/20000 train_loss: 3.4719 train_time: 0.2m tok/s: 8368960 +107/20000 train_loss: 3.2130 train_time: 0.2m tok/s: 8368677 +108/20000 train_loss: 3.3850 train_time: 0.2m tok/s: 8367241 +109/20000 train_loss: 3.3858 train_time: 0.2m tok/s: 8365166 +110/20000 train_loss: 3.4035 train_time: 0.2m tok/s: 8365232 +111/20000 train_loss: 3.4085 train_time: 0.2m tok/s: 8364930 +112/20000 train_loss: 3.4062 train_time: 0.2m tok/s: 8365137 +113/20000 train_loss: 3.3182 train_time: 0.2m tok/s: 8364191 +114/20000 train_loss: 3.3715 train_time: 0.2m tok/s: 8364443 +115/20000 train_loss: 3.4116 train_time: 0.2m tok/s: 8364117 +116/20000 train_loss: 3.2150 train_time: 0.2m tok/s: 8362541 +117/20000 train_loss: 3.4154 train_time: 0.2m tok/s: 8362486 +118/20000 train_loss: 3.3629 train_time: 0.2m tok/s: 8362718 +119/20000 train_loss: 3.3422 train_time: 0.2m tok/s: 8361919 +120/20000 train_loss: 3.3291 train_time: 0.2m tok/s: 8361520 +121/20000 train_loss: 3.2837 train_time: 0.2m tok/s: 
8362024 +122/20000 train_loss: 3.2962 train_time: 0.2m tok/s: 8362482 +123/20000 train_loss: 3.2863 train_time: 0.2m tok/s: 8361681 +124/20000 train_loss: 3.3314 train_time: 0.2m tok/s: 8360533 +125/20000 train_loss: 3.2245 train_time: 0.2m tok/s: 8360308 +126/20000 train_loss: 3.2509 train_time: 0.2m tok/s: 8360895 +127/20000 train_loss: 3.2722 train_time: 0.2m tok/s: 8360007 +128/20000 train_loss: 3.3235 train_time: 0.2m tok/s: 8359441 +129/20000 train_loss: 3.2893 train_time: 0.2m tok/s: 8359019 +130/20000 train_loss: 3.2684 train_time: 0.2m tok/s: 8358602 +131/20000 train_loss: 3.2174 train_time: 0.2m tok/s: 8357503 +132/20000 train_loss: 3.1758 train_time: 0.2m tok/s: 8357724 +133/20000 train_loss: 3.2191 train_time: 0.2m tok/s: 8357387 +134/20000 train_loss: 3.1312 train_time: 0.2m tok/s: 8357116 +135/20000 train_loss: 2.9590 train_time: 0.2m tok/s: 8354264 +136/20000 train_loss: 3.2276 train_time: 0.2m tok/s: 8352483 +137/20000 train_loss: 3.0676 train_time: 0.2m tok/s: 8351941 +138/20000 train_loss: 3.2728 train_time: 0.2m tok/s: 8351669 +139/20000 train_loss: 3.2339 train_time: 0.2m tok/s: 8351543 +140/20000 train_loss: 3.1732 train_time: 0.2m tok/s: 8351187 +141/20000 train_loss: 3.0811 train_time: 0.2m tok/s: 8350352 +142/20000 train_loss: 3.2845 train_time: 0.2m tok/s: 8350227 +143/20000 train_loss: 3.3495 train_time: 0.2m tok/s: 8349344 +144/20000 train_loss: 3.2825 train_time: 0.2m tok/s: 8349237 +145/20000 train_loss: 3.2420 train_time: 0.2m tok/s: 8349255 +146/20000 train_loss: 3.2620 train_time: 0.2m tok/s: 8349028 +147/20000 train_loss: 3.1630 train_time: 0.2m tok/s: 8348596 +148/20000 train_loss: 3.1951 train_time: 0.2m tok/s: 8348804 +149/20000 train_loss: 3.2533 train_time: 0.2m tok/s: 8348957 +150/20000 train_loss: 3.1954 train_time: 0.2m tok/s: 8349073 +151/20000 train_loss: 3.5461 train_time: 0.2m tok/s: 8348667 +152/20000 train_loss: 3.1605 train_time: 0.2m tok/s: 8347856 +153/20000 train_loss: 3.2939 train_time: 0.2m tok/s: 8347746 
+154/20000 train_loss: 3.1933 train_time: 0.2m tok/s: 8347303 +155/20000 train_loss: 3.1406 train_time: 0.2m tok/s: 8346526 +156/20000 train_loss: 3.0464 train_time: 0.2m tok/s: 8346073 +157/20000 train_loss: 3.0941 train_time: 0.2m tok/s: 8346045 +158/20000 train_loss: 3.1929 train_time: 0.2m tok/s: 8345518 +159/20000 train_loss: 3.0508 train_time: 0.2m tok/s: 8345449 +160/20000 train_loss: 3.1802 train_time: 0.3m tok/s: 8345670 +161/20000 train_loss: 3.1346 train_time: 0.3m tok/s: 8345354 +162/20000 train_loss: 3.0666 train_time: 0.3m tok/s: 8344563 +163/20000 train_loss: 3.1402 train_time: 0.3m tok/s: 8344924 +164/20000 train_loss: 3.0222 train_time: 0.3m tok/s: 8344105 +165/20000 train_loss: 3.2044 train_time: 0.3m tok/s: 8343781 +166/20000 train_loss: 3.1368 train_time: 0.3m tok/s: 8343262 +167/20000 train_loss: 3.1235 train_time: 0.3m tok/s: 8343111 +168/20000 train_loss: 3.1762 train_time: 0.3m tok/s: 8343011 +169/20000 train_loss: 3.0927 train_time: 0.3m tok/s: 8343248 +170/20000 train_loss: 2.8103 train_time: 0.3m tok/s: 8342353 +171/20000 train_loss: 3.1271 train_time: 0.3m tok/s: 8341850 +172/20000 train_loss: 3.0887 train_time: 0.3m tok/s: 8341999 +173/20000 train_loss: 3.2275 train_time: 0.3m tok/s: 8342225 +174/20000 train_loss: 3.1115 train_time: 0.3m tok/s: 8341862 +175/20000 train_loss: 3.1433 train_time: 0.3m tok/s: 8341547 +176/20000 train_loss: 3.1562 train_time: 0.3m tok/s: 8341559 +177/20000 train_loss: 3.1248 train_time: 0.3m tok/s: 8340986 +178/20000 train_loss: 2.9536 train_time: 0.3m tok/s: 8340531 +179/20000 train_loss: 3.3035 train_time: 0.3m tok/s: 8340452 +180/20000 train_loss: 2.9681 train_time: 0.3m tok/s: 8340130 +181/20000 train_loss: 2.9522 train_time: 0.3m tok/s: 8339379 +182/20000 train_loss: 3.0496 train_time: 0.3m tok/s: 8339044 +183/20000 train_loss: 2.9867 train_time: 0.3m tok/s: 8338760 +184/20000 train_loss: 2.9953 train_time: 0.3m tok/s: 8338580 +185/20000 train_loss: 2.7162 train_time: 0.3m tok/s: 8337177 +186/20000 
train_loss: 3.1078 train_time: 0.3m tok/s: 8336413 +187/20000 train_loss: 3.0456 train_time: 0.3m tok/s: 8336414 +188/20000 train_loss: 3.1972 train_time: 0.3m tok/s: 8336285 +189/20000 train_loss: 3.5217 train_time: 0.3m tok/s: 8335894 +190/20000 train_loss: 3.0774 train_time: 0.3m tok/s: 8335445 +191/20000 train_loss: 3.0454 train_time: 0.3m tok/s: 8335446 +192/20000 train_loss: 3.0061 train_time: 0.3m tok/s: 8335553 +193/20000 train_loss: 3.0003 train_time: 0.3m tok/s: 8335771 +194/20000 train_loss: 3.0042 train_time: 0.3m tok/s: 8335576 +195/20000 train_loss: 2.8922 train_time: 0.3m tok/s: 8335374 +196/20000 train_loss: 3.1360 train_time: 0.3m tok/s: 8334911 +197/20000 train_loss: 3.0508 train_time: 0.3m tok/s: 8334869 +198/20000 train_loss: 3.0555 train_time: 0.3m tok/s: 8334961 +199/20000 train_loss: 3.0500 train_time: 0.3m tok/s: 8334937 +200/20000 train_loss: 3.0642 train_time: 0.3m tok/s: 8334778 +201/20000 train_loss: 3.1121 train_time: 0.3m tok/s: 8334035 +202/20000 train_loss: 3.3261 train_time: 0.3m tok/s: 8333679 +203/20000 train_loss: 3.0669 train_time: 0.3m tok/s: 8333420 +204/20000 train_loss: 3.0749 train_time: 0.3m tok/s: 8333424 +205/20000 train_loss: 3.0578 train_time: 0.3m tok/s: 8333438 +206/20000 train_loss: 2.9497 train_time: 0.3m tok/s: 8333110 +207/20000 train_loss: 3.0942 train_time: 0.3m tok/s: 8333149 +208/20000 train_loss: 2.9345 train_time: 0.3m tok/s: 8333048 +209/20000 train_loss: 3.0045 train_time: 0.3m tok/s: 8332535 +210/20000 train_loss: 3.0762 train_time: 0.3m tok/s: 8332171 +211/20000 train_loss: 3.2555 train_time: 0.3m tok/s: 8331559 +212/20000 train_loss: 3.0186 train_time: 0.3m tok/s: 8331508 +213/20000 train_loss: 2.9340 train_time: 0.3m tok/s: 8330991 +214/20000 train_loss: 3.0872 train_time: 0.3m tok/s: 8330868 +215/20000 train_loss: 3.0284 train_time: 0.3m tok/s: 8330971 +216/20000 train_loss: 3.0855 train_time: 0.3m tok/s: 8330777 +217/20000 train_loss: 3.0160 train_time: 0.3m tok/s: 8330610 +218/20000 train_loss: 
3.0223 train_time: 0.3m tok/s: 8330716 +219/20000 train_loss: 3.1144 train_time: 0.3m tok/s: 8330851 +220/20000 train_loss: 3.3268 train_time: 0.3m tok/s: 8329808 +221/20000 train_loss: 2.9284 train_time: 0.3m tok/s: 8328757 +222/20000 train_loss: 2.9734 train_time: 0.3m tok/s: 8329106 +223/20000 train_loss: 2.9950 train_time: 0.4m tok/s: 8328883 +224/20000 train_loss: 2.9844 train_time: 0.4m tok/s: 8328885 +225/20000 train_loss: 3.0741 train_time: 0.4m tok/s: 8328184 +226/20000 train_loss: 3.0370 train_time: 0.4m tok/s: 8328434 +227/20000 train_loss: 3.0686 train_time: 0.4m tok/s: 8328257 +228/20000 train_loss: 3.0741 train_time: 0.4m tok/s: 8328334 +229/20000 train_loss: 3.0798 train_time: 0.4m tok/s: 8328553 +230/20000 train_loss: 2.9523 train_time: 0.4m tok/s: 8328663 +231/20000 train_loss: 3.0982 train_time: 0.4m tok/s: 8328445 +232/20000 train_loss: 2.9818 train_time: 0.4m tok/s: 8327806 +233/20000 train_loss: 3.0127 train_time: 0.4m tok/s: 8327473 +234/20000 train_loss: 3.0136 train_time: 0.4m tok/s: 8327455 +235/20000 train_loss: 2.9307 train_time: 0.4m tok/s: 8327614 +236/20000 train_loss: 3.0060 train_time: 0.4m tok/s: 8327565 +237/20000 train_loss: 2.8922 train_time: 0.4m tok/s: 8327524 +238/20000 train_loss: 3.0829 train_time: 0.4m tok/s: 8327304 +239/20000 train_loss: 3.0021 train_time: 0.4m tok/s: 8326925 +240/20000 train_loss: 3.1496 train_time: 0.4m tok/s: 8326564 +241/20000 train_loss: 3.0063 train_time: 0.4m tok/s: 8326632 +242/20000 train_loss: 3.0886 train_time: 0.4m tok/s: 8326500 +243/20000 train_loss: 2.9992 train_time: 0.4m tok/s: 8326661 +244/20000 train_loss: 3.0420 train_time: 0.4m tok/s: 8326827 +245/20000 train_loss: 2.9824 train_time: 0.4m tok/s: 8326982 +246/20000 train_loss: 3.0376 train_time: 0.4m tok/s: 8327069 +247/20000 train_loss: 2.9721 train_time: 0.4m tok/s: 8326719 +248/20000 train_loss: 2.8882 train_time: 0.4m tok/s: 8326402 +249/20000 train_loss: 2.9745 train_time: 0.4m tok/s: 8326501 +250/20000 train_loss: 2.9782 
train_time: 0.4m tok/s: 8326792 +251/20000 train_loss: 2.9310 train_time: 0.4m tok/s: 8326498 +252/20000 train_loss: 2.9322 train_time: 0.4m tok/s: 8326463 +253/20000 train_loss: 3.0230 train_time: 0.4m tok/s: 8326498 +254/20000 train_loss: 3.0826 train_time: 0.4m tok/s: 8326472 +255/20000 train_loss: 3.1003 train_time: 0.4m tok/s: 8326163 +256/20000 train_loss: 2.9595 train_time: 0.4m tok/s: 8326222 +257/20000 train_loss: 2.9649 train_time: 0.4m tok/s: 8326386 +258/20000 train_loss: 3.0141 train_time: 0.4m tok/s: 8325905 +259/20000 train_loss: 2.9421 train_time: 0.4m tok/s: 8325522 +260/20000 train_loss: 3.1486 train_time: 0.4m tok/s: 8325500 +261/20000 train_loss: 2.9329 train_time: 0.4m tok/s: 8325123 +262/20000 train_loss: 2.7793 train_time: 0.4m tok/s: 8325276 +263/20000 train_loss: 2.7964 train_time: 0.4m tok/s: 8325201 +264/20000 train_loss: 2.9692 train_time: 0.4m tok/s: 8325313 +265/20000 train_loss: 2.9903 train_time: 0.4m tok/s: 8325249 +266/20000 train_loss: 2.9165 train_time: 0.4m tok/s: 8324718 +267/20000 train_loss: 2.9364 train_time: 0.4m tok/s: 8324673 +268/20000 train_loss: 3.0140 train_time: 0.4m tok/s: 8324557 +269/20000 train_loss: 3.0015 train_time: 0.4m tok/s: 8324801 +270/20000 train_loss: 2.9985 train_time: 0.4m tok/s: 8324723 +271/20000 train_loss: 3.0015 train_time: 0.4m tok/s: 8324482 +272/20000 train_loss: 3.0707 train_time: 0.4m tok/s: 8324532 +273/20000 train_loss: 2.9259 train_time: 0.4m tok/s: 8324708 +274/20000 train_loss: 3.0270 train_time: 0.4m tok/s: 8324589 +275/20000 train_loss: 2.9566 train_time: 0.4m tok/s: 8324817 +276/20000 train_loss: 2.8796 train_time: 0.4m tok/s: 8325008 +277/20000 train_loss: 2.8659 train_time: 0.4m tok/s: 8324878 +278/20000 train_loss: 2.8370 train_time: 0.4m tok/s: 8324470 +279/20000 train_loss: 2.9716 train_time: 0.4m tok/s: 8324032 +280/20000 train_loss: 3.0067 train_time: 0.4m tok/s: 8323822 +281/20000 train_loss: 2.7586 train_time: 0.4m tok/s: 8323721 +282/20000 train_loss: 3.0673 train_time: 
0.4m tok/s: 8323736 +283/20000 train_loss: 2.8712 train_time: 0.4m tok/s: 8323566 +284/20000 train_loss: 2.9119 train_time: 0.4m tok/s: 8323398 +285/20000 train_loss: 2.9630 train_time: 0.4m tok/s: 8323230 +286/20000 train_loss: 2.9876 train_time: 0.5m tok/s: 8323063 +287/20000 train_loss: 2.8311 train_time: 0.5m tok/s: 8322641 +288/20000 train_loss: 2.9711 train_time: 0.5m tok/s: 8322535 +289/20000 train_loss: 2.8782 train_time: 0.5m tok/s: 8322489 +290/20000 train_loss: 2.9014 train_time: 0.5m tok/s: 8322464 +291/20000 train_loss: 2.8781 train_time: 0.5m tok/s: 8322437 +292/20000 train_loss: 2.7074 train_time: 0.5m tok/s: 8322171 +293/20000 train_loss: 2.9368 train_time: 0.5m tok/s: 8322084 +294/20000 train_loss: 3.0567 train_time: 0.5m tok/s: 8321958 +295/20000 train_loss: 2.9989 train_time: 0.5m tok/s: 8321845 +296/20000 train_loss: 3.0624 train_time: 0.5m tok/s: 8321676 +297/20000 train_loss: 2.9461 train_time: 0.5m tok/s: 8321149 +298/20000 train_loss: 2.9828 train_time: 0.5m tok/s: 8321222 +299/20000 train_loss: 2.8115 train_time: 0.5m tok/s: 8321209 +300/20000 train_loss: 3.0230 train_time: 0.5m tok/s: 8321236 +301/20000 train_loss: 2.9652 train_time: 0.5m tok/s: 8321152 +302/20000 train_loss: 2.8643 train_time: 0.5m tok/s: 8320985 +303/20000 train_loss: 2.9242 train_time: 0.5m tok/s: 8321043 +304/20000 train_loss: 2.9326 train_time: 0.5m tok/s: 8321014 +305/20000 train_loss: 2.9342 train_time: 0.5m tok/s: 8320778 +306/20000 train_loss: 3.0082 train_time: 0.5m tok/s: 8320483 +307/20000 train_loss: 2.9163 train_time: 0.5m tok/s: 8320434 +308/20000 train_loss: 2.8961 train_time: 0.5m tok/s: 8320428 +309/20000 train_loss: 3.0394 train_time: 0.5m tok/s: 8320220 +310/20000 train_loss: 2.8561 train_time: 0.5m tok/s: 8320351 +311/20000 train_loss: 2.9256 train_time: 0.5m tok/s: 8320512 +312/20000 train_loss: 2.8312 train_time: 0.5m tok/s: 8320564 +313/20000 train_loss: 2.8318 train_time: 0.5m tok/s: 8320334 +314/20000 train_loss: 2.8708 train_time: 0.5m tok/s: 
8320264 +315/20000 train_loss: 2.9573 train_time: 0.5m tok/s: 8320051 +316/20000 train_loss: 2.6925 train_time: 0.5m tok/s: 8319713 +317/20000 train_loss: 2.8096 train_time: 0.5m tok/s: 8319433 +318/20000 train_loss: 2.9174 train_time: 0.5m tok/s: 8319067 +319/20000 train_loss: 2.9107 train_time: 0.5m tok/s: 8318821 +320/20000 train_loss: 3.0278 train_time: 0.5m tok/s: 8318774 +321/20000 train_loss: 2.9987 train_time: 0.5m tok/s: 8318496 +322/20000 train_loss: 2.9631 train_time: 0.5m tok/s: 8318708 +323/20000 train_loss: 3.0037 train_time: 0.5m tok/s: 8318749 +324/20000 train_loss: 2.9093 train_time: 0.5m tok/s: 8318777 +325/20000 train_loss: 2.8894 train_time: 0.5m tok/s: 8318735 +326/20000 train_loss: 2.8964 train_time: 0.5m tok/s: 8318777 +327/20000 train_loss: 2.8364 train_time: 0.5m tok/s: 8318543 +328/20000 train_loss: 2.8598 train_time: 0.5m tok/s: 8318468 +329/20000 train_loss: 2.8144 train_time: 0.5m tok/s: 8318226 +330/20000 train_loss: 2.7720 train_time: 0.5m tok/s: 8318449 +331/20000 train_loss: 2.8929 train_time: 0.5m tok/s: 8317809 +332/20000 train_loss: 2.9668 train_time: 0.5m tok/s: 8317725 +333/20000 train_loss: 2.8694 train_time: 0.5m tok/s: 8317810 +334/20000 train_loss: 3.0914 train_time: 0.5m tok/s: 8317731 +335/20000 train_loss: 2.8409 train_time: 0.5m tok/s: 8317627 +336/20000 train_loss: 2.9330 train_time: 0.5m tok/s: 8317527 +337/20000 train_loss: 2.8287 train_time: 0.5m tok/s: 8317603 +338/20000 train_loss: 2.8935 train_time: 0.5m tok/s: 8317665 +339/20000 train_loss: 2.9375 train_time: 0.5m tok/s: 8317654 +340/20000 train_loss: 2.9653 train_time: 0.5m tok/s: 8317541 +341/20000 train_loss: 2.9154 train_time: 0.5m tok/s: 8317540 +342/20000 train_loss: 2.8096 train_time: 0.5m tok/s: 8317545 +343/20000 train_loss: 2.9121 train_time: 0.5m tok/s: 8317530 +344/20000 train_loss: 2.8223 train_time: 0.5m tok/s: 8317208 +345/20000 train_loss: 2.8568 train_time: 0.5m tok/s: 8317137 +346/20000 train_loss: 2.8734 train_time: 0.5m tok/s: 8317125 
+347/20000 train_loss: 2.8952 train_time: 0.5m tok/s: 8317109 +348/20000 train_loss: 2.8599 train_time: 0.5m tok/s: 8317004 +349/20000 train_loss: 2.9313 train_time: 0.6m tok/s: 8316918 +350/20000 train_loss: 2.7742 train_time: 0.6m tok/s: 8316821 +351/20000 train_loss: 2.7963 train_time: 0.6m tok/s: 8316859 +352/20000 train_loss: 2.7699 train_time: 0.6m tok/s: 8316605 +353/20000 train_loss: 2.6122 train_time: 0.6m tok/s: 8316291 +354/20000 train_loss: 2.9951 train_time: 0.6m tok/s: 8316017 +355/20000 train_loss: 2.9251 train_time: 0.6m tok/s: 8315840 +356/20000 train_loss: 2.8299 train_time: 0.6m tok/s: 8315357 +357/20000 train_loss: 2.7801 train_time: 0.6m tok/s: 8315162 +358/20000 train_loss: 2.7884 train_time: 0.6m tok/s: 8315157 +359/20000 train_loss: 2.8868 train_time: 0.6m tok/s: 8315232 +360/20000 train_loss: 2.8882 train_time: 0.6m tok/s: 8314794 +361/20000 train_loss: 2.9605 train_time: 0.6m tok/s: 8314660 +362/20000 train_loss: 2.8657 train_time: 0.6m tok/s: 8314479 +363/20000 train_loss: 2.9489 train_time: 0.6m tok/s: 8314333 +364/20000 train_loss: 2.8112 train_time: 0.6m tok/s: 8314163 +365/20000 train_loss: 2.8040 train_time: 0.6m tok/s: 8314218 +366/20000 train_loss: 2.8063 train_time: 0.6m tok/s: 8314094 +367/20000 train_loss: 2.9229 train_time: 0.6m tok/s: 8314038 +368/20000 train_loss: 2.7374 train_time: 0.6m tok/s: 8313859 +369/20000 train_loss: 2.8905 train_time: 0.6m tok/s: 8313919 +370/20000 train_loss: 2.8647 train_time: 0.6m tok/s: 8313948 +371/20000 train_loss: 2.8755 train_time: 0.6m tok/s: 8313953 +372/20000 train_loss: 2.8334 train_time: 0.6m tok/s: 8313824 +373/20000 train_loss: 2.7208 train_time: 0.6m tok/s: 8313886 +374/20000 train_loss: 2.7236 train_time: 0.6m tok/s: 8313570 +375/20000 train_loss: 2.6771 train_time: 0.6m tok/s: 8313608 +376/20000 train_loss: 2.9089 train_time: 0.6m tok/s: 8313228 +377/20000 train_loss: 2.7247 train_time: 0.6m tok/s: 8313199 +378/20000 train_loss: 2.8152 train_time: 0.6m tok/s: 8313055 +379/20000 
train_loss: 2.8727 train_time: 0.6m tok/s: 8313092 +380/20000 train_loss: 2.8813 train_time: 0.6m tok/s: 8313202 +381/20000 train_loss: 2.8994 train_time: 0.6m tok/s: 8312839 +382/20000 train_loss: 2.9558 train_time: 0.6m tok/s: 8312738 +383/20000 train_loss: 2.9420 train_time: 0.6m tok/s: 8312825 +384/20000 train_loss: 2.8161 train_time: 0.6m tok/s: 8312469 +385/20000 train_loss: 2.8350 train_time: 0.6m tok/s: 8312334 +386/20000 train_loss: 2.8727 train_time: 0.6m tok/s: 8312183 +387/20000 train_loss: 3.0555 train_time: 0.6m tok/s: 8311931 +388/20000 train_loss: 2.8756 train_time: 0.6m tok/s: 8311833 +389/20000 train_loss: 2.9076 train_time: 0.6m tok/s: 8311548 +390/20000 train_loss: 2.7545 train_time: 0.6m tok/s: 8311604 +391/20000 train_loss: 2.7082 train_time: 0.6m tok/s: 8311554 +392/20000 train_loss: 2.7721 train_time: 0.6m tok/s: 8311746 +393/20000 train_loss: 2.8390 train_time: 0.6m tok/s: 8311911 +394/20000 train_loss: 2.8307 train_time: 0.6m tok/s: 8311860 +395/20000 train_loss: 2.9168 train_time: 0.6m tok/s: 8311794 +396/20000 train_loss: 2.8213 train_time: 0.6m tok/s: 8311564 +397/20000 train_loss: 2.8217 train_time: 0.6m tok/s: 8311345 +398/20000 train_loss: 2.8653 train_time: 0.6m tok/s: 8311151 +399/20000 train_loss: 2.7668 train_time: 0.6m tok/s: 8311020 +400/20000 train_loss: 2.8638 train_time: 0.6m tok/s: 8311170 +401/20000 train_loss: 2.8552 train_time: 0.6m tok/s: 8311136 +402/20000 train_loss: 2.7240 train_time: 0.6m tok/s: 8311294 +403/20000 train_loss: 2.9468 train_time: 0.6m tok/s: 8311402 +404/20000 train_loss: 2.9262 train_time: 0.6m tok/s: 8310989 +405/20000 train_loss: 2.9195 train_time: 0.6m tok/s: 8310821 +406/20000 train_loss: 2.8040 train_time: 0.6m tok/s: 8310457 +407/20000 train_loss: 2.8310 train_time: 0.6m tok/s: 8310252 +408/20000 train_loss: 2.8316 train_time: 0.6m tok/s: 8310078 +409/20000 train_loss: 2.7936 train_time: 0.6m tok/s: 8309962 +410/20000 train_loss: 2.8690 train_time: 0.6m tok/s: 8309941 +411/20000 train_loss: 
2.8105 train_time: 0.6m tok/s: 8309754 +412/20000 train_loss: 2.8156 train_time: 0.6m tok/s: 8309762 +413/20000 train_loss: 2.7043 train_time: 0.7m tok/s: 8309782 +414/20000 train_loss: 2.7207 train_time: 0.7m tok/s: 8309702 +415/20000 train_loss: 2.6995 train_time: 0.7m tok/s: 8309665 +416/20000 train_loss: 2.7666 train_time: 0.7m tok/s: 8309452 +417/20000 train_loss: 2.7715 train_time: 0.7m tok/s: 8309250 +418/20000 train_loss: 2.7918 train_time: 0.7m tok/s: 8309309 +419/20000 train_loss: 2.8103 train_time: 0.7m tok/s: 8309269 +420/20000 train_loss: 2.7951 train_time: 0.7m tok/s: 8309271 +421/20000 train_loss: 2.8589 train_time: 0.7m tok/s: 8309143 +422/20000 train_loss: 2.8358 train_time: 0.7m tok/s: 8309169 +423/20000 train_loss: 2.8318 train_time: 0.7m tok/s: 8309097 +424/20000 train_loss: 2.9031 train_time: 0.7m tok/s: 8309029 +425/20000 train_loss: 2.7979 train_time: 0.7m tok/s: 8308819 +426/20000 train_loss: 2.8220 train_time: 0.7m tok/s: 8308474 +427/20000 train_loss: 2.8287 train_time: 0.7m tok/s: 8308378 +428/20000 train_loss: 2.7904 train_time: 0.7m tok/s: 8308331 +429/20000 train_loss: 2.7282 train_time: 0.7m tok/s: 8308251 +430/20000 train_loss: 2.8610 train_time: 0.7m tok/s: 8308174 +431/20000 train_loss: 2.6740 train_time: 0.7m tok/s: 8308041 +432/20000 train_loss: 2.7314 train_time: 0.7m tok/s: 8308172 +433/20000 train_loss: 2.6685 train_time: 0.7m tok/s: 8307916 +434/20000 train_loss: 2.6623 train_time: 0.7m tok/s: 8307597 +435/20000 train_loss: 2.8675 train_time: 0.7m tok/s: 8307291 +436/20000 train_loss: 2.4910 train_time: 0.7m tok/s: 8307292 +437/20000 train_loss: 2.7312 train_time: 0.7m tok/s: 8307219 +438/20000 train_loss: 2.8559 train_time: 0.7m tok/s: 8307147 +439/20000 train_loss: 2.7570 train_time: 0.7m tok/s: 8306825 +440/20000 train_loss: 2.6748 train_time: 0.7m tok/s: 8306770 +441/20000 train_loss: 2.9077 train_time: 0.7m tok/s: 8306730 +442/20000 train_loss: 2.9646 train_time: 0.7m tok/s: 8306627 +443/20000 train_loss: 2.9145 
train_time: 0.7m tok/s: 8306705 +444/20000 train_loss: 2.9278 train_time: 0.7m tok/s: 8306657 +445/20000 train_loss: 2.8883 train_time: 0.7m tok/s: 8306557 +446/20000 train_loss: 2.7619 train_time: 0.7m tok/s: 8306530 +447/20000 train_loss: 2.7924 train_time: 0.7m tok/s: 8306645 +448/20000 train_loss: 2.8005 train_time: 0.7m tok/s: 8306677 +449/20000 train_loss: 2.7825 train_time: 0.7m tok/s: 8306475 +450/20000 train_loss: 2.8201 train_time: 0.7m tok/s: 8306116 +451/20000 train_loss: 2.5354 train_time: 0.7m tok/s: 8305893 +452/20000 train_loss: 2.7555 train_time: 0.7m tok/s: 8305765 +453/20000 train_loss: 2.6912 train_time: 0.7m tok/s: 8305563 +454/20000 train_loss: 2.6986 train_time: 0.7m tok/s: 8305264 +455/20000 train_loss: 2.7549 train_time: 0.7m tok/s: 8305138 +456/20000 train_loss: 2.7734 train_time: 0.7m tok/s: 8305286 +457/20000 train_loss: 2.6893 train_time: 0.7m tok/s: 8305226 +458/20000 train_loss: 2.7633 train_time: 0.7m tok/s: 8305079 +459/20000 train_loss: 2.8815 train_time: 0.7m tok/s: 8304971 +460/20000 train_loss: 2.8032 train_time: 0.7m tok/s: 8304924 +461/20000 train_loss: 2.8679 train_time: 0.7m tok/s: 8305029 +462/20000 train_loss: 2.9139 train_time: 0.7m tok/s: 8304856 +463/20000 train_loss: 2.8078 train_time: 0.7m tok/s: 8304931 +464/20000 train_loss: 2.7687 train_time: 0.7m tok/s: 8304880 +465/20000 train_loss: 2.9466 train_time: 0.7m tok/s: 8304715 +466/20000 train_loss: 2.8513 train_time: 0.7m tok/s: 8304536 +467/20000 train_loss: 2.8444 train_time: 0.7m tok/s: 8304429 +468/20000 train_loss: 2.9834 train_time: 0.7m tok/s: 8304240 +469/20000 train_loss: 2.7333 train_time: 0.7m tok/s: 8303983 +470/20000 train_loss: 2.7583 train_time: 0.7m tok/s: 8303947 +471/20000 train_loss: 2.8740 train_time: 0.7m tok/s: 8303671 +472/20000 train_loss: 2.9741 train_time: 0.7m tok/s: 8303094 +473/20000 train_loss: 2.7041 train_time: 0.7m tok/s: 8302687 +474/20000 train_loss: 2.6860 train_time: 0.7m tok/s: 8302659 +475/20000 train_loss: 2.8513 train_time: 
0.7m tok/s: 8302675 +476/20000 train_loss: 2.6142 train_time: 0.8m tok/s: 8302452 +477/20000 train_loss: 2.7164 train_time: 0.8m tok/s: 8302446 +478/20000 train_loss: 2.8179 train_time: 0.8m tok/s: 8302473 +479/20000 train_loss: 2.7911 train_time: 0.8m tok/s: 8302345 +480/20000 train_loss: 3.0491 train_time: 0.8m tok/s: 8302078 +481/20000 train_loss: 2.8396 train_time: 0.8m tok/s: 8302002 +482/20000 train_loss: 2.7801 train_time: 0.8m tok/s: 8302097 +483/20000 train_loss: 2.8158 train_time: 0.8m tok/s: 8302202 +484/20000 train_loss: 2.8773 train_time: 0.8m tok/s: 8301982 +485/20000 train_loss: 2.7539 train_time: 0.8m tok/s: 8301993 +486/20000 train_loss: 2.7601 train_time: 0.8m tok/s: 8301936 +487/20000 train_loss: 2.8165 train_time: 0.8m tok/s: 8301950 +488/20000 train_loss: 2.7570 train_time: 0.8m tok/s: 8301682 +489/20000 train_loss: 2.3531 train_time: 0.8m tok/s: 8301364 +490/20000 train_loss: 2.8545 train_time: 0.8m tok/s: 8301206 +491/20000 train_loss: 2.7772 train_time: 0.8m tok/s: 8301156 +492/20000 train_loss: 2.7767 train_time: 0.8m tok/s: 8301377 +493/20000 train_loss: 2.6690 train_time: 0.8m tok/s: 8301288 +494/20000 train_loss: 2.6765 train_time: 0.8m tok/s: 8301229 +495/20000 train_loss: 2.7902 train_time: 0.8m tok/s: 8301253 +496/20000 train_loss: 2.6864 train_time: 0.8m tok/s: 8301108 +497/20000 train_loss: 2.9200 train_time: 0.8m tok/s: 8301177 +498/20000 train_loss: 2.8311 train_time: 0.8m tok/s: 8301039 +499/20000 train_loss: 2.9289 train_time: 0.8m tok/s: 8301062 +500/20000 train_loss: 2.7361 train_time: 0.8m tok/s: 8301021 +501/20000 train_loss: 2.9087 train_time: 0.8m tok/s: 8300953 +502/20000 train_loss: 2.7061 train_time: 0.8m tok/s: 8300829 +503/20000 train_loss: 2.7867 train_time: 0.8m tok/s: 8300655 +504/20000 train_loss: 2.6798 train_time: 0.8m tok/s: 8300680 +505/20000 train_loss: 2.8782 train_time: 0.8m tok/s: 8300401 +506/20000 train_loss: 2.7831 train_time: 0.8m tok/s: 8300268 +507/20000 train_loss: 2.7671 train_time: 0.8m tok/s: 
8300339 +508/20000 train_loss: 2.9143 train_time: 0.8m tok/s: 8300428 +509/20000 train_loss: 2.9075 train_time: 0.8m tok/s: 8300241 +510/20000 train_loss: 2.6962 train_time: 0.8m tok/s: 8300253 +511/20000 train_loss: 2.8722 train_time: 0.8m tok/s: 8300268 +512/20000 train_loss: 2.8480 train_time: 0.8m tok/s: 8300137 +513/20000 train_loss: 2.8895 train_time: 0.8m tok/s: 8299994 +514/20000 train_loss: 2.8448 train_time: 0.8m tok/s: 8300167 +515/20000 train_loss: 2.8506 train_time: 0.8m tok/s: 8299887 +516/20000 train_loss: 2.7214 train_time: 0.8m tok/s: 8300012 +517/20000 train_loss: 2.8072 train_time: 0.8m tok/s: 8300055 +518/20000 train_loss: 2.9325 train_time: 0.8m tok/s: 8299971 +519/20000 train_loss: 2.7350 train_time: 0.8m tok/s: 8299898 +520/20000 train_loss: 2.6565 train_time: 0.8m tok/s: 8299899 +521/20000 train_loss: 2.7629 train_time: 0.8m tok/s: 8299904 +522/20000 train_loss: 2.7362 train_time: 0.8m tok/s: 8299922 +523/20000 train_loss: 2.7212 train_time: 0.8m tok/s: 8299974 +524/20000 train_loss: 2.7928 train_time: 0.8m tok/s: 8299800 +525/20000 train_loss: 2.7186 train_time: 0.8m tok/s: 8299475 +526/20000 train_loss: 2.8253 train_time: 0.8m tok/s: 8299443 +527/20000 train_loss: 2.8856 train_time: 0.8m tok/s: 8299271 +528/20000 train_loss: 2.8647 train_time: 0.8m tok/s: 8299417 +529/20000 train_loss: 2.8745 train_time: 0.8m tok/s: 8299137 +530/20000 train_loss: 2.8904 train_time: 0.8m tok/s: 8298986 +531/20000 train_loss: 3.1778 train_time: 0.8m tok/s: 8298733 +532/20000 train_loss: 3.1044 train_time: 0.8m tok/s: 8298457 +533/20000 train_loss: 2.6720 train_time: 0.8m tok/s: 8298385 +534/20000 train_loss: 2.8651 train_time: 0.8m tok/s: 8298203 +535/20000 train_loss: 2.7872 train_time: 0.8m tok/s: 8298099 +536/20000 train_loss: 2.6746 train_time: 0.8m tok/s: 8297948 +537/20000 train_loss: 2.8962 train_time: 0.8m tok/s: 8297718 +538/20000 train_loss: 2.7283 train_time: 0.8m tok/s: 8297720 +539/20000 train_loss: 2.8449 train_time: 0.9m tok/s: 8297648 
+540/20000 train_loss: 2.8571 train_time: 0.9m tok/s: 8297607 +541/20000 train_loss: 2.2775 train_time: 0.9m tok/s: 8297352 +542/20000 train_loss: 2.8382 train_time: 0.9m tok/s: 8297023 +543/20000 train_loss: 2.8062 train_time: 0.9m tok/s: 8296871 +544/20000 train_loss: 2.8286 train_time: 0.9m tok/s: 8296756 +545/20000 train_loss: 2.7753 train_time: 0.9m tok/s: 8296831 +546/20000 train_loss: 2.8121 train_time: 0.9m tok/s: 8296783 +547/20000 train_loss: 2.7686 train_time: 0.9m tok/s: 8296692 +548/20000 train_loss: 2.7348 train_time: 0.9m tok/s: 8296719 +549/20000 train_loss: 2.6821 train_time: 0.9m tok/s: 8296658 +550/20000 train_loss: 2.7737 train_time: 0.9m tok/s: 8296605 +551/20000 train_loss: 2.7230 train_time: 0.9m tok/s: 8296442 +552/20000 train_loss: 2.8860 train_time: 0.9m tok/s: 8295814 +553/20000 train_loss: 2.7467 train_time: 0.9m tok/s: 8295364 +554/20000 train_loss: 2.5801 train_time: 0.9m tok/s: 8295337 +555/20000 train_loss: 2.6592 train_time: 0.9m tok/s: 8295327 +556/20000 train_loss: 2.7596 train_time: 0.9m tok/s: 8295452 +557/20000 train_loss: 2.8728 train_time: 0.9m tok/s: 8295353 +558/20000 train_loss: 2.8287 train_time: 0.9m tok/s: 8295384 +559/20000 train_loss: 2.7463 train_time: 0.9m tok/s: 8295577 +560/20000 train_loss: 2.7867 train_time: 0.9m tok/s: 8295428 +561/20000 train_loss: 2.7869 train_time: 0.9m tok/s: 8295490 +562/20000 train_loss: 2.8484 train_time: 0.9m tok/s: 8295453 +563/20000 train_loss: 2.8256 train_time: 0.9m tok/s: 8295404 +564/20000 train_loss: 2.9436 train_time: 0.9m tok/s: 8295227 +565/20000 train_loss: 2.8413 train_time: 0.9m tok/s: 8295083 +566/20000 train_loss: 2.7439 train_time: 0.9m tok/s: 8295133 +567/20000 train_loss: 2.6983 train_time: 0.9m tok/s: 8295232 +568/20000 train_loss: 2.8224 train_time: 0.9m tok/s: 8295202 +569/20000 train_loss: 2.6698 train_time: 0.9m tok/s: 8295027 +570/20000 train_loss: 2.6846 train_time: 0.9m tok/s: 8294885 +571/20000 train_loss: 2.7897 train_time: 0.9m tok/s: 8294753 +572/20000 
train_loss: 2.6427 train_time: 0.9m tok/s: 8294656 +573/20000 train_loss: 2.6462 train_time: 0.9m tok/s: 8294515 +574/20000 train_loss: 2.7330 train_time: 0.9m tok/s: 8294542 +575/20000 train_loss: 2.5108 train_time: 0.9m tok/s: 8294419 +576/20000 train_loss: 2.7627 train_time: 0.9m tok/s: 8294459 +577/20000 train_loss: 2.8559 train_time: 0.9m tok/s: 8294174 +578/20000 train_loss: 2.8326 train_time: 0.9m tok/s: 8294102 +579/20000 train_loss: 2.7279 train_time: 0.9m tok/s: 8294262 +580/20000 train_loss: 2.8024 train_time: 0.9m tok/s: 8294401 +581/20000 train_loss: 2.7721 train_time: 0.9m tok/s: 8294249 +582/20000 train_loss: 2.7700 train_time: 0.9m tok/s: 8294145 +583/20000 train_loss: 2.7266 train_time: 0.9m tok/s: 8294124 +584/20000 train_loss: 2.7859 train_time: 0.9m tok/s: 8294111 +585/20000 train_loss: 2.7964 train_time: 0.9m tok/s: 8293971 +586/20000 train_loss: 2.6351 train_time: 0.9m tok/s: 8293941 +587/20000 train_loss: 2.7264 train_time: 0.9m tok/s: 8293729 +588/20000 train_loss: 2.7038 train_time: 0.9m tok/s: 8294039 +589/20000 train_loss: 2.7367 train_time: 0.9m tok/s: 8293831 +590/20000 train_loss: 2.7634 train_time: 0.9m tok/s: 8293785 +591/20000 train_loss: 2.7487 train_time: 0.9m tok/s: 8293786 +592/20000 train_loss: 2.7345 train_time: 0.9m tok/s: 8293874 +593/20000 train_loss: 2.7435 train_time: 0.9m tok/s: 8293836 +594/20000 train_loss: 2.6431 train_time: 0.9m tok/s: 8293626 +595/20000 train_loss: 2.7979 train_time: 0.9m tok/s: 8293454 +596/20000 train_loss: 2.6759 train_time: 0.9m tok/s: 8293401 +597/20000 train_loss: 2.7465 train_time: 0.9m tok/s: 8293380 +598/20000 train_loss: 2.7880 train_time: 0.9m tok/s: 8293143 +599/20000 train_loss: 2.7005 train_time: 0.9m tok/s: 8293061 +600/20000 train_loss: 2.7433 train_time: 0.9m tok/s: 8293208 +601/20000 train_loss: 2.7278 train_time: 0.9m tok/s: 8293121 +602/20000 train_loss: 2.7598 train_time: 1.0m tok/s: 8292862 +603/20000 train_loss: 2.7541 train_time: 1.0m tok/s: 8292835 +604/20000 train_loss: 
2.7468 train_time: 1.0m tok/s: 8292737 +605/20000 train_loss: 2.6539 train_time: 1.0m tok/s: 8292614 +606/20000 train_loss: 2.6537 train_time: 1.0m tok/s: 8292617 +607/20000 train_loss: 2.7395 train_time: 1.0m tok/s: 8292570 +608/20000 train_loss: 2.6538 train_time: 1.0m tok/s: 8292538 +609/20000 train_loss: 2.7184 train_time: 1.0m tok/s: 8292547 +610/20000 train_loss: 2.7696 train_time: 1.0m tok/s: 8292578 +611/20000 train_loss: 2.8885 train_time: 1.0m tok/s: 8292397 +612/20000 train_loss: 2.8268 train_time: 1.0m tok/s: 8292241 +613/20000 train_loss: 2.8057 train_time: 1.0m tok/s: 8292312 +614/20000 train_loss: 2.8002 train_time: 1.0m tok/s: 8292325 +615/20000 train_loss: 2.7492 train_time: 1.0m tok/s: 8292179 +616/20000 train_loss: 2.7672 train_time: 1.0m tok/s: 8292189 +617/20000 train_loss: 2.7294 train_time: 1.0m tok/s: 8292186 +618/20000 train_loss: 2.7387 train_time: 1.0m tok/s: 8292082 +619/20000 train_loss: 2.7837 train_time: 1.0m tok/s: 8292024 +620/20000 train_loss: 2.8850 train_time: 1.0m tok/s: 8292193 +621/20000 train_loss: 2.6857 train_time: 1.0m tok/s: 8292304 +622/20000 train_loss: 2.7209 train_time: 1.0m tok/s: 8292369 +623/20000 train_loss: 2.7269 train_time: 1.0m tok/s: 8292235 +624/20000 train_loss: 2.4428 train_time: 1.0m tok/s: 8292162 +625/20000 train_loss: 2.7508 train_time: 1.0m tok/s: 8292076 +626/20000 train_loss: 2.8956 train_time: 1.0m tok/s: 8292052 +627/20000 train_loss: 2.6854 train_time: 1.0m tok/s: 8291906 +628/20000 train_loss: 2.8623 train_time: 1.0m tok/s: 8291765 +629/20000 train_loss: 2.8480 train_time: 1.0m tok/s: 8291793 +630/20000 train_loss: 2.6961 train_time: 1.0m tok/s: 8291957 +631/20000 train_loss: 2.8210 train_time: 1.0m tok/s: 8291899 +632/20000 train_loss: 2.8361 train_time: 1.0m tok/s: 8291999 +633/20000 train_loss: 2.7132 train_time: 1.0m tok/s: 8292019 +634/20000 train_loss: 2.9400 train_time: 1.0m tok/s: 8291947 +635/20000 train_loss: 2.7373 train_time: 1.0m tok/s: 8291790 +636/20000 train_loss: 2.8682 
train_time: 1.0m tok/s: 8291634 +637/20000 train_loss: 2.7559 train_time: 1.0m tok/s: 8291626 +638/20000 train_loss: 2.5723 train_time: 1.0m tok/s: 8291654 +639/20000 train_loss: 2.7300 train_time: 1.0m tok/s: 8291261 +640/20000 train_loss: 2.7084 train_time: 1.0m tok/s: 8291200 +641/20000 train_loss: 2.7875 train_time: 1.0m tok/s: 8291288 +642/20000 train_loss: 2.7913 train_time: 1.0m tok/s: 8291385 +643/20000 train_loss: 2.7552 train_time: 1.0m tok/s: 8291324 +644/20000 train_loss: 2.8034 train_time: 1.0m tok/s: 8291446 +645/20000 train_loss: 2.8804 train_time: 1.0m tok/s: 8291474 +646/20000 train_loss: 2.7743 train_time: 1.0m tok/s: 8291461 +647/20000 train_loss: 2.8246 train_time: 1.0m tok/s: 8291031 +648/20000 train_loss: 2.7339 train_time: 1.0m tok/s: 8290672 +649/20000 train_loss: 2.8670 train_time: 1.0m tok/s: 8290728 +650/20000 train_loss: 2.7541 train_time: 1.0m tok/s: 8290760 +651/20000 train_loss: 2.7381 train_time: 1.0m tok/s: 8290813 +652/20000 train_loss: 2.7038 train_time: 1.0m tok/s: 8290727 +653/20000 train_loss: 2.6509 train_time: 1.0m tok/s: 8290869 +654/20000 train_loss: 2.7104 train_time: 1.0m tok/s: 8291015 +655/20000 train_loss: 2.7024 train_time: 1.0m tok/s: 8290677 +656/20000 train_loss: 2.6435 train_time: 1.0m tok/s: 8290577 +657/20000 train_loss: 2.6545 train_time: 1.0m tok/s: 8290571 +658/20000 train_loss: 2.6972 train_time: 1.0m tok/s: 8290617 +659/20000 train_loss: 2.7437 train_time: 1.0m tok/s: 8290431 +660/20000 train_loss: 2.7460 train_time: 1.0m tok/s: 8290551 +661/20000 train_loss: 2.7993 train_time: 1.0m tok/s: 8290684 +662/20000 train_loss: 2.6861 train_time: 1.0m tok/s: 8290708 +663/20000 train_loss: 2.7790 train_time: 1.0m tok/s: 8290765 +664/20000 train_loss: 2.7825 train_time: 1.0m tok/s: 8290685 +665/20000 train_loss: 2.8298 train_time: 1.1m tok/s: 8290614 +666/20000 train_loss: 2.8237 train_time: 1.1m tok/s: 8290579 +667/20000 train_loss: 2.7573 train_time: 1.1m tok/s: 8290581 +668/20000 train_loss: 2.7381 train_time: 
1.1m tok/s: 8290360 +669/20000 train_loss: 2.6267 train_time: 1.1m tok/s: 8290334 +670/20000 train_loss: 2.6448 train_time: 1.1m tok/s: 8290342 +671/20000 train_loss: 2.6554 train_time: 1.1m tok/s: 8290344 +672/20000 train_loss: 2.7816 train_time: 1.1m tok/s: 8290337 +673/20000 train_loss: 2.6220 train_time: 1.1m tok/s: 8290364 +674/20000 train_loss: 2.8539 train_time: 1.1m tok/s: 8290391 +675/20000 train_loss: 2.6158 train_time: 1.1m tok/s: 8290197 +676/20000 train_loss: 2.8493 train_time: 1.1m tok/s: 8290011 +677/20000 train_loss: 2.6765 train_time: 1.1m tok/s: 8289989 +678/20000 train_loss: 2.7624 train_time: 1.1m tok/s: 8289810 +679/20000 train_loss: 2.6982 train_time: 1.1m tok/s: 8289779 +680/20000 train_loss: 2.9003 train_time: 1.1m tok/s: 8289750 +681/20000 train_loss: 2.7762 train_time: 1.1m tok/s: 8289599 +682/20000 train_loss: 2.8772 train_time: 1.1m tok/s: 8289619 +683/20000 train_loss: 2.8561 train_time: 1.1m tok/s: 8289551 +684/20000 train_loss: 2.7920 train_time: 1.1m tok/s: 8289588 +685/20000 train_loss: 2.6524 train_time: 1.1m tok/s: 8289406 +686/20000 train_loss: 2.8805 train_time: 1.1m tok/s: 8289237 +687/20000 train_loss: 2.7728 train_time: 1.1m tok/s: 8289281 +688/20000 train_loss: 2.7770 train_time: 1.1m tok/s: 8289163 +689/20000 train_loss: 2.8105 train_time: 1.1m tok/s: 8288949 +690/20000 train_loss: 2.7369 train_time: 1.1m tok/s: 8288955 +691/20000 train_loss: 2.8549 train_time: 1.1m tok/s: 8288916 +692/20000 train_loss: 2.9056 train_time: 1.1m tok/s: 8288893 +693/20000 train_loss: 2.8113 train_time: 1.1m tok/s: 8288919 +694/20000 train_loss: 2.8118 train_time: 1.1m tok/s: 8288815 +695/20000 train_loss: 2.8009 train_time: 1.1m tok/s: 8288690 +696/20000 train_loss: 2.8027 train_time: 1.1m tok/s: 8288837 +697/20000 train_loss: 2.6642 train_time: 1.1m tok/s: 8288745 +698/20000 train_loss: 2.8324 train_time: 1.1m tok/s: 8288368 +699/20000 train_loss: 2.6872 train_time: 1.1m tok/s: 8288074 +700/20000 train_loss: 2.6322 train_time: 1.1m tok/s: 
8287932 +701/20000 train_loss: 2.6364 train_time: 1.1m tok/s: 8287947 +702/20000 train_loss: 2.6237 train_time: 1.1m tok/s: 8287833 +703/20000 train_loss: 2.5073 train_time: 1.1m tok/s: 8287705 +704/20000 train_loss: 2.8425 train_time: 1.1m tok/s: 8287604 +705/20000 train_loss: 2.7876 train_time: 1.1m tok/s: 8287590 +706/20000 train_loss: 2.7544 train_time: 1.1m tok/s: 8287538 +707/20000 train_loss: 2.7560 train_time: 1.1m tok/s: 8287434 +708/20000 train_loss: 2.8378 train_time: 1.1m tok/s: 8287559 +709/20000 train_loss: 2.8036 train_time: 1.1m tok/s: 8287479 +710/20000 train_loss: 2.6331 train_time: 1.1m tok/s: 8287532 +711/20000 train_loss: 2.7086 train_time: 1.1m tok/s: 8287363 +712/20000 train_loss: 2.6351 train_time: 1.1m tok/s: 8287278 +713/20000 train_loss: 2.6930 train_time: 1.1m tok/s: 8287274 +714/20000 train_loss: 2.7566 train_time: 1.1m tok/s: 8287395 +715/20000 train_loss: 2.7041 train_time: 1.1m tok/s: 8287407 +716/20000 train_loss: 2.7203 train_time: 1.1m tok/s: 8287466 +717/20000 train_loss: 2.9148 train_time: 1.1m tok/s: 8287536 +718/20000 train_loss: 2.8068 train_time: 1.1m tok/s: 8287570 +719/20000 train_loss: 2.7488 train_time: 1.1m tok/s: 8287489 +720/20000 train_loss: 2.6903 train_time: 1.1m tok/s: 8287377 +721/20000 train_loss: 2.8322 train_time: 1.1m tok/s: 8287244 +722/20000 train_loss: 2.6618 train_time: 1.1m tok/s: 8287205 +723/20000 train_loss: 2.8684 train_time: 1.1m tok/s: 8287193 +724/20000 train_loss: 2.7711 train_time: 1.1m tok/s: 8287123 +725/20000 train_loss: 2.6372 train_time: 1.1m tok/s: 8287242 +726/20000 train_loss: 2.7800 train_time: 1.1m tok/s: 8287144 +727/20000 train_loss: 2.5993 train_time: 1.1m tok/s: 8287105 +728/20000 train_loss: 2.7998 train_time: 1.2m tok/s: 8287227 +729/20000 train_loss: 2.8422 train_time: 1.2m tok/s: 8287199 +730/20000 train_loss: 2.7738 train_time: 1.2m tok/s: 8287180 +731/20000 train_loss: 2.8704 train_time: 1.2m tok/s: 8287059 +732/20000 train_loss: 2.7076 train_time: 1.2m tok/s: 8287060 
+733/20000 train_loss: 2.8806 train_time: 1.2m tok/s: 8287028 +734/20000 train_loss: 2.7344 train_time: 1.2m tok/s: 8287003 +735/20000 train_loss: 2.7922 train_time: 1.2m tok/s: 8287056 +736/20000 train_loss: 2.6702 train_time: 1.2m tok/s: 8286964 +737/20000 train_loss: 2.8014 train_time: 1.2m tok/s: 8286829 +738/20000 train_loss: 2.6694 train_time: 1.2m tok/s: 8286864 +739/20000 train_loss: 2.5922 train_time: 1.2m tok/s: 8286857 +740/20000 train_loss: 2.8257 train_time: 1.2m tok/s: 8286743 +741/20000 train_loss: 2.8210 train_time: 1.2m tok/s: 8286822 +742/20000 train_loss: 2.6856 train_time: 1.2m tok/s: 8286793 +743/20000 train_loss: 2.8417 train_time: 1.2m tok/s: 8286599 +744/20000 train_loss: 2.7309 train_time: 1.2m tok/s: 8286695 +745/20000 train_loss: 2.7394 train_time: 1.2m tok/s: 8286701 +746/20000 train_loss: 2.8148 train_time: 1.2m tok/s: 8286741 +747/20000 train_loss: 2.6977 train_time: 1.2m tok/s: 8286738 +748/20000 train_loss: 2.7446 train_time: 1.2m tok/s: 8286550 +749/20000 train_loss: 2.8040 train_time: 1.2m tok/s: 8286546 +750/20000 train_loss: 2.8129 train_time: 1.2m tok/s: 8286422 +751/20000 train_loss: 2.6858 train_time: 1.2m tok/s: 8286398 +752/20000 train_loss: 2.7723 train_time: 1.2m tok/s: 8286289 +753/20000 train_loss: 2.4233 train_time: 1.2m tok/s: 8285985 +754/20000 train_loss: 2.6697 train_time: 1.2m tok/s: 8285837 +755/20000 train_loss: 2.8646 train_time: 1.2m tok/s: 8285636 +756/20000 train_loss: 3.1184 train_time: 1.2m tok/s: 8285857 +757/20000 train_loss: 2.7884 train_time: 1.2m tok/s: 8285855 +758/20000 train_loss: 2.7222 train_time: 1.2m tok/s: 8285910 +759/20000 train_loss: 2.6838 train_time: 1.2m tok/s: 8285873 +760/20000 train_loss: 2.8606 train_time: 1.2m tok/s: 8285817 +761/20000 train_loss: 2.7328 train_time: 1.2m tok/s: 8285740 +762/20000 train_loss: 2.8295 train_time: 1.2m tok/s: 8285752 +763/20000 train_loss: 2.6534 train_time: 1.2m tok/s: 8285681 +764/20000 train_loss: 2.7027 train_time: 1.2m tok/s: 8285633 +765/20000 
train_loss: 2.6762 train_time: 1.2m tok/s: 8285651 +766/20000 train_loss: 2.6706 train_time: 1.2m tok/s: 8285654 +767/20000 train_loss: 2.6942 train_time: 1.2m tok/s: 8285526 +768/20000 train_loss: 2.7364 train_time: 1.2m tok/s: 8285790 +769/20000 train_loss: 2.7678 train_time: 1.2m tok/s: 8285813 +770/20000 train_loss: 2.7713 train_time: 1.2m tok/s: 8285829 +771/20000 train_loss: 2.7854 train_time: 1.2m tok/s: 8285690 +772/20000 train_loss: 2.7680 train_time: 1.2m tok/s: 8285752 +773/20000 train_loss: 2.7062 train_time: 1.2m tok/s: 8285637 +774/20000 train_loss: 2.8421 train_time: 1.2m tok/s: 8285620 +775/20000 train_loss: 2.8081 train_time: 1.2m tok/s: 8285583 +776/20000 train_loss: 2.9095 train_time: 1.2m tok/s: 8285426 +777/20000 train_loss: 2.8581 train_time: 1.2m tok/s: 8285145 +778/20000 train_loss: 2.7098 train_time: 1.2m tok/s: 8285025 +779/20000 train_loss: 2.4396 train_time: 1.2m tok/s: 8284973 +780/20000 train_loss: 2.7767 train_time: 1.2m tok/s: 8284944 +781/20000 train_loss: 2.7561 train_time: 1.2m tok/s: 8284938 +782/20000 train_loss: 3.0301 train_time: 1.2m tok/s: 8284851 +783/20000 train_loss: 2.5278 train_time: 1.2m tok/s: 8284643 +784/20000 train_loss: 2.9036 train_time: 1.2m tok/s: 8284604 +785/20000 train_loss: 2.8602 train_time: 1.2m tok/s: 8284614 +786/20000 train_loss: 2.7254 train_time: 1.2m tok/s: 8284701 +787/20000 train_loss: 2.6370 train_time: 1.2m tok/s: 8284537 +788/20000 train_loss: 2.6751 train_time: 1.2m tok/s: 8284477 +789/20000 train_loss: 2.7998 train_time: 1.2m tok/s: 8284355 +790/20000 train_loss: 2.6408 train_time: 1.2m tok/s: 8284282 +791/20000 train_loss: 2.5974 train_time: 1.3m tok/s: 8284066 +792/20000 train_loss: 2.7320 train_time: 1.3m tok/s: 8284280 +793/20000 train_loss: 2.7067 train_time: 1.3m tok/s: 8284293 +794/20000 train_loss: 2.7156 train_time: 1.3m tok/s: 8284202 +795/20000 train_loss: 2.8526 train_time: 1.3m tok/s: 8284197 +796/20000 train_loss: 2.7175 train_time: 1.3m tok/s: 8284283 +797/20000 train_loss: 
2.7368 train_time: 1.3m tok/s: 8284349 +798/20000 train_loss: 2.7531 train_time: 1.3m tok/s: 8284365 +799/20000 train_loss: 2.7870 train_time: 1.3m tok/s: 8284248 +800/20000 train_loss: 2.7145 train_time: 1.3m tok/s: 8284159 +801/20000 train_loss: 2.7456 train_time: 1.3m tok/s: 8284098 +802/20000 train_loss: 2.8196 train_time: 1.3m tok/s: 8284074 +803/20000 train_loss: 2.6844 train_time: 1.3m tok/s: 8283884 +804/20000 train_loss: 2.6678 train_time: 1.3m tok/s: 8283988 +805/20000 train_loss: 2.6796 train_time: 1.3m tok/s: 8283966 +806/20000 train_loss: 2.7984 train_time: 1.3m tok/s: 8283964 +807/20000 train_loss: 2.7847 train_time: 1.3m tok/s: 8284008 +808/20000 train_loss: 2.8127 train_time: 1.3m tok/s: 8284016 +809/20000 train_loss: 2.6330 train_time: 1.3m tok/s: 8284066 +810/20000 train_loss: 2.8493 train_time: 1.3m tok/s: 8284047 +811/20000 train_loss: 2.8467 train_time: 1.3m tok/s: 8284090 +812/20000 train_loss: 2.6820 train_time: 1.3m tok/s: 8284137 +813/20000 train_loss: 2.7413 train_time: 1.3m tok/s: 8284163 +814/20000 train_loss: 2.7909 train_time: 1.3m tok/s: 8284153 +815/20000 train_loss: 2.8901 train_time: 1.3m tok/s: 8284055 +816/20000 train_loss: 2.7193 train_time: 1.3m tok/s: 8283992 +817/20000 train_loss: 2.7113 train_time: 1.3m tok/s: 8283894 +818/20000 train_loss: 2.7761 train_time: 1.3m tok/s: 8284127 +819/20000 train_loss: 2.7927 train_time: 1.3m tok/s: 8284131 +820/20000 train_loss: 3.0427 train_time: 1.3m tok/s: 8283873 +821/20000 train_loss: 2.7823 train_time: 1.3m tok/s: 8283755 +822/20000 train_loss: 2.5985 train_time: 1.3m tok/s: 8283736 +823/20000 train_loss: 2.6452 train_time: 1.3m tok/s: 8283658 +824/20000 train_loss: 2.8128 train_time: 1.3m tok/s: 8283683 +825/20000 train_loss: 2.8863 train_time: 1.3m tok/s: 8283772 +826/20000 train_loss: 2.8625 train_time: 1.3m tok/s: 8283709 +827/20000 train_loss: 2.6439 train_time: 1.3m tok/s: 8283612 +828/20000 train_loss: 2.7236 train_time: 1.3m tok/s: 8283645 +829/20000 train_loss: 3.3586 
train_time: 1.3m tok/s: 8283637 +830/20000 train_loss: 2.7508 train_time: 1.3m tok/s: 8283385 +831/20000 train_loss: 2.7402 train_time: 1.3m tok/s: 8283354 +832/20000 train_loss: 2.7701 train_time: 1.3m tok/s: 8283367 +833/20000 train_loss: 2.8682 train_time: 1.3m tok/s: 8283426 +834/20000 train_loss: 2.6907 train_time: 1.3m tok/s: 8283426 +835/20000 train_loss: 2.7753 train_time: 1.3m tok/s: 8283525 +836/20000 train_loss: 2.6145 train_time: 1.3m tok/s: 8283468 +837/20000 train_loss: 2.5094 train_time: 1.3m tok/s: 8283302 +838/20000 train_loss: 2.6224 train_time: 1.3m tok/s: 8283166 +839/20000 train_loss: 2.7189 train_time: 1.3m tok/s: 8283046 +840/20000 train_loss: 3.1215 train_time: 1.3m tok/s: 8283095 +841/20000 train_loss: 2.7179 train_time: 1.3m tok/s: 8283002 +842/20000 train_loss: 2.7246 train_time: 1.3m tok/s: 8283064 +843/20000 train_loss: 2.6458 train_time: 1.3m tok/s: 8283081 +844/20000 train_loss: 2.7318 train_time: 1.3m tok/s: 8283131 +845/20000 train_loss: 2.6844 train_time: 1.3m tok/s: 8282975 +846/20000 train_loss: 2.6742 train_time: 1.3m tok/s: 8282911 +847/20000 train_loss: 2.7202 train_time: 1.3m tok/s: 8282805 +848/20000 train_loss: 2.6418 train_time: 1.3m tok/s: 8282863 +849/20000 train_loss: 2.7445 train_time: 1.3m tok/s: 8282880 +850/20000 train_loss: 2.5808 train_time: 1.3m tok/s: 8282837 +851/20000 train_loss: 2.7577 train_time: 1.3m tok/s: 8282759 +852/20000 train_loss: 2.5650 train_time: 1.3m tok/s: 8282945 +853/20000 train_loss: 2.7158 train_time: 1.3m tok/s: 8282953 +854/20000 train_loss: 2.7030 train_time: 1.4m tok/s: 8283006 +855/20000 train_loss: 2.7567 train_time: 1.4m tok/s: 8283060 +856/20000 train_loss: 2.6897 train_time: 1.4m tok/s: 8283148 +857/20000 train_loss: 2.8367 train_time: 1.4m tok/s: 8283106 +858/20000 train_loss: 2.8318 train_time: 1.4m tok/s: 8283159 +859/20000 train_loss: 2.7234 train_time: 1.4m tok/s: 8283119 +860/20000 train_loss: 2.6616 train_time: 1.4m tok/s: 8282995 +861/20000 train_loss: 2.7028 train_time: 
1.4m tok/s: 8282852 +862/20000 train_loss: 2.6611 train_time: 1.4m tok/s: 8282720 +863/20000 train_loss: 2.8908 train_time: 1.4m tok/s: 8282649 +864/20000 train_loss: 2.7454 train_time: 1.4m tok/s: 8282594 +865/20000 train_loss: 2.7705 train_time: 1.4m tok/s: 8282550 +866/20000 train_loss: 2.6292 train_time: 1.4m tok/s: 8282443 +867/20000 train_loss: 2.6718 train_time: 1.4m tok/s: 8282290 +868/20000 train_loss: 2.6810 train_time: 1.4m tok/s: 8282253 +869/20000 train_loss: 2.6973 train_time: 1.4m tok/s: 8282296 +870/20000 train_loss: 2.6658 train_time: 1.4m tok/s: 8282385 +871/20000 train_loss: 2.6591 train_time: 1.4m tok/s: 8282410 +872/20000 train_loss: 2.7496 train_time: 1.4m tok/s: 8282518 +873/20000 train_loss: 2.6622 train_time: 1.4m tok/s: 8282560 +874/20000 train_loss: 2.8041 train_time: 1.4m tok/s: 8282501 +875/20000 train_loss: 2.7573 train_time: 1.4m tok/s: 8282506 +876/20000 train_loss: 2.8069 train_time: 1.4m tok/s: 8282639 +877/20000 train_loss: 2.6919 train_time: 1.4m tok/s: 8282663 +878/20000 train_loss: 2.6855 train_time: 1.4m tok/s: 8282597 +879/20000 train_loss: 2.7334 train_time: 1.4m tok/s: 8282511 +880/20000 train_loss: 2.7440 train_time: 1.4m tok/s: 8282579 +881/20000 train_loss: 2.6585 train_time: 1.4m tok/s: 8282657 +882/20000 train_loss: 2.6874 train_time: 1.4m tok/s: 8282639 +883/20000 train_loss: 2.7764 train_time: 1.4m tok/s: 8282669 +884/20000 train_loss: 2.5184 train_time: 1.4m tok/s: 8282694 +885/20000 train_loss: 2.6316 train_time: 1.4m tok/s: 8282687 +886/20000 train_loss: 2.7051 train_time: 1.4m tok/s: 8282647 +887/20000 train_loss: 2.6697 train_time: 1.4m tok/s: 8282538 +888/20000 train_loss: 2.6353 train_time: 1.4m tok/s: 8282520 +889/20000 train_loss: 2.8020 train_time: 1.4m tok/s: 8282549 +890/20000 train_loss: 2.6013 train_time: 1.4m tok/s: 8282480 +891/20000 train_loss: 2.6885 train_time: 1.4m tok/s: 8282441 +892/20000 train_loss: 2.7038 train_time: 1.4m tok/s: 8282451 +893/20000 train_loss: 2.6438 train_time: 1.4m tok/s: 
8282555 +894/20000 train_loss: 2.7089 train_time: 1.4m tok/s: 8282662 +895/20000 train_loss: 2.7350 train_time: 1.4m tok/s: 8282678 +896/20000 train_loss: 2.7955 train_time: 1.4m tok/s: 8282654 +897/20000 train_loss: 2.7189 train_time: 1.4m tok/s: 8282586 +898/20000 train_loss: 2.6886 train_time: 1.4m tok/s: 8282492 +899/20000 train_loss: 2.6657 train_time: 1.4m tok/s: 8282422 +900/20000 train_loss: 2.7236 train_time: 1.4m tok/s: 8282485 +901/20000 train_loss: 2.6352 train_time: 1.4m tok/s: 8282619 +902/20000 train_loss: 2.6263 train_time: 1.4m tok/s: 8282567 +903/20000 train_loss: 2.5843 train_time: 1.4m tok/s: 8282501 +904/20000 train_loss: 2.5642 train_time: 1.4m tok/s: 8282551 +905/20000 train_loss: 2.7046 train_time: 1.4m tok/s: 8282610 +906/20000 train_loss: 2.7732 train_time: 1.4m tok/s: 8282721 +907/20000 train_loss: 2.7487 train_time: 1.4m tok/s: 8282770 +908/20000 train_loss: 2.8283 train_time: 1.4m tok/s: 8282810 +909/20000 train_loss: 2.7663 train_time: 1.4m tok/s: 8282775 +910/20000 train_loss: 2.8212 train_time: 1.4m tok/s: 8282780 +911/20000 train_loss: 2.7224 train_time: 1.4m tok/s: 8282680 +912/20000 train_loss: 2.5403 train_time: 1.4m tok/s: 8282645 +913/20000 train_loss: 2.7242 train_time: 1.4m tok/s: 8282428 +914/20000 train_loss: 2.8067 train_time: 1.4m tok/s: 8282366 +915/20000 train_loss: 2.7448 train_time: 1.4m tok/s: 8282333 +916/20000 train_loss: 2.7434 train_time: 1.4m tok/s: 8282346 +917/20000 train_loss: 2.6750 train_time: 1.5m tok/s: 8282292 +918/20000 train_loss: 2.5485 train_time: 1.5m tok/s: 8282291 +919/20000 train_loss: 2.6226 train_time: 1.5m tok/s: 8282296 +920/20000 train_loss: 2.6432 train_time: 1.5m tok/s: 8282342 +921/20000 train_loss: 2.5005 train_time: 1.5m tok/s: 8282354 +922/20000 train_loss: 2.7089 train_time: 1.5m tok/s: 8282287 +923/20000 train_loss: 2.6289 train_time: 1.5m tok/s: 8282204 +924/20000 train_loss: 2.6164 train_time: 1.5m tok/s: 8282172 +925/20000 train_loss: 2.9349 train_time: 1.5m tok/s: 8282231 
+926/20000 train_loss: 2.5570 train_time: 1.5m tok/s: 8282111 +927/20000 train_loss: 2.7460 train_time: 1.5m tok/s: 8281989 +928/20000 train_loss: 2.7942 train_time: 1.5m tok/s: 8282002 +929/20000 train_loss: 2.7155 train_time: 1.5m tok/s: 8282039 +930/20000 train_loss: 2.8777 train_time: 1.5m tok/s: 8281976 +931/20000 train_loss: 2.7602 train_time: 1.5m tok/s: 8281895 +932/20000 train_loss: 2.6924 train_time: 1.5m tok/s: 8281982 +933/20000 train_loss: 2.7150 train_time: 1.5m tok/s: 8281957 +934/20000 train_loss: 2.7210 train_time: 1.5m tok/s: 8282028 +935/20000 train_loss: 2.7939 train_time: 1.5m tok/s: 8281931 +936/20000 train_loss: 2.6099 train_time: 1.5m tok/s: 8281897 +937/20000 train_loss: 2.7213 train_time: 1.5m tok/s: 8281915 +938/20000 train_loss: 2.5071 train_time: 1.5m tok/s: 8281768 +939/20000 train_loss: 2.5098 train_time: 1.5m tok/s: 8281393 +940/20000 train_loss: 2.7674 train_time: 1.5m tok/s: 8281196 +941/20000 train_loss: 2.8648 train_time: 1.5m tok/s: 8281145 +942/20000 train_loss: 2.6968 train_time: 1.5m tok/s: 8281238 +943/20000 train_loss: 2.7095 train_time: 1.5m tok/s: 8281202 +944/20000 train_loss: 2.8043 train_time: 1.5m tok/s: 8281332 +945/20000 train_loss: 2.7210 train_time: 1.5m tok/s: 8281482 +946/20000 train_loss: 2.6215 train_time: 1.5m tok/s: 8281504 +947/20000 train_loss: 2.7908 train_time: 1.5m tok/s: 8281440 +948/20000 train_loss: 2.7184 train_time: 1.5m tok/s: 8281365 +949/20000 train_loss: 2.7113 train_time: 1.5m tok/s: 8281369 +950/20000 train_loss: 2.7234 train_time: 1.5m tok/s: 8281364 +951/20000 train_loss: 2.7861 train_time: 1.5m tok/s: 8281399 +952/20000 train_loss: 2.5454 train_time: 1.5m tok/s: 8281414 +953/20000 train_loss: 2.6837 train_time: 1.5m tok/s: 8281395 +954/20000 train_loss: 2.6914 train_time: 1.5m tok/s: 8281298 +955/20000 train_loss: 2.8038 train_time: 1.5m tok/s: 8281226 +956/20000 train_loss: 2.8076 train_time: 1.5m tok/s: 8281174 +957/20000 train_loss: 2.6751 train_time: 1.5m tok/s: 8281122 +958/20000 
train_loss: 2.7700 train_time: 1.5m tok/s: 8281002 +959/20000 train_loss: 2.7776 train_time: 1.5m tok/s: 8280961 +960/20000 train_loss: 2.9861 train_time: 1.5m tok/s: 8280892 +961/20000 train_loss: 2.7868 train_time: 1.5m tok/s: 8280818 +962/20000 train_loss: 2.7003 train_time: 1.5m tok/s: 8280895 +963/20000 train_loss: 2.7510 train_time: 1.5m tok/s: 8280940 +964/20000 train_loss: 2.6577 train_time: 1.5m tok/s: 8280988 +965/20000 train_loss: 2.7806 train_time: 1.5m tok/s: 8280942 +966/20000 train_loss: 2.7705 train_time: 1.5m tok/s: 8280966 +967/20000 train_loss: 2.5687 train_time: 1.5m tok/s: 8280978 +968/20000 train_loss: 2.6721 train_time: 1.5m tok/s: 8280937 +969/20000 train_loss: 2.7002 train_time: 1.5m tok/s: 8280752 +970/20000 train_loss: 2.6946 train_time: 1.5m tok/s: 8280520 +971/20000 train_loss: 2.5434 train_time: 1.5m tok/s: 8280267 +972/20000 train_loss: 2.4965 train_time: 1.5m tok/s: 8280107 +973/20000 train_loss: 2.5650 train_time: 1.5m tok/s: 8280018 +974/20000 train_loss: 2.6905 train_time: 1.5m tok/s: 8280028 +975/20000 train_loss: 2.7445 train_time: 1.5m tok/s: 8280046 +976/20000 train_loss: 2.6783 train_time: 1.5m tok/s: 8280073 +977/20000 train_loss: 2.7329 train_time: 1.5m tok/s: 8280089 +978/20000 train_loss: 2.7636 train_time: 1.5m tok/s: 8280068 +979/20000 train_loss: 2.6065 train_time: 1.5m tok/s: 8279970 +980/20000 train_loss: 2.6460 train_time: 1.6m tok/s: 8279858 +981/20000 train_loss: 2.6371 train_time: 1.6m tok/s: 8279844 +982/20000 train_loss: 2.6228 train_time: 1.6m tok/s: 8279865 +983/20000 train_loss: 2.7465 train_time: 1.6m tok/s: 8279820 +984/20000 train_loss: 2.6417 train_time: 1.6m tok/s: 8279816 +985/20000 train_loss: 2.7577 train_time: 1.6m tok/s: 8279796 +986/20000 train_loss: 2.6871 train_time: 1.6m tok/s: 8279918 +987/20000 train_loss: 2.6790 train_time: 1.6m tok/s: 8280006 +988/20000 train_loss: 2.5425 train_time: 1.6m tok/s: 8280050 +989/20000 train_loss: 2.6675 train_time: 1.6m tok/s: 8280001 +990/20000 train_loss: 
2.6587 train_time: 1.6m tok/s: 8279961 +991/20000 train_loss: 2.8125 train_time: 1.6m tok/s: 8279857 +992/20000 train_loss: 2.6258 train_time: 1.6m tok/s: 8279823 +993/20000 train_loss: 2.5734 train_time: 1.6m tok/s: 8279793 +994/20000 train_loss: 2.7181 train_time: 1.6m tok/s: 8279795 +995/20000 train_loss: 2.8781 train_time: 1.6m tok/s: 8279757 +996/20000 train_loss: 2.8183 train_time: 1.6m tok/s: 8279797 +997/20000 train_loss: 2.7770 train_time: 1.6m tok/s: 8279897 +998/20000 train_loss: 2.6601 train_time: 1.6m tok/s: 8279980 +999/20000 train_loss: 2.7453 train_time: 1.6m tok/s: 8279991 +1000/20000 train_loss: 2.7853 train_time: 1.6m tok/s: 8279967 +1001/20000 train_loss: 2.6722 train_time: 1.6m tok/s: 8279931 +1002/20000 train_loss: 2.7258 train_time: 1.6m tok/s: 8279836 +1003/20000 train_loss: 2.6500 train_time: 1.6m tok/s: 8279714 +1004/20000 train_loss: 2.6541 train_time: 1.6m tok/s: 8279748 +1005/20000 train_loss: 2.6383 train_time: 1.6m tok/s: 8279821 +1006/20000 train_loss: 2.7077 train_time: 1.6m tok/s: 8279810 +1007/20000 train_loss: 2.5859 train_time: 1.6m tok/s: 8279767 +1008/20000 train_loss: 2.5331 train_time: 1.6m tok/s: 8279748 +1009/20000 train_loss: 2.6822 train_time: 1.6m tok/s: 8279745 +1010/20000 train_loss: 2.8213 train_time: 1.6m tok/s: 8279838 +1011/20000 train_loss: 2.8061 train_time: 1.6m tok/s: 8279865 +1012/20000 train_loss: 2.4137 train_time: 1.6m tok/s: 8279668 +1013/20000 train_loss: 2.6116 train_time: 1.6m tok/s: 8279481 +1014/20000 train_loss: 2.7258 train_time: 1.6m tok/s: 8279534 +1015/20000 train_loss: 2.7578 train_time: 1.6m tok/s: 8279501 +1016/20000 train_loss: 2.5612 train_time: 1.6m tok/s: 8279408 +1017/20000 train_loss: 2.7126 train_time: 1.6m tok/s: 8279482 +1018/20000 train_loss: 2.7840 train_time: 1.6m tok/s: 8279400 +1019/20000 train_loss: 2.6648 train_time: 1.6m tok/s: 8279296 +1020/20000 train_loss: 2.6915 train_time: 1.6m tok/s: 8279358 +1021/20000 train_loss: 2.6648 train_time: 1.6m tok/s: 8279280 +1022/20000 
train_loss: 2.7607 train_time: 1.6m tok/s: 8279234 +1023/20000 train_loss: 2.6658 train_time: 1.6m tok/s: 8279242 +1024/20000 train_loss: 2.6913 train_time: 1.6m tok/s: 8279251 +1025/20000 train_loss: 2.7551 train_time: 1.6m tok/s: 8279340 +1026/20000 train_loss: 3.3119 train_time: 1.6m tok/s: 8279226 +1027/20000 train_loss: 2.5488 train_time: 1.6m tok/s: 8279094 +1028/20000 train_loss: 2.6099 train_time: 1.6m tok/s: 8279099 +1029/20000 train_loss: 2.6992 train_time: 1.6m tok/s: 8279099 +1030/20000 train_loss: 2.5460 train_time: 1.6m tok/s: 8279086 +1031/20000 train_loss: 2.6129 train_time: 1.6m tok/s: 8278969 +1032/20000 train_loss: 2.7259 train_time: 1.6m tok/s: 8279021 +1033/20000 train_loss: 2.8420 train_time: 1.6m tok/s: 8279019 +1034/20000 train_loss: 2.6285 train_time: 1.6m tok/s: 8278967 +1035/20000 train_loss: 2.7273 train_time: 1.6m tok/s: 8279084 +1036/20000 train_loss: 2.6872 train_time: 1.6m tok/s: 8279121 +1037/20000 train_loss: 2.6931 train_time: 1.6m tok/s: 8279137 +1038/20000 train_loss: 2.4912 train_time: 1.6m tok/s: 8279123 +1039/20000 train_loss: 2.6978 train_time: 1.6m tok/s: 8279143 +1040/20000 train_loss: 2.6469 train_time: 1.6m tok/s: 8279184 +1041/20000 train_loss: 2.6604 train_time: 1.6m tok/s: 8279216 +1042/20000 train_loss: 2.6622 train_time: 1.6m tok/s: 8279190 +1043/20000 train_loss: 2.7179 train_time: 1.7m tok/s: 8279152 +1044/20000 train_loss: 2.6576 train_time: 1.7m tok/s: 8279057 +1045/20000 train_loss: 2.7913 train_time: 1.7m tok/s: 8278999 +1046/20000 train_loss: 2.7346 train_time: 1.7m tok/s: 8279042 +1047/20000 train_loss: 2.6988 train_time: 1.7m tok/s: 8279138 +1048/20000 train_loss: 2.5977 train_time: 1.7m tok/s: 8279160 +1049/20000 train_loss: 2.7256 train_time: 1.7m tok/s: 8279213 +1050/20000 train_loss: 2.8179 train_time: 1.7m tok/s: 8279276 +1051/20000 train_loss: 2.7274 train_time: 1.7m tok/s: 8279362 +1052/20000 train_loss: 2.6186 train_time: 1.7m tok/s: 8279217 +1053/20000 train_loss: 2.6456 train_time: 1.7m tok/s: 
8279136 +1054/20000 train_loss: 2.6220 train_time: 1.7m tok/s: 8279033 +1055/20000 train_loss: 2.5833 train_time: 1.7m tok/s: 8278952 +1056/20000 train_loss: 2.6547 train_time: 1.7m tok/s: 8278942 +1057/20000 train_loss: 2.7190 train_time: 1.7m tok/s: 8278788 +1058/20000 train_loss: 2.6031 train_time: 1.7m tok/s: 8278809 +1059/20000 train_loss: 2.7477 train_time: 1.7m tok/s: 8278820 +1060/20000 train_loss: 2.6680 train_time: 1.7m tok/s: 8278773 +1061/20000 train_loss: 2.7482 train_time: 1.7m tok/s: 8278808 +1062/20000 train_loss: 2.7664 train_time: 1.7m tok/s: 8278837 +1063/20000 train_loss: 2.7709 train_time: 1.7m tok/s: 8278873 +1064/20000 train_loss: 2.6530 train_time: 1.7m tok/s: 8278873 +1065/20000 train_loss: 2.4633 train_time: 1.7m tok/s: 8278784 +1066/20000 train_loss: 2.7954 train_time: 1.7m tok/s: 8278688 +1067/20000 train_loss: 2.7983 train_time: 1.7m tok/s: 8278645 +1068/20000 train_loss: 2.6976 train_time: 1.7m tok/s: 8278743 +1069/20000 train_loss: 2.6422 train_time: 1.7m tok/s: 8278753 +1070/20000 train_loss: 2.5741 train_time: 1.7m tok/s: 8278784 +1071/20000 train_loss: 2.7349 train_time: 1.7m tok/s: 8278884 +1072/20000 train_loss: 2.6670 train_time: 1.7m tok/s: 8278986 +1073/20000 train_loss: 2.6701 train_time: 1.7m tok/s: 8278944 +1074/20000 train_loss: 2.6884 train_time: 1.7m tok/s: 8278893 +1075/20000 train_loss: 2.7167 train_time: 1.7m tok/s: 8278901 +1076/20000 train_loss: 2.7160 train_time: 1.7m tok/s: 8278903 +1077/20000 train_loss: 2.6624 train_time: 1.7m tok/s: 8278831 +1078/20000 train_loss: 2.8111 train_time: 1.7m tok/s: 8278782 +1079/20000 train_loss: 2.6699 train_time: 1.7m tok/s: 8278663 +1080/20000 train_loss: 2.6425 train_time: 1.7m tok/s: 8278846 +1081/20000 train_loss: 2.6694 train_time: 1.7m tok/s: 8278880 +1082/20000 train_loss: 2.6673 train_time: 1.7m tok/s: 8278919 +1083/20000 train_loss: 2.6015 train_time: 1.7m tok/s: 8278881 +1084/20000 train_loss: 2.6820 train_time: 1.7m tok/s: 8278855 +1085/20000 train_loss: 2.6500 
train_time: 1.7m tok/s: 8278938 +1086/20000 train_loss: 2.6848 train_time: 1.7m tok/s: 8279039 +1087/20000 train_loss: 2.6638 train_time: 1.7m tok/s: 8279087 +1088/20000 train_loss: 2.8426 train_time: 1.7m tok/s: 8279003 +1089/20000 train_loss: 2.8089 train_time: 1.7m tok/s: 8279015 +1090/20000 train_loss: 2.6286 train_time: 1.7m tok/s: 8279013 +1091/20000 train_loss: 2.6719 train_time: 1.7m tok/s: 8278943 +1092/20000 train_loss: 2.7021 train_time: 1.7m tok/s: 8279042 +1093/20000 train_loss: 2.7481 train_time: 1.7m tok/s: 8279041 +1094/20000 train_loss: 2.8054 train_time: 1.7m tok/s: 8279027 +1095/20000 train_loss: 2.6208 train_time: 1.7m tok/s: 8278902 +1096/20000 train_loss: 2.5262 train_time: 1.7m tok/s: 8278790 +1097/20000 train_loss: 2.6565 train_time: 1.7m tok/s: 8278718 +1098/20000 train_loss: 2.6694 train_time: 1.7m tok/s: 8278637 +1099/20000 train_loss: 2.5129 train_time: 1.7m tok/s: 8278663 +1100/20000 train_loss: 2.5793 train_time: 1.7m tok/s: 8278714 +1101/20000 train_loss: 2.6518 train_time: 1.7m tok/s: 8278723 +1102/20000 train_loss: 2.6753 train_time: 1.7m tok/s: 8278846 +1103/20000 train_loss: 2.7381 train_time: 1.7m tok/s: 8278806 +1104/20000 train_loss: 2.7008 train_time: 1.7m tok/s: 8278928 +1105/20000 train_loss: 2.7198 train_time: 1.7m tok/s: 8278854 +1106/20000 train_loss: 2.7280 train_time: 1.8m tok/s: 8278922 +1107/20000 train_loss: 2.7368 train_time: 1.8m tok/s: 8278853 +1108/20000 train_loss: 2.6777 train_time: 1.8m tok/s: 8278881 +1109/20000 train_loss: 2.6686 train_time: 1.8m tok/s: 8278903 +1110/20000 train_loss: 2.6621 train_time: 1.8m tok/s: 8279028 +1111/20000 train_loss: 2.6370 train_time: 1.8m tok/s: 8279044 +1112/20000 train_loss: 2.6289 train_time: 1.8m tok/s: 8279018 +1113/20000 train_loss: 2.6479 train_time: 1.8m tok/s: 8278962 +1114/20000 train_loss: 2.8099 train_time: 1.8m tok/s: 8278953 +1115/20000 train_loss: 2.6632 train_time: 1.8m tok/s: 8278894 +1116/20000 train_loss: 2.8631 train_time: 1.8m tok/s: 8278956 +1117/20000 
train_loss: 2.6880 train_time: 1.8m tok/s: 8278886 +1118/20000 train_loss: 2.7201 train_time: 1.8m tok/s: 8278864 +1119/20000 train_loss: 2.7437 train_time: 1.8m tok/s: 8278765 +1120/20000 train_loss: 2.6335 train_time: 1.8m tok/s: 8278817 +1121/20000 train_loss: 2.6396 train_time: 1.8m tok/s: 8278882 +1122/20000 train_loss: 2.7385 train_time: 1.8m tok/s: 8278955 +1123/20000 train_loss: 2.5358 train_time: 1.8m tok/s: 8279055 +1124/20000 train_loss: 2.6843 train_time: 1.8m tok/s: 8278988 +1125/20000 train_loss: 2.5653 train_time: 1.8m tok/s: 8278998 +1126/20000 train_loss: 2.6388 train_time: 1.8m tok/s: 8279067 +1127/20000 train_loss: 2.8765 train_time: 1.8m tok/s: 8278970 +1128/20000 train_loss: 2.8469 train_time: 1.8m tok/s: 8279053 +1129/20000 train_loss: 2.5908 train_time: 1.8m tok/s: 8279006 +1130/20000 train_loss: 2.7622 train_time: 1.8m tok/s: 8279032 +1131/20000 train_loss: 2.7473 train_time: 1.8m tok/s: 8279124 +1132/20000 train_loss: 2.6178 train_time: 1.8m tok/s: 8279122 +1133/20000 train_loss: 2.5774 train_time: 1.8m tok/s: 8279166 +1134/20000 train_loss: 2.7306 train_time: 1.8m tok/s: 8279159 +1135/20000 train_loss: 2.7251 train_time: 1.8m tok/s: 8279181 +1136/20000 train_loss: 2.5640 train_time: 1.8m tok/s: 8279144 +1137/20000 train_loss: 2.6110 train_time: 1.8m tok/s: 8279141 +1138/20000 train_loss: 2.5617 train_time: 1.8m tok/s: 8279177 +1139/20000 train_loss: 2.5523 train_time: 1.8m tok/s: 8279171 +1140/20000 train_loss: 2.6631 train_time: 1.8m tok/s: 8279150 +1141/20000 train_loss: 2.6926 train_time: 1.8m tok/s: 8279154 +1142/20000 train_loss: 2.6953 train_time: 1.8m tok/s: 8279150 +1143/20000 train_loss: 2.7277 train_time: 1.8m tok/s: 8279221 +1144/20000 train_loss: 2.7776 train_time: 1.8m tok/s: 8279180 +1145/20000 train_loss: 2.7321 train_time: 1.8m tok/s: 8279212 +1146/20000 train_loss: 2.5946 train_time: 1.8m tok/s: 8279280 +1147/20000 train_loss: 2.7485 train_time: 1.8m tok/s: 8279379 +1148/20000 train_loss: 2.5579 train_time: 1.8m tok/s: 
8279281 +1149/20000 train_loss: 2.7193 train_time: 1.8m tok/s: 8279316 +1150/20000 train_loss: 2.5909 train_time: 1.8m tok/s: 8279365 +1151/20000 train_loss: 2.5916 train_time: 1.8m tok/s: 8279415 +1152/20000 train_loss: 2.4652 train_time: 1.8m tok/s: 8279381 +1153/20000 train_loss: 2.6017 train_time: 1.8m tok/s: 8279368 +1154/20000 train_loss: 2.7352 train_time: 1.8m tok/s: 8279460 +1155/20000 train_loss: 2.5845 train_time: 1.8m tok/s: 8279490 +1156/20000 train_loss: 2.6942 train_time: 1.8m tok/s: 8279534 +1157/20000 train_loss: 2.6830 train_time: 1.8m tok/s: 8279528 +1158/20000 train_loss: 2.7774 train_time: 1.8m tok/s: 8279554 +1159/20000 train_loss: 2.7169 train_time: 1.8m tok/s: 8279526 +1160/20000 train_loss: 2.6824 train_time: 1.8m tok/s: 8279564 +1161/20000 train_loss: 2.6546 train_time: 1.8m tok/s: 8279616 +1162/20000 train_loss: 2.7074 train_time: 1.8m tok/s: 8279652 +1163/20000 train_loss: 2.6915 train_time: 1.8m tok/s: 8279577 +1164/20000 train_loss: 2.6593 train_time: 1.8m tok/s: 8279534 +1165/20000 train_loss: 2.5271 train_time: 1.8m tok/s: 8279478 +1166/20000 train_loss: 2.7283 train_time: 1.8m tok/s: 8279497 +1167/20000 train_loss: 2.7679 train_time: 1.8m tok/s: 8279414 +1168/20000 train_loss: 2.5630 train_time: 1.8m tok/s: 8279311 +1169/20000 train_loss: 2.6995 train_time: 1.9m tok/s: 8279275 +1170/20000 train_loss: 2.9058 train_time: 1.9m tok/s: 8279330 +1171/20000 train_loss: 2.6551 train_time: 1.9m tok/s: 8279346 +1172/20000 train_loss: 2.7053 train_time: 1.9m tok/s: 8279392 +1173/20000 train_loss: 2.6402 train_time: 1.9m tok/s: 8279426 +1174/20000 train_loss: 2.7057 train_time: 1.9m tok/s: 8279465 +1175/20000 train_loss: 2.6507 train_time: 1.9m tok/s: 8279352 +1176/20000 train_loss: 2.7736 train_time: 1.9m tok/s: 8279370 +1177/20000 train_loss: 2.7741 train_time: 1.9m tok/s: 8279304 +1178/20000 train_loss: 2.6621 train_time: 1.9m tok/s: 8279305 +1179/20000 train_loss: 2.5420 train_time: 1.9m tok/s: 8279183 +1180/20000 train_loss: 2.6020 
train_time: 1.9m tok/s: 8279140 +1181/20000 train_loss: 2.6559 train_time: 1.9m tok/s: 8279227 +1182/20000 train_loss: 2.5774 train_time: 1.9m tok/s: 8279306 +1183/20000 train_loss: 2.7935 train_time: 1.9m tok/s: 8279359 +1184/20000 train_loss: 2.4907 train_time: 1.9m tok/s: 8279244 +1185/20000 train_loss: 2.6685 train_time: 1.9m tok/s: 8279222 +1186/20000 train_loss: 2.6643 train_time: 1.9m tok/s: 8279227 +1187/20000 train_loss: 2.7631 train_time: 1.9m tok/s: 8279116 +1188/20000 train_loss: 2.8780 train_time: 1.9m tok/s: 8279227 +1189/20000 train_loss: 2.6376 train_time: 1.9m tok/s: 8279254 +1190/20000 train_loss: 2.6996 train_time: 1.9m tok/s: 8279295 +1191/20000 train_loss: 2.6398 train_time: 1.9m tok/s: 8279256 +1192/20000 train_loss: 2.6729 train_time: 1.9m tok/s: 8279168 +1193/20000 train_loss: 2.6932 train_time: 1.9m tok/s: 8279108 +1194/20000 train_loss: 2.6992 train_time: 1.9m tok/s: 8279100 +1195/20000 train_loss: 2.5951 train_time: 1.9m tok/s: 8279129 +1196/20000 train_loss: 2.8441 train_time: 1.9m tok/s: 8279071 +1197/20000 train_loss: 2.5704 train_time: 1.9m tok/s: 8279040 +1198/20000 train_loss: 2.6965 train_time: 1.9m tok/s: 8279063 +1199/20000 train_loss: 2.7390 train_time: 1.9m tok/s: 8279031 +1200/20000 train_loss: 2.7240 train_time: 1.9m tok/s: 8279096 +1201/20000 train_loss: 2.7277 train_time: 1.9m tok/s: 8279091 +1202/20000 train_loss: 2.8253 train_time: 1.9m tok/s: 8279144 +1203/20000 train_loss: 2.6706 train_time: 1.9m tok/s: 8279210 +1204/20000 train_loss: 2.7080 train_time: 1.9m tok/s: 8279155 +1205/20000 train_loss: 2.7416 train_time: 1.9m tok/s: 8279208 +1206/20000 train_loss: 2.7611 train_time: 1.9m tok/s: 8279222 +1207/20000 train_loss: 2.5705 train_time: 1.9m tok/s: 8279227 +1208/20000 train_loss: 2.5663 train_time: 1.9m tok/s: 8279240 +1209/20000 train_loss: 2.6914 train_time: 1.9m tok/s: 8279194 +1210/20000 train_loss: 2.6233 train_time: 1.9m tok/s: 8279253 +1211/20000 train_loss: 2.5586 train_time: 1.9m tok/s: 8279236 +1212/20000 
train_loss: 2.5428 train_time: 1.9m tok/s: 8279100 +1213/20000 train_loss: 2.8078 train_time: 1.9m tok/s: 8278980 +1214/20000 train_loss: 2.6268 train_time: 1.9m tok/s: 8279061 +1215/20000 train_loss: 2.7162 train_time: 1.9m tok/s: 8279061 +1216/20000 train_loss: 2.6797 train_time: 1.9m tok/s: 8279094 +1217/20000 train_loss: 2.7595 train_time: 1.9m tok/s: 8279053 +1218/20000 train_loss: 2.7006 train_time: 1.9m tok/s: 8279059 +1219/20000 train_loss: 3.3076 train_time: 1.9m tok/s: 8279056 +1220/20000 train_loss: 2.6081 train_time: 1.9m tok/s: 8278988 +1221/20000 train_loss: 2.7635 train_time: 1.9m tok/s: 8278964 +1222/20000 train_loss: 2.5949 train_time: 1.9m tok/s: 8278965 +1223/20000 train_loss: 2.6995 train_time: 1.9m tok/s: 8278905 +1224/20000 train_loss: 2.7146 train_time: 1.9m tok/s: 8278863 +1225/20000 train_loss: 2.5356 train_time: 1.9m tok/s: 8278865 +1226/20000 train_loss: 2.6509 train_time: 1.9m tok/s: 8278830 +1227/20000 train_loss: 2.8649 train_time: 1.9m tok/s: 8278810 +1228/20000 train_loss: 2.6554 train_time: 1.9m tok/s: 8278845 +1229/20000 train_loss: 2.6598 train_time: 1.9m tok/s: 8278770 +1230/20000 train_loss: 2.7568 train_time: 1.9m tok/s: 8278778 +1231/20000 train_loss: 2.7038 train_time: 1.9m tok/s: 8278773 +1232/20000 train_loss: 2.6825 train_time: 2.0m tok/s: 8278791 +1233/20000 train_loss: 2.6587 train_time: 2.0m tok/s: 8278763 +1234/20000 train_loss: 2.6576 train_time: 2.0m tok/s: 8278770 +1235/20000 train_loss: 2.5983 train_time: 2.0m tok/s: 8278731 +1236/20000 train_loss: 2.6429 train_time: 2.0m tok/s: 8278680 +1237/20000 train_loss: 2.6194 train_time: 2.0m tok/s: 8278653 +1238/20000 train_loss: 2.5850 train_time: 2.0m tok/s: 8278640 +1239/20000 train_loss: 2.5846 train_time: 2.0m tok/s: 8278621 +1240/20000 train_loss: 2.5418 train_time: 2.0m tok/s: 8278676 +1241/20000 train_loss: 2.5907 train_time: 2.0m tok/s: 8278693 +1242/20000 train_loss: 2.5835 train_time: 2.0m tok/s: 8278663 +1243/20000 train_loss: 2.6776 train_time: 2.0m tok/s: 
8278439 +1244/20000 train_loss: 2.7772 train_time: 2.0m tok/s: 8278411 +1245/20000 train_loss: 2.6744 train_time: 2.0m tok/s: 8278330 +1246/20000 train_loss: 2.7854 train_time: 2.0m tok/s: 8278160 +1247/20000 train_loss: 2.7849 train_time: 2.0m tok/s: 8278131 +1248/20000 train_loss: 2.6548 train_time: 2.0m tok/s: 8278168 +1249/20000 train_loss: 2.6416 train_time: 2.0m tok/s: 8278189 +1250/20000 train_loss: 2.6446 train_time: 2.0m tok/s: 8278225 +1251/20000 train_loss: 2.5979 train_time: 2.0m tok/s: 8278286 +1252/20000 train_loss: 2.6745 train_time: 2.0m tok/s: 8278331 +1253/20000 train_loss: 2.6213 train_time: 2.0m tok/s: 8278340 +1254/20000 train_loss: 2.6855 train_time: 2.0m tok/s: 8278334 +1255/20000 train_loss: 2.4491 train_time: 2.0m tok/s: 8278359 +1256/20000 train_loss: 2.6764 train_time: 2.0m tok/s: 8278329 +1257/20000 train_loss: 2.6007 train_time: 2.0m tok/s: 8278317 +1258/20000 train_loss: 2.6350 train_time: 2.0m tok/s: 8278219 +1259/20000 train_loss: 2.7631 train_time: 2.0m tok/s: 8278160 +1260/20000 train_loss: 2.7061 train_time: 2.0m tok/s: 8278198 +1261/20000 train_loss: 2.7850 train_time: 2.0m tok/s: 8278247 +1262/20000 train_loss: 2.6917 train_time: 2.0m tok/s: 8278361 +1263/20000 train_loss: 2.7057 train_time: 2.0m tok/s: 8278388 +1264/20000 train_loss: 2.6355 train_time: 2.0m tok/s: 8278325 +1265/20000 train_loss: 2.6337 train_time: 2.0m tok/s: 8278356 +1266/20000 train_loss: 2.6473 train_time: 2.0m tok/s: 8278250 +1267/20000 train_loss: 2.6858 train_time: 2.0m tok/s: 8278175 +1268/20000 train_loss: 2.4835 train_time: 2.0m tok/s: 8278184 +1269/20000 train_loss: 2.6800 train_time: 2.0m tok/s: 8278192 +1270/20000 train_loss: 2.6484 train_time: 2.0m tok/s: 8278298 +1271/20000 train_loss: 2.5622 train_time: 2.0m tok/s: 8278305 +1272/20000 train_loss: 2.8069 train_time: 2.0m tok/s: 8278407 +1273/20000 train_loss: 2.7524 train_time: 2.0m tok/s: 8278456 +1274/20000 train_loss: 2.6746 train_time: 2.0m tok/s: 8278487 +1275/20000 train_loss: 2.7786 
train_time: 2.0m tok/s: 8278549 +1276/20000 train_loss: 2.7144 train_time: 2.0m tok/s: 8278603 +1277/20000 train_loss: 2.7025 train_time: 2.0m tok/s: 8278639 +1278/20000 train_loss: 2.6150 train_time: 2.0m tok/s: 8278573 +1279/20000 train_loss: 2.7210 train_time: 2.0m tok/s: 8278567 +1280/20000 train_loss: 2.6461 train_time: 2.0m tok/s: 8278545 +1281/20000 train_loss: 2.8164 train_time: 2.0m tok/s: 8278549 +1282/20000 train_loss: 2.5345 train_time: 2.0m tok/s: 8278507 +1283/20000 train_loss: 2.6347 train_time: 2.0m tok/s: 8278440 +1284/20000 train_loss: 2.6158 train_time: 2.0m tok/s: 8278569 +1285/20000 train_loss: 2.7772 train_time: 2.0m tok/s: 8278567 +1286/20000 train_loss: 2.6680 train_time: 2.0m tok/s: 8278576 +1287/20000 train_loss: 2.7082 train_time: 2.0m tok/s: 8278522 +1288/20000 train_loss: 2.7228 train_time: 2.0m tok/s: 8278536 +1289/20000 train_loss: 2.7512 train_time: 2.0m tok/s: 8278467 +1290/20000 train_loss: 2.6335 train_time: 2.0m tok/s: 8278463 +1291/20000 train_loss: 2.7477 train_time: 2.0m tok/s: 8278526 +1292/20000 train_loss: 2.7289 train_time: 2.0m tok/s: 8278549 +1293/20000 train_loss: 2.7161 train_time: 2.0m tok/s: 8278524 +1294/20000 train_loss: 2.7242 train_time: 2.0m tok/s: 8278475 +1295/20000 train_loss: 2.7484 train_time: 2.1m tok/s: 8278428 +1296/20000 train_loss: 2.6841 train_time: 2.1m tok/s: 8278452 +1297/20000 train_loss: 2.5892 train_time: 2.1m tok/s: 8278490 +1298/20000 train_loss: 2.6611 train_time: 2.1m tok/s: 8278546 +1299/20000 train_loss: 2.5160 train_time: 2.1m tok/s: 8278550 +1300/20000 train_loss: 2.6376 train_time: 2.1m tok/s: 8278578 +1301/20000 train_loss: 2.6901 train_time: 2.1m tok/s: 8278674 +1302/20000 train_loss: 2.6736 train_time: 2.1m tok/s: 8278715 +1303/20000 train_loss: 2.8890 train_time: 2.1m tok/s: 8278682 +1304/20000 train_loss: 2.7399 train_time: 2.1m tok/s: 8278645 +1305/20000 train_loss: 2.7723 train_time: 2.1m tok/s: 8278643 +1306/20000 train_loss: 2.8625 train_time: 2.1m tok/s: 8278602 +1307/20000 
train_loss: 2.6282 train_time: 2.1m tok/s: 8278640 +1308/20000 train_loss: 2.6326 train_time: 2.1m tok/s: 8278685 +1309/20000 train_loss: 2.6663 train_time: 2.1m tok/s: 8278648 +1310/20000 train_loss: 2.5643 train_time: 2.1m tok/s: 8278569 +1311/20000 train_loss: 2.6070 train_time: 2.1m tok/s: 8278502 +1312/20000 train_loss: 2.5468 train_time: 2.1m tok/s: 8278508 +1313/20000 train_loss: 2.5769 train_time: 2.1m tok/s: 8278479 +1314/20000 train_loss: 2.5587 train_time: 2.1m tok/s: 8278455 +1315/20000 train_loss: 2.4184 train_time: 2.1m tok/s: 8278394 +1316/20000 train_loss: 2.6801 train_time: 2.1m tok/s: 8278302 +1317/20000 train_loss: 2.6906 train_time: 2.1m tok/s: 8278383 +1318/20000 train_loss: 2.7124 train_time: 2.1m tok/s: 8278407 +1319/20000 train_loss: 2.8092 train_time: 2.1m tok/s: 8278369 +1320/20000 train_loss: 2.7263 train_time: 2.1m tok/s: 8278493 +1321/20000 train_loss: 2.7179 train_time: 2.1m tok/s: 8278522 +1322/20000 train_loss: 2.7385 train_time: 2.1m tok/s: 8278513 +1323/20000 train_loss: 2.5835 train_time: 2.1m tok/s: 8278520 +1324/20000 train_loss: 2.6464 train_time: 2.1m tok/s: 8278518 +1325/20000 train_loss: 2.8280 train_time: 2.1m tok/s: 8278481 +1326/20000 train_loss: 2.8361 train_time: 2.1m tok/s: 8278393 +1327/20000 train_loss: 2.6633 train_time: 2.1m tok/s: 8278363 +1328/20000 train_loss: 2.6558 train_time: 2.1m tok/s: 8278375 +1329/20000 train_loss: 2.7462 train_time: 2.1m tok/s: 8278391 +1330/20000 train_loss: 2.6113 train_time: 2.1m tok/s: 8278371 +1331/20000 train_loss: 2.6769 train_time: 2.1m tok/s: 8278335 +1332/20000 train_loss: 2.9155 train_time: 2.1m tok/s: 8278381 +1333/20000 train_loss: 2.8375 train_time: 2.1m tok/s: 8278418 +1334/20000 train_loss: 2.6681 train_time: 2.1m tok/s: 8278364 +1335/20000 train_loss: 2.6269 train_time: 2.1m tok/s: 8278396 +1336/20000 train_loss: 2.6528 train_time: 2.1m tok/s: 8278396 +1337/20000 train_loss: 2.8073 train_time: 2.1m tok/s: 8278397 +1338/20000 train_loss: 2.8969 train_time: 2.1m tok/s: 
8278311 +1339/20000 train_loss: 2.6929 train_time: 2.1m tok/s: 8278284 +1340/20000 train_loss: 2.5343 train_time: 2.1m tok/s: 8278327 +1341/20000 train_loss: 2.5572 train_time: 2.1m tok/s: 8278321 +1342/20000 train_loss: 2.6439 train_time: 2.1m tok/s: 8278322 +1343/20000 train_loss: 2.6656 train_time: 2.1m tok/s: 8278279 +1344/20000 train_loss: 2.6880 train_time: 2.1m tok/s: 8278369 +1345/20000 train_loss: 2.6417 train_time: 2.1m tok/s: 8278399 +1346/20000 train_loss: 2.7677 train_time: 2.1m tok/s: 8278371 +1347/20000 train_loss: 2.7930 train_time: 2.1m tok/s: 8278361 +1348/20000 train_loss: 2.6952 train_time: 2.1m tok/s: 8278353 +1349/20000 train_loss: 2.7283 train_time: 2.1m tok/s: 8278355 +1350/20000 train_loss: 2.6817 train_time: 2.1m tok/s: 8278376 +1351/20000 train_loss: 2.7774 train_time: 2.1m tok/s: 8278383 +1352/20000 train_loss: 2.6715 train_time: 2.1m tok/s: 8278408 +1353/20000 train_loss: 2.7049 train_time: 2.1m tok/s: 8278427 +1354/20000 train_loss: 2.3890 train_time: 2.1m tok/s: 8278344 +1355/20000 train_loss: 2.5815 train_time: 2.1m tok/s: 8278225 +1356/20000 train_loss: 2.6813 train_time: 2.1m tok/s: 8278295 +1357/20000 train_loss: 2.6773 train_time: 2.1m tok/s: 8278284 +1358/20000 train_loss: 2.7436 train_time: 2.2m tok/s: 8278298 +1359/20000 train_loss: 2.5098 train_time: 2.2m tok/s: 8278269 +1360/20000 train_loss: 2.7391 train_time: 2.2m tok/s: 8278197 +1361/20000 train_loss: 2.6340 train_time: 2.2m tok/s: 8278189 +1362/20000 train_loss: 2.6207 train_time: 2.2m tok/s: 8278225 +1363/20000 train_loss: 2.7032 train_time: 2.2m tok/s: 8278299 +1364/20000 train_loss: 2.5360 train_time: 2.2m tok/s: 8278279 +1365/20000 train_loss: 2.5357 train_time: 2.2m tok/s: 8278231 +1366/20000 train_loss: 2.6214 train_time: 2.2m tok/s: 8278224 +1367/20000 train_loss: 2.7082 train_time: 2.2m tok/s: 8278152 +1368/20000 train_loss: 2.5695 train_time: 2.2m tok/s: 8278301 +1369/20000 train_loss: 2.6791 train_time: 2.2m tok/s: 8278303 +1370/20000 train_loss: 2.7234 
train_time: 2.2m tok/s: 8278297 +1371/20000 train_loss: 2.6972 train_time: 2.2m tok/s: 8278235 +1372/20000 train_loss: 2.7330 train_time: 2.2m tok/s: 8278211 +1373/20000 train_loss: 2.6682 train_time: 2.2m tok/s: 8278288 +1374/20000 train_loss: 2.7686 train_time: 2.2m tok/s: 8278326 +1375/20000 train_loss: 2.7268 train_time: 2.2m tok/s: 8278392 +1376/20000 train_loss: 2.5859 train_time: 2.2m tok/s: 8278380 +1377/20000 train_loss: 2.6411 train_time: 2.2m tok/s: 8278386 +1378/20000 train_loss: 2.5868 train_time: 2.2m tok/s: 8278410 +1379/20000 train_loss: 2.6333 train_time: 2.2m tok/s: 8278391 +1380/20000 train_loss: 2.5966 train_time: 2.2m tok/s: 8278451 +1381/20000 train_loss: 2.6421 train_time: 2.2m tok/s: 8278476 +1382/20000 train_loss: 2.6561 train_time: 2.2m tok/s: 8278416 +1383/20000 train_loss: 2.6827 train_time: 2.2m tok/s: 8278476 +1384/20000 train_loss: 2.5797 train_time: 2.2m tok/s: 8278448 +1385/20000 train_loss: 2.6037 train_time: 2.2m tok/s: 8278461 +1386/20000 train_loss: 2.7617 train_time: 2.2m tok/s: 8278474 +1387/20000 train_loss: 2.6786 train_time: 2.2m tok/s: 8278523 +1388/20000 train_loss: 2.7556 train_time: 2.2m tok/s: 8278619 +1389/20000 train_loss: 2.6107 train_time: 2.2m tok/s: 8278581 +1390/20000 train_loss: 2.7349 train_time: 2.2m tok/s: 8278589 +1391/20000 train_loss: 2.5476 train_time: 2.2m tok/s: 8278556 +1392/20000 train_loss: 2.7380 train_time: 2.2m tok/s: 8278643 +1393/20000 train_loss: 2.6354 train_time: 2.2m tok/s: 8278579 +1394/20000 train_loss: 2.9004 train_time: 2.2m tok/s: 8278553 +1395/20000 train_loss: 2.4996 train_time: 2.2m tok/s: 8278531 +1396/20000 train_loss: 2.8234 train_time: 2.2m tok/s: 8278537 +1397/20000 train_loss: 2.7118 train_time: 2.2m tok/s: 8278536 +1398/20000 train_loss: 2.8013 train_time: 2.2m tok/s: 8278561 +1399/20000 train_loss: 2.6430 train_time: 2.2m tok/s: 8278610 +1400/20000 train_loss: 2.7409 train_time: 2.2m tok/s: 8278626 +1401/20000 train_loss: 2.7324 train_time: 2.2m tok/s: 8278621 +1402/20000 
train_loss: 2.5738 train_time: 2.2m tok/s: 8278716 +1403/20000 train_loss: 2.6017 train_time: 2.2m tok/s: 8278655 +1404/20000 train_loss: 2.6828 train_time: 2.2m tok/s: 8278724 +1405/20000 train_loss: 2.7219 train_time: 2.2m tok/s: 8278759 +1406/20000 train_loss: 2.8393 train_time: 2.2m tok/s: 8278625 +1407/20000 train_loss: 2.5728 train_time: 2.2m tok/s: 8278490 +1408/20000 train_loss: 2.6943 train_time: 2.2m tok/s: 8278445 +1409/20000 train_loss: 2.7744 train_time: 2.2m tok/s: 8278483 +1410/20000 train_loss: 2.6550 train_time: 2.2m tok/s: 8278513 +1411/20000 train_loss: 2.6942 train_time: 2.2m tok/s: 8278528 +1412/20000 train_loss: 2.7341 train_time: 2.2m tok/s: 8278544 +1413/20000 train_loss: 2.6133 train_time: 2.2m tok/s: 8278530 +1414/20000 train_loss: 2.6061 train_time: 2.2m tok/s: 8278588 +1415/20000 train_loss: 2.6664 train_time: 2.2m tok/s: 8278550 +1416/20000 train_loss: 2.6226 train_time: 2.2m tok/s: 8278662 +1417/20000 train_loss: 2.6273 train_time: 2.2m tok/s: 8278621 +1418/20000 train_loss: 2.7559 train_time: 2.2m tok/s: 8278569 +1419/20000 train_loss: 2.6650 train_time: 2.2m tok/s: 8278543 +1420/20000 train_loss: 2.6296 train_time: 2.2m tok/s: 8278544 +1421/20000 train_loss: 2.7549 train_time: 2.2m tok/s: 8278562 +1422/20000 train_loss: 2.7400 train_time: 2.3m tok/s: 8278577 +1423/20000 train_loss: 2.6998 train_time: 2.3m tok/s: 8278550 +1424/20000 train_loss: 2.7076 train_time: 2.3m tok/s: 8278569 +1425/20000 train_loss: 2.6242 train_time: 2.3m tok/s: 8278575 +1426/20000 train_loss: 2.6929 train_time: 2.3m tok/s: 8278541 +1427/20000 train_loss: 2.6527 train_time: 2.3m tok/s: 8278477 +1428/20000 train_loss: 2.6474 train_time: 2.3m tok/s: 8278494 +1429/20000 train_loss: 2.5898 train_time: 2.3m tok/s: 8278465 +1430/20000 train_loss: 2.6282 train_time: 2.3m tok/s: 8278488 +1431/20000 train_loss: 2.6041 train_time: 2.3m tok/s: 8278530 +1432/20000 train_loss: 2.4342 train_time: 2.3m tok/s: 8278487 +1433/20000 train_loss: 2.6289 train_time: 2.3m tok/s: 
8278445 +1434/20000 train_loss: 2.7444 train_time: 2.3m tok/s: 8278383 +1435/20000 train_loss: 2.7170 train_time: 2.3m tok/s: 8278432 +1436/20000 train_loss: 2.6218 train_time: 2.3m tok/s: 8278524 +1437/20000 train_loss: 2.7029 train_time: 2.3m tok/s: 8278574 +1438/20000 train_loss: 2.7520 train_time: 2.3m tok/s: 8278570 +1439/20000 train_loss: 2.6684 train_time: 2.3m tok/s: 8278416 +1440/20000 train_loss: 2.7078 train_time: 2.3m tok/s: 8278385 +1441/20000 train_loss: 2.6673 train_time: 2.3m tok/s: 8278390 +1442/20000 train_loss: 2.5872 train_time: 2.3m tok/s: 8278325 +1443/20000 train_loss: 2.6007 train_time: 2.3m tok/s: 8278319 +1444/20000 train_loss: 2.5484 train_time: 2.3m tok/s: 8278340 +1445/20000 train_loss: 2.6639 train_time: 2.3m tok/s: 8278307 +1446/20000 train_loss: 2.7705 train_time: 2.3m tok/s: 8278271 +1447/20000 train_loss: 2.7193 train_time: 2.3m tok/s: 8278280 +1448/20000 train_loss: 2.7004 train_time: 2.3m tok/s: 8278304 +1449/20000 train_loss: 2.6473 train_time: 2.3m tok/s: 8278337 +1450/20000 train_loss: 2.7339 train_time: 2.3m tok/s: 8278356 +1451/20000 train_loss: 2.5955 train_time: 2.3m tok/s: 8278378 +1452/20000 train_loss: 2.6216 train_time: 2.3m tok/s: 8278396 +1453/20000 train_loss: 2.6432 train_time: 2.3m tok/s: 8278413 +1454/20000 train_loss: 2.7434 train_time: 2.3m tok/s: 8278318 +1455/20000 train_loss: 2.5698 train_time: 2.3m tok/s: 8278292 +1456/20000 train_loss: 2.4579 train_time: 2.3m tok/s: 8278222 +1457/20000 train_loss: 2.4823 train_time: 2.3m tok/s: 8278221 +1458/20000 train_loss: 2.5968 train_time: 2.3m tok/s: 8278142 +1459/20000 train_loss: 2.6721 train_time: 2.3m tok/s: 8278137 +1460/20000 train_loss: 2.6996 train_time: 2.3m tok/s: 8278192 +1461/20000 train_loss: 2.7707 train_time: 2.3m tok/s: 8278266 +1462/20000 train_loss: 2.6453 train_time: 2.3m tok/s: 8278275 +1463/20000 train_loss: 2.6618 train_time: 2.3m tok/s: 8278246 +1464/20000 train_loss: 2.6461 train_time: 2.3m tok/s: 8278273 +1465/20000 train_loss: 2.6775 
train_time: 2.3m tok/s: 8278237 +1466/20000 train_loss: 2.6134 train_time: 2.3m tok/s: 8278260 +1467/20000 train_loss: 2.6185 train_time: 2.3m tok/s: 8278209 +1468/20000 train_loss: 2.5602 train_time: 2.3m tok/s: 8278160 +1469/20000 train_loss: 2.5829 train_time: 2.3m tok/s: 8278178 +1470/20000 train_loss: 2.5099 train_time: 2.3m tok/s: 8278147 +1471/20000 train_loss: 2.8053 train_time: 2.3m tok/s: 8278153 +1472/20000 train_loss: 2.8741 train_time: 2.3m tok/s: 8278095 +1473/20000 train_loss: 2.7489 train_time: 2.3m tok/s: 8278009 +1474/20000 train_loss: 2.7330 train_time: 2.3m tok/s: 8277951 +1475/20000 train_loss: 2.6929 train_time: 2.3m tok/s: 8277937 +1476/20000 train_loss: 2.7688 train_time: 2.3m tok/s: 8277945 +1477/20000 train_loss: 2.6019 train_time: 2.3m tok/s: 8277892 +1478/20000 train_loss: 2.5996 train_time: 2.3m tok/s: 8277930 +1479/20000 train_loss: 2.5868 train_time: 2.3m tok/s: 8277942 +1480/20000 train_loss: 2.6057 train_time: 2.3m tok/s: 8277941 +1481/20000 train_loss: 2.6358 train_time: 2.3m tok/s: 8277933 +1482/20000 train_loss: 3.0501 train_time: 2.3m tok/s: 8277881 +1483/20000 train_loss: 2.6959 train_time: 2.3m tok/s: 8277866 +1484/20000 train_loss: 2.6309 train_time: 2.3m tok/s: 8277874 +1485/20000 train_loss: 2.7847 train_time: 2.4m tok/s: 8277920 +1486/20000 train_loss: 2.5960 train_time: 2.4m tok/s: 8277898 +1487/20000 train_loss: 2.6967 train_time: 2.4m tok/s: 8277885 +1488/20000 train_loss: 2.6509 train_time: 2.4m tok/s: 8277963 +1489/20000 train_loss: 2.5789 train_time: 2.4m tok/s: 8277981 +1490/20000 train_loss: 2.6674 train_time: 2.4m tok/s: 8277969 +1491/20000 train_loss: 2.6673 train_time: 2.4m tok/s: 8277989 +1492/20000 train_loss: 2.5994 train_time: 2.4m tok/s: 8278008 +1493/20000 train_loss: 2.6688 train_time: 2.4m tok/s: 8278091 +1494/20000 train_loss: 2.6376 train_time: 2.4m tok/s: 8278068 +1495/20000 train_loss: 2.5668 train_time: 2.4m tok/s: 8278015 +1496/20000 train_loss: 2.6738 train_time: 2.4m tok/s: 8278008 +1497/20000 
train_loss: 2.5999 train_time: 2.4m tok/s: 8277957 +1498/20000 train_loss: 2.9267 train_time: 2.4m tok/s: 8277992 +1499/20000 train_loss: 2.6908 train_time: 2.4m tok/s: 8277930 +1500/20000 train_loss: 2.7194 train_time: 2.4m tok/s: 8277979 +1501/20000 train_loss: 2.6925 train_time: 2.4m tok/s: 8278034 +1502/20000 train_loss: 2.8050 train_time: 2.4m tok/s: 8278054 +1503/20000 train_loss: 2.6731 train_time: 2.4m tok/s: 8278016 +1504/20000 train_loss: 2.7324 train_time: 2.4m tok/s: 8278095 +1505/20000 train_loss: 2.6833 train_time: 2.4m tok/s: 8278149 +1506/20000 train_loss: 2.7099 train_time: 2.4m tok/s: 8278168 +1507/20000 train_loss: 2.8273 train_time: 2.4m tok/s: 8278094 +1508/20000 train_loss: 2.5295 train_time: 2.4m tok/s: 8278004 +1509/20000 train_loss: 2.5792 train_time: 2.4m tok/s: 8277928 +1510/20000 train_loss: 2.5441 train_time: 2.4m tok/s: 8277962 +1511/20000 train_loss: 2.4980 train_time: 2.4m tok/s: 8277899 +1512/20000 train_loss: 2.5702 train_time: 2.4m tok/s: 8277860 +1513/20000 train_loss: 2.7198 train_time: 2.4m tok/s: 8277924 +1514/20000 train_loss: 2.7424 train_time: 2.4m tok/s: 8277962 +1515/20000 train_loss: 2.6927 train_time: 2.4m tok/s: 8277997 +1516/20000 train_loss: 2.5840 train_time: 2.4m tok/s: 8277971 +1517/20000 train_loss: 2.5954 train_time: 2.4m tok/s: 8277962 +1518/20000 train_loss: 2.7311 train_time: 2.4m tok/s: 8278005 +1519/20000 train_loss: 2.6596 train_time: 2.4m tok/s: 8277941 +1520/20000 train_loss: 2.6640 train_time: 2.4m tok/s: 8277945 +1521/20000 train_loss: 2.6457 train_time: 2.4m tok/s: 8277944 +1522/20000 train_loss: 2.6486 train_time: 2.4m tok/s: 8278012 +1523/20000 train_loss: 2.6782 train_time: 2.4m tok/s: 8277975 +1524/20000 train_loss: 2.6448 train_time: 2.4m tok/s: 8277965 +1525/20000 train_loss: 2.6277 train_time: 2.4m tok/s: 8277955 +1526/20000 train_loss: 2.7232 train_time: 2.4m tok/s: 8277903 +1527/20000 train_loss: 2.6731 train_time: 2.4m tok/s: 8277868 +1528/20000 train_loss: 2.4658 train_time: 2.4m tok/s: 
8277828 +1529/20000 train_loss: 2.6291 train_time: 2.4m tok/s: 8277909 +1530/20000 train_loss: 2.6129 train_time: 2.4m tok/s: 8277923 +1531/20000 train_loss: 2.3602 train_time: 2.4m tok/s: 8277901 +1532/20000 train_loss: 2.6189 train_time: 2.4m tok/s: 8277889 +1533/20000 train_loss: 2.6754 train_time: 2.4m tok/s: 8277860 +1534/20000 train_loss: 2.6337 train_time: 2.4m tok/s: 8277863 +1535/20000 train_loss: 2.7690 train_time: 2.4m tok/s: 8277868 +1536/20000 train_loss: 2.6928 train_time: 2.4m tok/s: 8277793 +1537/20000 train_loss: 3.0721 train_time: 2.4m tok/s: 8277768 +1538/20000 train_loss: 2.7195 train_time: 2.4m tok/s: 8277710 +1539/20000 train_loss: 2.6282 train_time: 2.4m tok/s: 8277716 +1540/20000 train_loss: 2.6770 train_time: 2.4m tok/s: 8277668 +1541/20000 train_loss: 2.5854 train_time: 2.4m tok/s: 8277606 +1542/20000 train_loss: 2.6152 train_time: 2.4m tok/s: 8277652 +1543/20000 train_loss: 2.6335 train_time: 2.4m tok/s: 8277721 +1544/20000 train_loss: 2.5884 train_time: 2.4m tok/s: 8277693 +1545/20000 train_loss: 2.6175 train_time: 2.4m tok/s: 8277681 +1546/20000 train_loss: 2.5002 train_time: 2.4m tok/s: 8277677 +1547/20000 train_loss: 2.7405 train_time: 2.4m tok/s: 8277669 +1548/20000 train_loss: 2.6861 train_time: 2.5m tok/s: 8277668 +1549/20000 train_loss: 2.5794 train_time: 2.5m tok/s: 8277635 +1550/20000 train_loss: 2.7160 train_time: 2.5m tok/s: 8277598 +1551/20000 train_loss: 2.6741 train_time: 2.5m tok/s: 8277570 +1552/20000 train_loss: 2.5520 train_time: 2.5m tok/s: 8277589 +1553/20000 train_loss: 2.4852 train_time: 2.5m tok/s: 8277548 +1554/20000 train_loss: 2.5873 train_time: 2.5m tok/s: 8277594 +1555/20000 train_loss: 2.6295 train_time: 2.5m tok/s: 8277626 +1556/20000 train_loss: 2.5145 train_time: 2.5m tok/s: 8277638 +1557/20000 train_loss: 2.5486 train_time: 2.5m tok/s: 8277591 +1558/20000 train_loss: 2.5655 train_time: 2.5m tok/s: 8277519 +1559/20000 train_loss: 2.5465 train_time: 2.5m tok/s: 8277428 +1560/20000 train_loss: 2.6186 
train_time: 2.5m tok/s: 8277480 +1561/20000 train_loss: 2.5394 train_time: 2.5m tok/s: 8277414 +1562/20000 train_loss: 2.5889 train_time: 2.5m tok/s: 8277424 +1563/20000 train_loss: 2.4923 train_time: 2.5m tok/s: 8277446 +1564/20000 train_loss: 2.5860 train_time: 2.5m tok/s: 8277483 +1565/20000 train_loss: 2.5700 train_time: 2.5m tok/s: 8277495 +1566/20000 train_loss: 2.7452 train_time: 2.5m tok/s: 8277482 +1567/20000 train_loss: 2.6809 train_time: 2.5m tok/s: 8277505 +1568/20000 train_loss: 2.5314 train_time: 2.5m tok/s: 8277554 +1569/20000 train_loss: 2.5961 train_time: 2.5m tok/s: 8277535 +1570/20000 train_loss: 2.5445 train_time: 2.5m tok/s: 8277560 +1571/20000 train_loss: 2.6129 train_time: 2.5m tok/s: 8277534 +1572/20000 train_loss: 3.1938 train_time: 2.5m tok/s: 8277519 +1573/20000 train_loss: 2.7475 train_time: 2.5m tok/s: 8277457 +1574/20000 train_loss: 2.5988 train_time: 2.5m tok/s: 8277438 +1575/20000 train_loss: 2.5509 train_time: 2.5m tok/s: 8277445 +1576/20000 train_loss: 2.5421 train_time: 2.5m tok/s: 8277423 +1577/20000 train_loss: 2.5720 train_time: 2.5m tok/s: 8277378 +1578/20000 train_loss: 2.5048 train_time: 2.5m tok/s: 8277337 +1579/20000 train_loss: 2.7593 train_time: 2.5m tok/s: 8277338 +1580/20000 train_loss: 2.6551 train_time: 2.5m tok/s: 8277308 +1581/20000 train_loss: 2.4919 train_time: 2.5m tok/s: 8277316 +1582/20000 train_loss: 2.5185 train_time: 2.5m tok/s: 8277272 +1583/20000 train_loss: 2.5811 train_time: 2.5m tok/s: 8277278 +1584/20000 train_loss: 2.5549 train_time: 2.5m tok/s: 8277391 +1585/20000 train_loss: 2.7027 train_time: 2.5m tok/s: 8277426 +1586/20000 train_loss: 2.5542 train_time: 2.5m tok/s: 8277402 +1587/20000 train_loss: 2.5948 train_time: 2.5m tok/s: 8277377 +1588/20000 train_loss: 2.6387 train_time: 2.5m tok/s: 8277381 +1589/20000 train_loss: 2.6956 train_time: 2.5m tok/s: 8277327 +1590/20000 train_loss: 2.6447 train_time: 2.5m tok/s: 8277267 +1591/20000 train_loss: 2.6377 train_time: 2.5m tok/s: 8277306 +1592/20000 
train_loss: 2.5701 train_time: 2.5m tok/s: 8277305 +1593/20000 train_loss: 2.6339 train_time: 2.5m tok/s: 8277347 +1594/20000 train_loss: 2.7372 train_time: 2.5m tok/s: 8277384 +1595/20000 train_loss: 2.6757 train_time: 2.5m tok/s: 8277401 +1596/20000 train_loss: 2.4454 train_time: 2.5m tok/s: 8277441 +1597/20000 train_loss: 2.5636 train_time: 2.5m tok/s: 8277461 +1598/20000 train_loss: 2.6215 train_time: 2.5m tok/s: 8277435 +1599/20000 train_loss: 2.6138 train_time: 2.5m tok/s: 8277442 +1600/20000 train_loss: 2.8159 train_time: 2.5m tok/s: 8277406 +1601/20000 train_loss: 2.6498 train_time: 2.5m tok/s: 8277411 +1602/20000 train_loss: 2.7686 train_time: 2.5m tok/s: 8277313 +1603/20000 train_loss: 2.5833 train_time: 2.5m tok/s: 8277318 +1604/20000 train_loss: 2.5969 train_time: 2.5m tok/s: 8277383 +1605/20000 train_loss: 2.6192 train_time: 2.5m tok/s: 8277358 +1606/20000 train_loss: 2.6138 train_time: 2.5m tok/s: 8277324 +1607/20000 train_loss: 2.5303 train_time: 2.5m tok/s: 8277256 +1608/20000 train_loss: 2.5036 train_time: 2.5m tok/s: 8277358 +1609/20000 train_loss: 2.7032 train_time: 2.5m tok/s: 8277374 +1610/20000 train_loss: 2.6024 train_time: 2.5m tok/s: 8277325 +1611/20000 train_loss: 2.5936 train_time: 2.6m tok/s: 8277289 +1612/20000 train_loss: 2.6539 train_time: 2.6m tok/s: 8277275 +1613/20000 train_loss: 2.6505 train_time: 2.6m tok/s: 8277269 +1614/20000 train_loss: 2.7138 train_time: 2.6m tok/s: 8277272 +1615/20000 train_loss: 2.7261 train_time: 2.6m tok/s: 8277322 +1616/20000 train_loss: 2.6590 train_time: 2.6m tok/s: 8277375 +1617/20000 train_loss: 2.6056 train_time: 2.6m tok/s: 8277404 +1618/20000 train_loss: 3.0403 train_time: 2.6m tok/s: 8277390 +1619/20000 train_loss: 2.7392 train_time: 2.6m tok/s: 8277333 +1620/20000 train_loss: 2.5610 train_time: 2.6m tok/s: 8277352 +1621/20000 train_loss: 2.5796 train_time: 2.6m tok/s: 8277384 +1622/20000 train_loss: 2.7568 train_time: 2.6m tok/s: 8277350 +1623/20000 train_loss: 2.6726 train_time: 2.6m tok/s: 
8277361 +1624/20000 train_loss: 2.6284 train_time: 2.6m tok/s: 8277342 +1625/20000 train_loss: 2.6326 train_time: 2.6m tok/s: 8277375 +1626/20000 train_loss: 2.7015 train_time: 2.6m tok/s: 8277357 +1627/20000 train_loss: 2.4348 train_time: 2.6m tok/s: 8277347 +1628/20000 train_loss: 2.5932 train_time: 2.6m tok/s: 8277381 +1629/20000 train_loss: 2.5713 train_time: 2.6m tok/s: 8277449 +1630/20000 train_loss: 2.5907 train_time: 2.6m tok/s: 8277411 +1631/20000 train_loss: 2.8008 train_time: 2.6m tok/s: 8277349 +1632/20000 train_loss: 2.7024 train_time: 2.6m tok/s: 8277505 +1633/20000 train_loss: 2.6669 train_time: 2.6m tok/s: 8277499 +1634/20000 train_loss: 2.6241 train_time: 2.6m tok/s: 8277470 +1635/20000 train_loss: 2.6817 train_time: 2.6m tok/s: 8277437 +1636/20000 train_loss: 2.4573 train_time: 2.6m tok/s: 8277451 +1637/20000 train_loss: 2.5601 train_time: 2.6m tok/s: 8277395 +1638/20000 train_loss: 2.4993 train_time: 2.6m tok/s: 8277332 +1639/20000 train_loss: 2.5206 train_time: 2.6m tok/s: 8277324 +1640/20000 train_loss: 2.3789 train_time: 2.6m tok/s: 8277349 +1641/20000 train_loss: 2.5435 train_time: 2.6m tok/s: 8277382 +1642/20000 train_loss: 2.7518 train_time: 2.6m tok/s: 8277375 +1643/20000 train_loss: 2.4323 train_time: 2.6m tok/s: 8277389 +1644/20000 train_loss: 2.4717 train_time: 2.6m tok/s: 8277499 +1645/20000 train_loss: 2.7671 train_time: 2.6m tok/s: 8277473 +1646/20000 train_loss: 2.5252 train_time: 2.6m tok/s: 8277478 +1647/20000 train_loss: 2.7408 train_time: 2.6m tok/s: 8277454 +1648/20000 train_loss: 2.6390 train_time: 2.6m tok/s: 8277472 +1649/20000 train_loss: 2.7471 train_time: 2.6m tok/s: 8277480 +1650/20000 train_loss: 2.5703 train_time: 2.6m tok/s: 8277427 +1651/20000 train_loss: 2.7328 train_time: 2.6m tok/s: 8277437 +1652/20000 train_loss: 2.6473 train_time: 2.6m tok/s: 8277525 +1653/20000 train_loss: 2.7458 train_time: 2.6m tok/s: 8277511 +1654/20000 train_loss: 2.6747 train_time: 2.6m tok/s: 8277576 +1655/20000 train_loss: 2.5566 
train_time: 2.6m tok/s: 8277578 +1656/20000 train_loss: 2.6043 train_time: 2.6m tok/s: 8277710 +1657/20000 train_loss: 2.6452 train_time: 2.6m tok/s: 8277701 +1658/20000 train_loss: 2.6350 train_time: 2.6m tok/s: 8277691 +1659/20000 train_loss: 2.5861 train_time: 2.6m tok/s: 8277703 +1660/20000 train_loss: 2.5514 train_time: 2.6m tok/s: 8277657 +1661/20000 train_loss: 2.7454 train_time: 2.6m tok/s: 8277602 +1662/20000 train_loss: 2.7417 train_time: 2.6m tok/s: 8277513 +1663/20000 train_loss: 2.7980 train_time: 2.6m tok/s: 8277420 +1664/20000 train_loss: 2.8053 train_time: 2.6m tok/s: 8277441 +1665/20000 train_loss: 2.8027 train_time: 2.6m tok/s: 8277405 +1666/20000 train_loss: 2.7013 train_time: 2.6m tok/s: 8277369 +1667/20000 train_loss: 2.6045 train_time: 2.6m tok/s: 8277335 +1668/20000 train_loss: 2.6307 train_time: 2.6m tok/s: 8277458 +1669/20000 train_loss: 2.7532 train_time: 2.6m tok/s: 8277420 +1670/20000 train_loss: 2.5559 train_time: 2.6m tok/s: 8277422 +1671/20000 train_loss: 2.4862 train_time: 2.6m tok/s: 8277417 +1672/20000 train_loss: 2.6164 train_time: 2.6m tok/s: 8277476 +1673/20000 train_loss: 2.5774 train_time: 2.6m tok/s: 8277443 +1674/20000 train_loss: 2.6443 train_time: 2.7m tok/s: 8277484 +1675/20000 train_loss: 2.4337 train_time: 2.7m tok/s: 8277520 +1676/20000 train_loss: 2.6814 train_time: 2.7m tok/s: 8277543 +1677/20000 train_loss: 2.6033 train_time: 2.7m tok/s: 8277512 +1678/20000 train_loss: 2.6824 train_time: 2.7m tok/s: 8277397 +1679/20000 train_loss: 2.6110 train_time: 2.7m tok/s: 8277292 +1680/20000 train_loss: 2.5319 train_time: 2.7m tok/s: 8277288 +1681/20000 train_loss: 2.5197 train_time: 2.7m tok/s: 8277258 +1682/20000 train_loss: 2.6255 train_time: 2.7m tok/s: 8277255 +1683/20000 train_loss: 2.6286 train_time: 2.7m tok/s: 8277216 +1684/20000 train_loss: 2.5786 train_time: 2.7m tok/s: 8277228 +1685/20000 train_loss: 2.6761 train_time: 2.7m tok/s: 8277252 +1686/20000 train_loss: 2.5819 train_time: 2.7m tok/s: 8277295 +1687/20000 
train_loss: 2.5508 train_time: 2.7m tok/s: 8277315 +1688/20000 train_loss: 2.5684 train_time: 2.7m tok/s: 8277319 +1689/20000 train_loss: 2.5205 train_time: 2.7m tok/s: 8277322 +1690/20000 train_loss: 2.8038 train_time: 2.7m tok/s: 8277247 +1691/20000 train_loss: 2.5976 train_time: 2.7m tok/s: 8277222 +1692/20000 train_loss: 2.5834 train_time: 2.7m tok/s: 8277283 +1693/20000 train_loss: 2.3943 train_time: 2.7m tok/s: 8277323 +1694/20000 train_loss: 2.6124 train_time: 2.7m tok/s: 8277293 +1695/20000 train_loss: 2.6157 train_time: 2.7m tok/s: 8277317 +1696/20000 train_loss: 2.6578 train_time: 2.7m tok/s: 8277368 +1697/20000 train_loss: 2.7591 train_time: 2.7m tok/s: 8277405 +1698/20000 train_loss: 2.6472 train_time: 2.7m tok/s: 8277431 +1699/20000 train_loss: 2.7436 train_time: 2.7m tok/s: 8277409 +1700/20000 train_loss: 2.5799 train_time: 2.7m tok/s: 8277410 +1701/20000 train_loss: 2.4805 train_time: 2.7m tok/s: 8277464 +1702/20000 train_loss: 2.6125 train_time: 2.7m tok/s: 8277460 +1703/20000 train_loss: 2.6467 train_time: 2.7m tok/s: 8277440 +1704/20000 train_loss: 2.7606 train_time: 2.7m tok/s: 8277490 +1705/20000 train_loss: 2.7770 train_time: 2.7m tok/s: 8277448 +1706/20000 train_loss: 2.7040 train_time: 2.7m tok/s: 8277465 +1707/20000 train_loss: 2.8438 train_time: 2.7m tok/s: 8277501 +1708/20000 train_loss: 2.4784 train_time: 2.7m tok/s: 8277435 +1709/20000 train_loss: 2.6634 train_time: 2.7m tok/s: 8277409 +1710/20000 train_loss: 2.6364 train_time: 2.7m tok/s: 8277453 +1711/20000 train_loss: 2.5726 train_time: 2.7m tok/s: 8277488 +1712/20000 train_loss: 2.6988 train_time: 2.7m tok/s: 8277542 +1713/20000 train_loss: 2.7884 train_time: 2.7m tok/s: 8277514 +1714/20000 train_loss: 2.5612 train_time: 2.7m tok/s: 8277517 +1715/20000 train_loss: 2.7481 train_time: 2.7m tok/s: 8277485 +1716/20000 train_loss: 2.7239 train_time: 2.7m tok/s: 8277535 +1717/20000 train_loss: 2.7375 train_time: 2.7m tok/s: 8277510 +1718/20000 train_loss: 2.8117 train_time: 2.7m tok/s: 
8277490 +1719/20000 train_loss: 2.7147 train_time: 2.7m tok/s: 8277474 +1720/20000 train_loss: 2.5294 train_time: 2.7m tok/s: 8277435 +1721/20000 train_loss: 2.6054 train_time: 2.7m tok/s: 8277428 +1722/20000 train_loss: 2.7303 train_time: 2.7m tok/s: 8277447 +1723/20000 train_loss: 2.6240 train_time: 2.7m tok/s: 8277487 +1724/20000 train_loss: 2.6615 train_time: 2.7m tok/s: 8277536 +1725/20000 train_loss: 2.5989 train_time: 2.7m tok/s: 8277537 +1726/20000 train_loss: 2.6372 train_time: 2.7m tok/s: 8277613 +1727/20000 train_loss: 2.5938 train_time: 2.7m tok/s: 8277591 +1728/20000 train_loss: 2.8359 train_time: 2.7m tok/s: 8277640 +1729/20000 train_loss: 2.6595 train_time: 2.7m tok/s: 8277602 +1730/20000 train_loss: 2.7417 train_time: 2.7m tok/s: 8277634 +1731/20000 train_loss: 2.7440 train_time: 2.7m tok/s: 8277658 +1732/20000 train_loss: 2.7069 train_time: 2.7m tok/s: 8277699 +1733/20000 train_loss: 2.6981 train_time: 2.7m tok/s: 8277663 +1734/20000 train_loss: 2.6059 train_time: 2.7m tok/s: 8277665 +1735/20000 train_loss: 2.4687 train_time: 2.7m tok/s: 8277654 +1736/20000 train_loss: 2.6952 train_time: 2.7m tok/s: 8277610 +1737/20000 train_loss: 2.5782 train_time: 2.8m tok/s: 8277554 +1738/20000 train_loss: 2.8160 train_time: 2.8m tok/s: 8277566 +1739/20000 train_loss: 2.7475 train_time: 2.8m tok/s: 8277526 +1740/20000 train_loss: 2.3919 train_time: 2.8m tok/s: 8277639 +1741/20000 train_loss: 2.7940 train_time: 2.8m tok/s: 8277657 +1742/20000 train_loss: 2.6120 train_time: 2.8m tok/s: 8277691 +1743/20000 train_loss: 2.5234 train_time: 2.8m tok/s: 8277705 +1744/20000 train_loss: 2.5885 train_time: 2.8m tok/s: 8277714 +1745/20000 train_loss: 2.6302 train_time: 2.8m tok/s: 8277708 +1746/20000 train_loss: 2.6100 train_time: 2.8m tok/s: 8277731 +1747/20000 train_loss: 2.6385 train_time: 2.8m tok/s: 8277781 +1748/20000 train_loss: 2.5728 train_time: 2.8m tok/s: 8277781 +1749/20000 train_loss: 2.6246 train_time: 2.8m tok/s: 8277716 +1750/20000 train_loss: 2.6705 
train_time: 2.8m tok/s: 8277720 +1751/20000 train_loss: 2.6705 train_time: 2.8m tok/s: 8277686 +1752/20000 train_loss: 2.6319 train_time: 2.8m tok/s: 8277835 +1753/20000 train_loss: 2.6120 train_time: 2.8m tok/s: 8277890 +1754/20000 train_loss: 2.6789 train_time: 2.8m tok/s: 8277911 +1755/20000 train_loss: 2.5908 train_time: 2.8m tok/s: 8277957 +1756/20000 train_loss: 2.6149 train_time: 2.8m tok/s: 8277869 +1757/20000 train_loss: 2.5819 train_time: 2.8m tok/s: 8277843 +1758/20000 train_loss: 2.8681 train_time: 2.8m tok/s: 8277823 +1759/20000 train_loss: 2.6325 train_time: 2.8m tok/s: 8277814 +1760/20000 train_loss: 2.5333 train_time: 2.8m tok/s: 8277742 +1761/20000 train_loss: 2.6347 train_time: 2.8m tok/s: 8277703 +1762/20000 train_loss: 2.7003 train_time: 2.8m tok/s: 8277748 +1763/20000 train_loss: 2.7148 train_time: 2.8m tok/s: 8277752 +1764/20000 train_loss: 2.6562 train_time: 2.8m tok/s: 8277770 +1765/20000 train_loss: 2.5830 train_time: 2.8m tok/s: 8277803 +1766/20000 train_loss: 2.7115 train_time: 2.8m tok/s: 8277819 +1767/20000 train_loss: 2.5760 train_time: 2.8m tok/s: 8277854 +1768/20000 train_loss: 2.6439 train_time: 2.8m tok/s: 8277858 +1769/20000 train_loss: 2.6439 train_time: 2.8m tok/s: 8277832 +1770/20000 train_loss: 2.6595 train_time: 2.8m tok/s: 8277815 +1771/20000 train_loss: 2.5251 train_time: 2.8m tok/s: 8277811 +1772/20000 train_loss: 2.5487 train_time: 2.8m tok/s: 8277806 +1773/20000 train_loss: 2.8520 train_time: 2.8m tok/s: 8277852 +1774/20000 train_loss: 2.7107 train_time: 2.8m tok/s: 8277913 +1775/20000 train_loss: 2.7518 train_time: 2.8m tok/s: 8277992 +1776/20000 train_loss: 2.5792 train_time: 2.8m tok/s: 8277984 +1777/20000 train_loss: 2.6960 train_time: 2.8m tok/s: 8277953 +1778/20000 train_loss: 2.6583 train_time: 2.8m tok/s: 8277963 +1779/20000 train_loss: 2.6569 train_time: 2.8m tok/s: 8278022 +1780/20000 train_loss: 2.6549 train_time: 2.8m tok/s: 8277986 +1781/20000 train_loss: 2.5264 train_time: 2.8m tok/s: 8277981 +1782/20000 
train_loss: 2.4141 train_time: 2.8m tok/s: 8278002 +1783/20000 train_loss: 2.6268 train_time: 2.8m tok/s: 8277977 +1784/20000 train_loss: 2.6292 train_time: 2.8m tok/s: 8277967 +1785/20000 train_loss: 2.6405 train_time: 2.8m tok/s: 8278030 +1786/20000 train_loss: 2.7969 train_time: 2.8m tok/s: 8278064 +1787/20000 train_loss: 2.7149 train_time: 2.8m tok/s: 8278087 +1788/20000 train_loss: 2.6302 train_time: 2.8m tok/s: 8278129 +1789/20000 train_loss: 2.7132 train_time: 2.8m tok/s: 8278143 +1790/20000 train_loss: 2.5278 train_time: 2.8m tok/s: 8278182 +1791/20000 train_loss: 2.3533 train_time: 2.8m tok/s: 8278125 +1792/20000 train_loss: 2.6321 train_time: 2.8m tok/s: 8278075 +1793/20000 train_loss: 2.4961 train_time: 2.8m tok/s: 8278120 +1794/20000 train_loss: 2.4341 train_time: 2.8m tok/s: 8278144 +1795/20000 train_loss: 2.6233 train_time: 2.8m tok/s: 8278140 +1796/20000 train_loss: 2.6485 train_time: 2.8m tok/s: 8278184 +1797/20000 train_loss: 2.8051 train_time: 2.8m tok/s: 8278232 +1798/20000 train_loss: 2.5721 train_time: 2.8m tok/s: 8278221 +1799/20000 train_loss: 2.6580 train_time: 2.8m tok/s: 8278162 +1800/20000 train_loss: 2.5581 train_time: 2.9m tok/s: 8278172 +1801/20000 train_loss: 2.6820 train_time: 2.9m tok/s: 8278192 +1802/20000 train_loss: 2.5833 train_time: 2.9m tok/s: 8278127 +1803/20000 train_loss: 2.6388 train_time: 2.9m tok/s: 8278096 +1804/20000 train_loss: 2.6007 train_time: 2.9m tok/s: 8278132 +1805/20000 train_loss: 2.5872 train_time: 2.9m tok/s: 8278164 +1806/20000 train_loss: 2.8111 train_time: 2.9m tok/s: 8278153 +1807/20000 train_loss: 2.7097 train_time: 2.9m tok/s: 8278109 +1808/20000 train_loss: 2.7237 train_time: 2.9m tok/s: 8278140 +1809/20000 train_loss: 2.6091 train_time: 2.9m tok/s: 8278095 +1810/20000 train_loss: 2.7177 train_time: 2.9m tok/s: 8277991 +1811/20000 train_loss: 2.5672 train_time: 2.9m tok/s: 8277974 +1812/20000 train_loss: 2.6129 train_time: 2.9m tok/s: 8277957 +1813/20000 train_loss: 2.6979 train_time: 2.9m tok/s: 
8277960 +1814/20000 train_loss: 2.7529 train_time: 2.9m tok/s: 8277985 +1815/20000 train_loss: 2.5548 train_time: 2.9m tok/s: 8277985 +1816/20000 train_loss: 2.4903 train_time: 2.9m tok/s: 8277968 +1817/20000 train_loss: 2.8469 train_time: 2.9m tok/s: 8277963 +1818/20000 train_loss: 2.6741 train_time: 2.9m tok/s: 8278027 +1819/20000 train_loss: 2.6920 train_time: 2.9m tok/s: 8278021 +1820/20000 train_loss: 2.5803 train_time: 2.9m tok/s: 8277990 +1821/20000 train_loss: 2.6466 train_time: 2.9m tok/s: 8278002 +1822/20000 train_loss: 2.7071 train_time: 2.9m tok/s: 8278011 +1823/20000 train_loss: 2.4053 train_time: 2.9m tok/s: 8277962 +1824/20000 train_loss: 2.6058 train_time: 2.9m tok/s: 8277889 +1825/20000 train_loss: 2.6279 train_time: 2.9m tok/s: 8277914 +1826/20000 train_loss: 2.4597 train_time: 2.9m tok/s: 8277915 +1827/20000 train_loss: 2.5771 train_time: 2.9m tok/s: 8277891 +1828/20000 train_loss: 2.5258 train_time: 2.9m tok/s: 8277837 +1829/20000 train_loss: 2.4680 train_time: 2.9m tok/s: 8277847 +1830/20000 train_loss: 2.6488 train_time: 2.9m tok/s: 8277864 +1831/20000 train_loss: 2.6514 train_time: 2.9m tok/s: 8277893 +1832/20000 train_loss: 2.6399 train_time: 2.9m tok/s: 8277914 +1833/20000 train_loss: 2.7047 train_time: 2.9m tok/s: 8277886 +1834/20000 train_loss: 2.6725 train_time: 2.9m tok/s: 8277945 +1835/20000 train_loss: 2.5779 train_time: 2.9m tok/s: 8277955 +1836/20000 train_loss: 2.5766 train_time: 2.9m tok/s: 8277919 +1837/20000 train_loss: 2.4866 train_time: 2.9m tok/s: 8277994 +1838/20000 train_loss: 2.6108 train_time: 2.9m tok/s: 8277991 +1839/20000 train_loss: 2.7137 train_time: 2.9m tok/s: 8278009 +1840/20000 train_loss: 2.6546 train_time: 2.9m tok/s: 8278005 +1841/20000 train_loss: 2.6940 train_time: 2.9m tok/s: 8277987 +1842/20000 train_loss: 2.6288 train_time: 2.9m tok/s: 8278014 +1843/20000 train_loss: 2.5603 train_time: 2.9m tok/s: 8278020 +1844/20000 train_loss: 2.6525 train_time: 2.9m tok/s: 8278023 +1845/20000 train_loss: 2.8151 
train_time: 2.9m tok/s: 8277987 +1846/20000 train_loss: 2.5878 train_time: 2.9m tok/s: 8277967 +1847/20000 train_loss: 2.4690 train_time: 2.9m tok/s: 8277981 +1848/20000 train_loss: 2.5324 train_time: 2.9m tok/s: 8277954 +1849/20000 train_loss: 2.5244 train_time: 2.9m tok/s: 8277997 +1850/20000 train_loss: 2.6476 train_time: 2.9m tok/s: 8278033 +1851/20000 train_loss: 2.6626 train_time: 2.9m tok/s: 8278033 +1852/20000 train_loss: 2.6725 train_time: 2.9m tok/s: 8278054 +1853/20000 train_loss: 2.5128 train_time: 2.9m tok/s: 8278050 +1854/20000 train_loss: 2.5555 train_time: 2.9m tok/s: 8278100 +1855/20000 train_loss: 2.6406 train_time: 2.9m tok/s: 8278121 +1856/20000 train_loss: 2.6701 train_time: 2.9m tok/s: 8278123 +1857/20000 train_loss: 2.7878 train_time: 2.9m tok/s: 8278160 +1858/20000 train_loss: 2.7326 train_time: 2.9m tok/s: 8278201 +1859/20000 train_loss: 2.6439 train_time: 2.9m tok/s: 8278225 +1860/20000 train_loss: 2.5822 train_time: 2.9m tok/s: 8278241 +1861/20000 train_loss: 2.5712 train_time: 2.9m tok/s: 8278242 +1862/20000 train_loss: 2.5421 train_time: 2.9m tok/s: 8278250 +1863/20000 train_loss: 2.6246 train_time: 2.9m tok/s: 8278278 +1864/20000 train_loss: 2.5796 train_time: 3.0m tok/s: 8278237 +1865/20000 train_loss: 2.7346 train_time: 3.0m tok/s: 8278298 +1866/20000 train_loss: 2.6410 train_time: 3.0m tok/s: 8278200 +1867/20000 train_loss: 2.5486 train_time: 3.0m tok/s: 8278147 +1868/20000 train_loss: 2.5432 train_time: 3.0m tok/s: 8278168 +1869/20000 train_loss: 2.6833 train_time: 3.0m tok/s: 8278183 +1870/20000 train_loss: 2.6386 train_time: 3.0m tok/s: 8278176 +1871/20000 train_loss: 2.5514 train_time: 3.0m tok/s: 8278165 +1872/20000 train_loss: 2.5844 train_time: 3.0m tok/s: 8278182 +1873/20000 train_loss: 2.7284 train_time: 3.0m tok/s: 8278212 +1874/20000 train_loss: 2.6598 train_time: 3.0m tok/s: 8278217 +1875/20000 train_loss: 2.7483 train_time: 3.0m tok/s: 8278249 +1876/20000 train_loss: 2.7845 train_time: 3.0m tok/s: 8278273 +1877/20000 
train_loss: 2.9599 train_time: 3.0m tok/s: 8278197 +1878/20000 train_loss: 2.6575 train_time: 3.0m tok/s: 8278121 +1879/20000 train_loss: 2.6109 train_time: 3.0m tok/s: 8278140 +1880/20000 train_loss: 2.7496 train_time: 3.0m tok/s: 8278156 +1881/20000 train_loss: 2.5941 train_time: 3.0m tok/s: 8278179 +1882/20000 train_loss: 2.7271 train_time: 3.0m tok/s: 8278215 +1883/20000 train_loss: 2.5978 train_time: 3.0m tok/s: 8278260 +1884/20000 train_loss: 2.5613 train_time: 3.0m tok/s: 8278224 +1885/20000 train_loss: 2.6419 train_time: 3.0m tok/s: 8278196 +1886/20000 train_loss: 2.5624 train_time: 3.0m tok/s: 8278239 +1887/20000 train_loss: 2.6347 train_time: 3.0m tok/s: 8278235 +1888/20000 train_loss: 2.4894 train_time: 3.0m tok/s: 8278261 +1889/20000 train_loss: 2.5433 train_time: 3.0m tok/s: 8278323 +1890/20000 train_loss: 2.6763 train_time: 3.0m tok/s: 8278279 +1891/20000 train_loss: 2.5100 train_time: 3.0m tok/s: 8278307 +1892/20000 train_loss: 2.6807 train_time: 3.0m tok/s: 8278322 +1893/20000 train_loss: 2.6850 train_time: 3.0m tok/s: 8278342 +1894/20000 train_loss: 2.5938 train_time: 3.0m tok/s: 8278328 +1895/20000 train_loss: 2.6494 train_time: 3.0m tok/s: 8278369 +1896/20000 train_loss: 2.6222 train_time: 3.0m tok/s: 8278292 +1897/20000 train_loss: 2.5616 train_time: 3.0m tok/s: 8278354 +1898/20000 train_loss: 2.7029 train_time: 3.0m tok/s: 8278344 +1899/20000 train_loss: 2.6254 train_time: 3.0m tok/s: 8278395 +1900/20000 train_loss: 2.6188 train_time: 3.0m tok/s: 8278452 +1901/20000 train_loss: 2.6875 train_time: 3.0m tok/s: 8278431 +1902/20000 train_loss: 2.6058 train_time: 3.0m tok/s: 8278450 +1903/20000 train_loss: 2.7594 train_time: 3.0m tok/s: 8278465 +1904/20000 train_loss: 3.1367 train_time: 3.0m tok/s: 8278458 +1905/20000 train_loss: 2.4854 train_time: 3.0m tok/s: 8278366 +1906/20000 train_loss: 2.6346 train_time: 3.0m tok/s: 8278350 +1907/20000 train_loss: 2.5161 train_time: 3.0m tok/s: 8278409 +1908/20000 train_loss: 2.5376 train_time: 3.0m tok/s: 
8278398 +1909/20000 train_loss: 2.5840 train_time: 3.0m tok/s: 8278407 +1910/20000 train_loss: 2.5346 train_time: 3.0m tok/s: 8278387 +1911/20000 train_loss: 2.4885 train_time: 3.0m tok/s: 8278423 +1912/20000 train_loss: 2.7060 train_time: 3.0m tok/s: 8278486 +1913/20000 train_loss: 2.7184 train_time: 3.0m tok/s: 8278511 +1914/20000 train_loss: 2.6987 train_time: 3.0m tok/s: 8278521 +1915/20000 train_loss: 2.7139 train_time: 3.0m tok/s: 8278542 +1916/20000 train_loss: 2.5744 train_time: 3.0m tok/s: 8278541 +1917/20000 train_loss: 2.7111 train_time: 3.0m tok/s: 8278494 +1918/20000 train_loss: 2.5761 train_time: 3.0m tok/s: 8278514 +1919/20000 train_loss: 2.5594 train_time: 3.0m tok/s: 8278543 +1920/20000 train_loss: 2.4998 train_time: 3.0m tok/s: 8278540 +1921/20000 train_loss: 2.7061 train_time: 3.0m tok/s: 8278549 +1922/20000 train_loss: 2.6004 train_time: 3.0m tok/s: 8278590 +1923/20000 train_loss: 2.5222 train_time: 3.0m tok/s: 8278627 +1924/20000 train_loss: 2.5948 train_time: 3.0m tok/s: 8278633 +1925/20000 train_loss: 2.5358 train_time: 3.0m tok/s: 8278626 +1926/20000 train_loss: 2.7434 train_time: 3.0m tok/s: 8278644 +1927/20000 train_loss: 2.5739 train_time: 3.1m tok/s: 8278639 +1928/20000 train_loss: 2.6337 train_time: 3.1m tok/s: 8278664 +1929/20000 train_loss: 2.6461 train_time: 3.1m tok/s: 8278655 +1930/20000 train_loss: 2.7059 train_time: 3.1m tok/s: 8278662 +1931/20000 train_loss: 2.6436 train_time: 3.1m tok/s: 8278682 +1932/20000 train_loss: 2.7559 train_time: 3.1m tok/s: 8278711 +1933/20000 train_loss: 2.6562 train_time: 3.1m tok/s: 8278686 +1934/20000 train_loss: 2.6641 train_time: 3.1m tok/s: 8278713 +1935/20000 train_loss: 2.5647 train_time: 3.1m tok/s: 8278707 +1936/20000 train_loss: 2.6765 train_time: 3.1m tok/s: 8278692 +1937/20000 train_loss: 2.6694 train_time: 3.1m tok/s: 8278717 +1938/20000 train_loss: 2.7066 train_time: 3.1m tok/s: 8278754 +1939/20000 train_loss: 2.6218 train_time: 3.1m tok/s: 8278792 +1940/20000 train_loss: 2.8130 
train_time: 3.1m tok/s: 8278785 +1941/20000 train_loss: 2.4731 train_time: 3.1m tok/s: 8278811 +1942/20000 train_loss: 2.4989 train_time: 3.1m tok/s: 8278886 +1943/20000 train_loss: 2.5009 train_time: 3.1m tok/s: 8278888 +1944/20000 train_loss: 2.5221 train_time: 3.1m tok/s: 8278900 +1945/20000 train_loss: 2.5770 train_time: 3.1m tok/s: 8278901 +1946/20000 train_loss: 2.6215 train_time: 3.1m tok/s: 8278942 +1947/20000 train_loss: 2.6899 train_time: 3.1m tok/s: 8278950 +1948/20000 train_loss: 2.6802 train_time: 3.1m tok/s: 8278939 +1949/20000 train_loss: 2.7305 train_time: 3.1m tok/s: 8278932 +1950/20000 train_loss: 2.5873 train_time: 3.1m tok/s: 8278972 +1951/20000 train_loss: 2.8077 train_time: 3.1m tok/s: 8278956 +1952/20000 train_loss: 2.8242 train_time: 3.1m tok/s: 8278961 +1953/20000 train_loss: 2.6397 train_time: 3.1m tok/s: 8278972 +1954/20000 train_loss: 2.5871 train_time: 3.1m tok/s: 8279019 +1955/20000 train_loss: 2.8588 train_time: 3.1m tok/s: 8279013 +1956/20000 train_loss: 2.5837 train_time: 3.1m tok/s: 8279017 +1957/20000 train_loss: 2.6059 train_time: 3.1m tok/s: 8279037 +1958/20000 train_loss: 2.5658 train_time: 3.1m tok/s: 8279076 +1959/20000 train_loss: 2.5512 train_time: 3.1m tok/s: 8279080 +1960/20000 train_loss: 2.5034 train_time: 3.1m tok/s: 8279110 +1961/20000 train_loss: 2.5139 train_time: 3.1m tok/s: 8279081 +1962/20000 train_loss: 2.6017 train_time: 3.1m tok/s: 8279093 +1963/20000 train_loss: 2.5665 train_time: 3.1m tok/s: 8279073 +1964/20000 train_loss: 2.5895 train_time: 3.1m tok/s: 8279133 +1965/20000 train_loss: 2.5890 train_time: 3.1m tok/s: 8279120 +1966/20000 train_loss: 2.7508 train_time: 3.1m tok/s: 8279110 +1967/20000 train_loss: 2.5610 train_time: 3.1m tok/s: 8279074 +1968/20000 train_loss: 2.7106 train_time: 3.1m tok/s: 8279065 +1969/20000 train_loss: 2.7651 train_time: 3.1m tok/s: 8279102 +1970/20000 train_loss: 2.5884 train_time: 3.1m tok/s: 8279101 +1971/20000 train_loss: 2.6186 train_time: 3.1m tok/s: 8279099 +1972/20000 
train_loss: 2.6855 train_time: 3.1m tok/s: 8279117 +1973/20000 train_loss: 2.6004 train_time: 3.1m tok/s: 8279128 +1974/20000 train_loss: 2.7692 train_time: 3.1m tok/s: 8279090 +1975/20000 train_loss: 2.5243 train_time: 3.1m tok/s: 8279080 +1976/20000 train_loss: 2.7190 train_time: 3.1m tok/s: 8279126 +1977/20000 train_loss: 2.5277 train_time: 3.1m tok/s: 8279170 +1978/20000 train_loss: 2.6867 train_time: 3.1m tok/s: 8279168 +1979/20000 train_loss: 2.5286 train_time: 3.1m tok/s: 8279186 +1980/20000 train_loss: 2.5741 train_time: 3.1m tok/s: 8279218 +1981/20000 train_loss: 2.4769 train_time: 3.1m tok/s: 8279256 +1982/20000 train_loss: 2.6437 train_time: 3.1m tok/s: 8279200 +1983/20000 train_loss: 2.3817 train_time: 3.1m tok/s: 8279167 +1984/20000 train_loss: 2.6907 train_time: 3.1m tok/s: 8279147 +1985/20000 train_loss: 2.6284 train_time: 3.1m tok/s: 8279166 +1986/20000 train_loss: 2.6743 train_time: 3.1m tok/s: 8279197 +1987/20000 train_loss: 2.6898 train_time: 3.1m tok/s: 8279256 +1988/20000 train_loss: 2.6549 train_time: 3.1m tok/s: 8279319 +1989/20000 train_loss: 2.4878 train_time: 3.1m tok/s: 8279318 +1990/20000 train_loss: 2.6482 train_time: 3.2m tok/s: 8279315 +1991/20000 train_loss: 2.5719 train_time: 3.2m tok/s: 8279376 +1992/20000 train_loss: 2.7508 train_time: 3.2m tok/s: 8279417 +1993/20000 train_loss: 2.5696 train_time: 3.2m tok/s: 8279365 +1994/20000 train_loss: 2.6153 train_time: 3.2m tok/s: 8279359 +1995/20000 train_loss: 2.5196 train_time: 3.2m tok/s: 8279394 +1996/20000 train_loss: 2.6164 train_time: 3.2m tok/s: 8279440 +1997/20000 train_loss: 2.5970 train_time: 3.2m tok/s: 8279448 +1998/20000 train_loss: 2.6140 train_time: 3.2m tok/s: 8279509 +1999/20000 train_loss: 2.6754 train_time: 3.2m tok/s: 8279519 +2000/20000 train_loss: 2.4945 train_time: 3.2m tok/s: 8279532 +2001/20000 train_loss: 2.5884 train_time: 3.2m tok/s: 8279551 +2002/20000 train_loss: 2.4526 train_time: 3.2m tok/s: 8279470 +2003/20000 train_loss: 2.6501 train_time: 3.2m tok/s: 
8279566 +2004/20000 train_loss: 2.6351 train_time: 3.2m tok/s: 8279603 +2005/20000 train_loss: 2.6178 train_time: 3.2m tok/s: 8279638 +2006/20000 train_loss: 2.5769 train_time: 3.2m tok/s: 8279669 +2007/20000 train_loss: 2.6280 train_time: 3.2m tok/s: 8279680 +2008/20000 train_loss: 2.5465 train_time: 3.2m tok/s: 8279663 +2009/20000 train_loss: 2.6550 train_time: 3.2m tok/s: 8279695 +2010/20000 train_loss: 2.7062 train_time: 3.2m tok/s: 8279733 +2011/20000 train_loss: 2.5728 train_time: 3.2m tok/s: 8279735 +2012/20000 train_loss: 2.5813 train_time: 3.2m tok/s: 8279752 +2013/20000 train_loss: 2.4827 train_time: 3.2m tok/s: 8279751 +2014/20000 train_loss: 2.4805 train_time: 3.2m tok/s: 8279731 +2015/20000 train_loss: 2.6996 train_time: 3.2m tok/s: 8279710 +2016/20000 train_loss: 2.4832 train_time: 3.2m tok/s: 8279684 +2017/20000 train_loss: 2.6374 train_time: 3.2m tok/s: 8279689 +2018/20000 train_loss: 2.6272 train_time: 3.2m tok/s: 8279760 +2019/20000 train_loss: 2.7204 train_time: 3.2m tok/s: 8279769 +2020/20000 train_loss: 2.7102 train_time: 3.2m tok/s: 8279780 +2021/20000 train_loss: 2.5656 train_time: 3.2m tok/s: 8279785 +2022/20000 train_loss: 2.4998 train_time: 3.2m tok/s: 8279788 +2023/20000 train_loss: 2.6927 train_time: 3.2m tok/s: 8279784 +2024/20000 train_loss: 2.6393 train_time: 3.2m tok/s: 8279766 +2025/20000 train_loss: 2.4815 train_time: 3.2m tok/s: 8279733 +2026/20000 train_loss: 2.7107 train_time: 3.2m tok/s: 8279730 +2027/20000 train_loss: 2.5896 train_time: 3.2m tok/s: 8279720 +2028/20000 train_loss: 2.7162 train_time: 3.2m tok/s: 8279651 +2029/20000 train_loss: 2.4961 train_time: 3.2m tok/s: 8279664 +2030/20000 train_loss: 2.5181 train_time: 3.2m tok/s: 8279681 +2031/20000 train_loss: 2.5048 train_time: 3.2m tok/s: 8279724 +2032/20000 train_loss: 2.5630 train_time: 3.2m tok/s: 8279654 +2033/20000 train_loss: 2.8344 train_time: 3.2m tok/s: 8279633 +2034/20000 train_loss: 2.6808 train_time: 3.2m tok/s: 8279650 +2035/20000 train_loss: 2.6606 
train_time: 3.2m tok/s: 8279696 +2036/20000 train_loss: 2.6176 train_time: 3.2m tok/s: 8279680 +2037/20000 train_loss: 2.8338 train_time: 3.2m tok/s: 8279612 +2038/20000 train_loss: 2.5916 train_time: 3.2m tok/s: 8279555 +2039/20000 train_loss: 2.6321 train_time: 3.2m tok/s: 8279609 +2040/20000 train_loss: 2.5791 train_time: 3.2m tok/s: 8279613 +2041/20000 train_loss: 2.6398 train_time: 3.2m tok/s: 8279644 +2042/20000 train_loss: 2.5425 train_time: 3.2m tok/s: 8279623 +2043/20000 train_loss: 2.4899 train_time: 3.2m tok/s: 8279599 +2044/20000 train_loss: 2.6535 train_time: 3.2m tok/s: 8279607 +2045/20000 train_loss: 2.4405 train_time: 3.2m tok/s: 8279671 +2046/20000 train_loss: 2.4640 train_time: 3.2m tok/s: 8279728 +2047/20000 train_loss: 2.7460 train_time: 3.2m tok/s: 8279731 +2048/20000 train_loss: 2.5692 train_time: 3.2m tok/s: 8279735 +2049/20000 train_loss: 2.7137 train_time: 3.2m tok/s: 8279776 +2050/20000 train_loss: 2.6678 train_time: 3.2m tok/s: 8279777 +2051/20000 train_loss: 2.6450 train_time: 3.2m tok/s: 8279820 +2052/20000 train_loss: 2.5342 train_time: 3.2m tok/s: 8279914 +2053/20000 train_loss: 2.6357 train_time: 3.2m tok/s: 8279943 +2054/20000 train_loss: 2.6737 train_time: 3.3m tok/s: 8279990 +2055/20000 train_loss: 2.5781 train_time: 3.3m tok/s: 8279999 +2056/20000 train_loss: 2.6188 train_time: 3.3m tok/s: 8280001 +2057/20000 train_loss: 2.6610 train_time: 3.3m tok/s: 8280012 +2058/20000 train_loss: 2.5584 train_time: 3.3m tok/s: 8280012 +2059/20000 train_loss: 2.4952 train_time: 3.3m tok/s: 8280031 +2060/20000 train_loss: 2.5902 train_time: 3.3m tok/s: 8280028 +2061/20000 train_loss: 2.5889 train_time: 3.3m tok/s: 8280039 +2062/20000 train_loss: 2.6144 train_time: 3.3m tok/s: 8280027 +2063/20000 train_loss: 2.5529 train_time: 3.3m tok/s: 8280054 +2064/20000 train_loss: 2.7982 train_time: 3.3m tok/s: 8280102 +2065/20000 train_loss: 2.5352 train_time: 3.3m tok/s: 8280142 +2066/20000 train_loss: 2.6100 train_time: 3.3m tok/s: 8280101 +2067/20000 
train_loss: 2.6673 train_time: 3.3m tok/s: 8280093 +2068/20000 train_loss: 2.6014 train_time: 3.3m tok/s: 8280114 +2069/20000 train_loss: 2.4590 train_time: 3.3m tok/s: 8280116 +2070/20000 train_loss: 2.6116 train_time: 3.3m tok/s: 8280074 +2071/20000 train_loss: 2.5526 train_time: 3.3m tok/s: 8280049 +2072/20000 train_loss: 2.5985 train_time: 3.3m tok/s: 8280055 +2073/20000 train_loss: 2.5321 train_time: 3.3m tok/s: 8280074 +2074/20000 train_loss: 2.6903 train_time: 3.3m tok/s: 8280077 +2075/20000 train_loss: 2.5732 train_time: 3.3m tok/s: 8280140 +2076/20000 train_loss: 2.6665 train_time: 3.3m tok/s: 8280147 +2077/20000 train_loss: 3.5637 train_time: 3.3m tok/s: 8280066 +2078/20000 train_loss: 2.7172 train_time: 3.3m tok/s: 8279963 +2079/20000 train_loss: 2.6616 train_time: 3.3m tok/s: 8279960 +2080/20000 train_loss: 2.6011 train_time: 3.3m tok/s: 8279968 +2081/20000 train_loss: 2.6066 train_time: 3.3m tok/s: 8279941 +2082/20000 train_loss: 2.5995 train_time: 3.3m tok/s: 8279919 +2083/20000 train_loss: 2.5423 train_time: 3.3m tok/s: 8279938 +2084/20000 train_loss: 2.5834 train_time: 3.3m tok/s: 8279983 +2085/20000 train_loss: 2.5822 train_time: 3.3m tok/s: 8280015 +2086/20000 train_loss: 2.6139 train_time: 3.3m tok/s: 8280023 +2087/20000 train_loss: 2.5504 train_time: 3.3m tok/s: 8280073 +2088/20000 train_loss: 2.4660 train_time: 3.3m tok/s: 8280035 +2089/20000 train_loss: 2.6281 train_time: 3.3m tok/s: 8280074 +2090/20000 train_loss: 2.7358 train_time: 3.3m tok/s: 8280134 +2091/20000 train_loss: 2.6223 train_time: 3.3m tok/s: 8280156 +2092/20000 train_loss: 2.6548 train_time: 3.3m tok/s: 8280175 +2093/20000 train_loss: 2.6408 train_time: 3.3m tok/s: 8280224 +2094/20000 train_loss: 2.6015 train_time: 3.3m tok/s: 8280290 +2095/20000 train_loss: 2.5895 train_time: 3.3m tok/s: 8280328 +2096/20000 train_loss: 2.6921 train_time: 3.3m tok/s: 8280343 +2097/20000 train_loss: 2.5727 train_time: 3.3m tok/s: 8280340 +2098/20000 train_loss: 2.4863 train_time: 3.3m tok/s: 
8280315 +2099/20000 train_loss: 2.4869 train_time: 3.3m tok/s: 8280377 +2100/20000 train_loss: 2.5809 train_time: 3.3m tok/s: 8280374 +2101/20000 train_loss: 2.6497 train_time: 3.3m tok/s: 8280337 +2102/20000 train_loss: 2.5712 train_time: 3.3m tok/s: 8280336 +2103/20000 train_loss: 2.5889 train_time: 3.3m tok/s: 8280313 +2104/20000 train_loss: 2.7109 train_time: 3.3m tok/s: 8280344 +2105/20000 train_loss: 2.7213 train_time: 3.3m tok/s: 8280386 +2106/20000 train_loss: 2.7041 train_time: 3.3m tok/s: 8280413 +2107/20000 train_loss: 2.6031 train_time: 3.3m tok/s: 8280435 +2108/20000 train_loss: 2.5452 train_time: 3.3m tok/s: 8280370 +2109/20000 train_loss: 2.7136 train_time: 3.3m tok/s: 8280397 +2110/20000 train_loss: 2.5760 train_time: 3.3m tok/s: 8280426 +2111/20000 train_loss: 2.5567 train_time: 3.3m tok/s: 8280437 +2112/20000 train_loss: 2.5508 train_time: 3.3m tok/s: 8280409 +2113/20000 train_loss: 2.5200 train_time: 3.3m tok/s: 8280428 +2114/20000 train_loss: 2.7860 train_time: 3.3m tok/s: 8280452 +2115/20000 train_loss: 2.4931 train_time: 3.3m tok/s: 8280487 +2116/20000 train_loss: 2.6530 train_time: 3.3m tok/s: 8280501 +2117/20000 train_loss: 2.6898 train_time: 3.4m tok/s: 8280539 +2118/20000 train_loss: 2.6961 train_time: 3.4m tok/s: 8280570 +2119/20000 train_loss: 2.8045 train_time: 3.4m tok/s: 8280607 +2120/20000 train_loss: 2.6555 train_time: 3.4m tok/s: 8280644 +2121/20000 train_loss: 2.6326 train_time: 3.4m tok/s: 8280671 +2122/20000 train_loss: 2.5504 train_time: 3.4m tok/s: 8280726 +2123/20000 train_loss: 2.4014 train_time: 3.4m tok/s: 8280721 +2124/20000 train_loss: 2.6353 train_time: 3.4m tok/s: 8280709 +2125/20000 train_loss: 2.6154 train_time: 3.4m tok/s: 8280765 +2126/20000 train_loss: 2.6272 train_time: 3.4m tok/s: 8280712 +2127/20000 train_loss: 2.5554 train_time: 3.4m tok/s: 8280695 +2128/20000 train_loss: 2.5383 train_time: 3.4m tok/s: 8280734 +2129/20000 train_loss: 2.3665 train_time: 3.4m tok/s: 8280761 +2130/20000 train_loss: 2.6770 
train_time: 3.4m tok/s: 8280779 +2131/20000 train_loss: 2.5886 train_time: 3.4m tok/s: 8280826 +2132/20000 train_loss: 2.8977 train_time: 3.4m tok/s: 8280850 +2133/20000 train_loss: 2.6967 train_time: 3.4m tok/s: 8280871 +2134/20000 train_loss: 2.6366 train_time: 3.4m tok/s: 8280860 +2135/20000 train_loss: 2.5881 train_time: 3.4m tok/s: 8280854 +2136/20000 train_loss: 2.5998 train_time: 3.4m tok/s: 8280936 +2137/20000 train_loss: 2.5360 train_time: 3.4m tok/s: 8280923 +2138/20000 train_loss: 2.5866 train_time: 3.4m tok/s: 8280957 +2139/20000 train_loss: 2.6766 train_time: 3.4m tok/s: 8280996 +2140/20000 train_loss: 2.5454 train_time: 3.4m tok/s: 8281013 +2141/20000 train_loss: 2.5298 train_time: 3.4m tok/s: 8280976 +2142/20000 train_loss: 2.6054 train_time: 3.4m tok/s: 8281001 +2143/20000 train_loss: 2.4823 train_time: 3.4m tok/s: 8280998 +2144/20000 train_loss: 2.5867 train_time: 3.4m tok/s: 8281003 +2145/20000 train_loss: 2.6656 train_time: 3.4m tok/s: 8280961 +2146/20000 train_loss: 2.5894 train_time: 3.4m tok/s: 8280977 +2147/20000 train_loss: 2.5574 train_time: 3.4m tok/s: 8281012 +2148/20000 train_loss: 2.7065 train_time: 3.4m tok/s: 8280997 +2149/20000 train_loss: 2.5402 train_time: 3.4m tok/s: 8280998 +2150/20000 train_loss: 2.7382 train_time: 3.4m tok/s: 8281030 +2151/20000 train_loss: 2.6811 train_time: 3.4m tok/s: 8281038 +2152/20000 train_loss: 2.5983 train_time: 3.4m tok/s: 8281027 +2153/20000 train_loss: 2.4610 train_time: 3.4m tok/s: 8280984 +2154/20000 train_loss: 2.6295 train_time: 3.4m tok/s: 8280999 +2155/20000 train_loss: 2.5183 train_time: 3.4m tok/s: 8280986 +2156/20000 train_loss: 2.6111 train_time: 3.4m tok/s: 8280995 +2157/20000 train_loss: 2.5929 train_time: 3.4m tok/s: 8280995 +2158/20000 train_loss: 2.5927 train_time: 3.4m tok/s: 8281013 +2159/20000 train_loss: 2.5379 train_time: 3.4m tok/s: 8281025 +2160/20000 train_loss: 2.4319 train_time: 3.4m tok/s: 8281030 +2161/20000 train_loss: 2.6165 train_time: 3.4m tok/s: 8280993 +2162/20000 
train_loss: 2.6403 train_time: 3.4m tok/s: 8281043 +2163/20000 train_loss: 2.5627 train_time: 3.4m tok/s: 8281079 +2164/20000 train_loss: 2.6090 train_time: 3.4m tok/s: 8281068 +2165/20000 train_loss: 2.6438 train_time: 3.4m tok/s: 8281051 +2166/20000 train_loss: 2.5219 train_time: 3.4m tok/s: 8281050 +2167/20000 train_loss: 2.5266 train_time: 3.4m tok/s: 8281023 +2168/20000 train_loss: 2.5840 train_time: 3.4m tok/s: 8280980 +2169/20000 train_loss: 2.6329 train_time: 3.4m tok/s: 8280960 +2170/20000 train_loss: 2.4967 train_time: 3.4m tok/s: 8280939 +2171/20000 train_loss: 2.6035 train_time: 3.4m tok/s: 8280936 +2172/20000 train_loss: 2.5111 train_time: 3.4m tok/s: 8280933 +2173/20000 train_loss: 2.7151 train_time: 3.4m tok/s: 8280937 +2174/20000 train_loss: 2.5512 train_time: 3.4m tok/s: 8280937 +2175/20000 train_loss: 2.4537 train_time: 3.4m tok/s: 8280935 +2176/20000 train_loss: 2.6688 train_time: 3.4m tok/s: 8280916 +2177/20000 train_loss: 2.6123 train_time: 3.4m tok/s: 8280919 +2178/20000 train_loss: 2.5288 train_time: 3.4m tok/s: 8280979 +2179/20000 train_loss: 2.7065 train_time: 3.4m tok/s: 8281009 +2180/20000 train_loss: 2.5595 train_time: 3.5m tok/s: 8281002 +2181/20000 train_loss: 2.5587 train_time: 3.5m tok/s: 8281001 +2182/20000 train_loss: 2.4385 train_time: 3.5m tok/s: 8281004 +2183/20000 train_loss: 2.5338 train_time: 3.5m tok/s: 8281026 +2184/20000 train_loss: 2.5415 train_time: 3.5m tok/s: 8281022 +2185/20000 train_loss: 2.4469 train_time: 3.5m tok/s: 8281015 +2186/20000 train_loss: 2.7454 train_time: 3.5m tok/s: 8281061 +2187/20000 train_loss: 2.5558 train_time: 3.5m tok/s: 8281050 +2188/20000 train_loss: 2.5492 train_time: 3.5m tok/s: 8281046 +2189/20000 train_loss: 2.6345 train_time: 3.5m tok/s: 8281053 +2190/20000 train_loss: 2.6805 train_time: 3.5m tok/s: 8281077 +2191/20000 train_loss: 2.6055 train_time: 3.5m tok/s: 8281138 +2192/20000 train_loss: 2.6476 train_time: 3.5m tok/s: 8281145 +2193/20000 train_loss: 2.5835 train_time: 3.5m tok/s: 
8281183 +2194/20000 train_loss: 2.5833 train_time: 3.5m tok/s: 8281192 +2195/20000 train_loss: 2.6098 train_time: 3.5m tok/s: 8281197 +2196/20000 train_loss: 2.6048 train_time: 3.5m tok/s: 8281194 +2197/20000 train_loss: 2.5822 train_time: 3.5m tok/s: 8281191 +2198/20000 train_loss: 2.5373 train_time: 3.5m tok/s: 8281206 +2199/20000 train_loss: 2.5359 train_time: 3.5m tok/s: 8281201 +2200/20000 train_loss: 2.6117 train_time: 3.5m tok/s: 8281217 +2201/20000 train_loss: 2.6639 train_time: 3.5m tok/s: 8281226 +2202/20000 train_loss: 2.5852 train_time: 3.5m tok/s: 8281215 +2203/20000 train_loss: 2.5684 train_time: 3.5m tok/s: 8281193 +2204/20000 train_loss: 2.4215 train_time: 3.5m tok/s: 8281198 +2205/20000 train_loss: 2.6506 train_time: 3.5m tok/s: 8281163 +2206/20000 train_loss: 2.5756 train_time: 3.5m tok/s: 8281186 +2207/20000 train_loss: 2.5267 train_time: 3.5m tok/s: 8281195 +2208/20000 train_loss: 2.6567 train_time: 3.5m tok/s: 8281204 +2209/20000 train_loss: 2.7744 train_time: 3.5m tok/s: 8281227 +layer_loop:enabled step:2209 frac:0.350 encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10] +2210/20000 train_loss: 3.0965 train_time: 3.5m tok/s: 8279224 +2211/20000 train_loss: 2.8640 train_time: 3.5m tok/s: 8277456 +2212/20000 train_loss: 2.5947 train_time: 3.5m tok/s: 8275691 +2213/20000 train_loss: 2.6749 train_time: 3.5m tok/s: 8273957 +2214/20000 train_loss: 2.5687 train_time: 3.5m tok/s: 8272194 +2215/20000 train_loss: 2.6540 train_time: 3.5m tok/s: 8270447 +2216/20000 train_loss: 2.5774 train_time: 3.5m tok/s: 8268703 +2217/20000 train_loss: 2.7332 train_time: 3.5m tok/s: 8266897 +2218/20000 train_loss: 2.7157 train_time: 3.5m tok/s: 8265181 +2219/20000 train_loss: 2.5369 train_time: 3.5m tok/s: 8263369 +2220/20000 train_loss: 2.5765 train_time: 3.5m tok/s: 8261581 +2221/20000 train_loss: 2.6974 train_time: 3.5m tok/s: 8259888 +2222/20000 train_loss: 2.5206 train_time: 3.5m tok/s: 8258060 +2223/20000 train_loss: 2.6031 train_time: 3.5m 
tok/s: 8256354 +2224/20000 train_loss: 2.4179 train_time: 3.5m tok/s: 8254603 +2225/20000 train_loss: 2.5771 train_time: 3.5m tok/s: 8252852 +2226/20000 train_loss: 2.5609 train_time: 3.5m tok/s: 8251094 +2227/20000 train_loss: 2.5334 train_time: 3.5m tok/s: 8249317 +2228/20000 train_loss: 2.5732 train_time: 3.5m tok/s: 8247595 +2229/20000 train_loss: 2.5317 train_time: 3.5m tok/s: 8245877 +2230/20000 train_loss: 2.5544 train_time: 3.5m tok/s: 8244126 +2231/20000 train_loss: 2.3579 train_time: 3.5m tok/s: 8242340 +2232/20000 train_loss: 2.5466 train_time: 3.6m tok/s: 8240598 +2233/20000 train_loss: 2.6767 train_time: 3.6m tok/s: 8238926 +2234/20000 train_loss: 2.7034 train_time: 3.6m tok/s: 8237225 +2235/20000 train_loss: 2.6585 train_time: 3.6m tok/s: 8235563 +2236/20000 train_loss: 2.6121 train_time: 3.6m tok/s: 8233871 +2237/20000 train_loss: 2.6417 train_time: 3.6m tok/s: 8232192 +2238/20000 train_loss: 2.6595 train_time: 3.6m tok/s: 8230501 +2239/20000 train_loss: 2.7612 train_time: 3.6m tok/s: 8228777 +2240/20000 train_loss: 2.7671 train_time: 3.6m tok/s: 8227057 +2241/20000 train_loss: 2.4057 train_time: 3.6m tok/s: 8225305 +2242/20000 train_loss: 2.5872 train_time: 3.6m tok/s: 8223642 +2243/20000 train_loss: 2.5721 train_time: 3.6m tok/s: 8221945 +2244/20000 train_loss: 2.5990 train_time: 3.6m tok/s: 8220246 +2245/20000 train_loss: 2.6416 train_time: 3.6m tok/s: 8218521 +2246/20000 train_loss: 2.5461 train_time: 3.6m tok/s: 8216833 +2247/20000 train_loss: 2.6853 train_time: 3.6m tok/s: 8215156 +2248/20000 train_loss: 2.6732 train_time: 3.6m tok/s: 8213476 +2249/20000 train_loss: 2.5491 train_time: 3.6m tok/s: 8211797 +2250/20000 train_loss: 2.5425 train_time: 3.6m tok/s: 8210091 +2251/20000 train_loss: 2.5872 train_time: 3.6m tok/s: 8208398 +2252/20000 train_loss: 2.7405 train_time: 3.6m tok/s: 8206720 +2253/20000 train_loss: 2.5744 train_time: 3.6m tok/s: 8205066 +2254/20000 train_loss: 2.5730 train_time: 3.6m tok/s: 8203342 +2255/20000 train_loss: 2.4348 
train_time: 3.6m tok/s: 8201692 +2256/20000 train_loss: 2.4798 train_time: 3.6m tok/s: 8200039 +2257/20000 train_loss: 2.5235 train_time: 3.6m tok/s: 8198287 +2258/20000 train_loss: 2.5450 train_time: 3.6m tok/s: 8196581 +2259/20000 train_loss: 2.5676 train_time: 3.6m tok/s: 8194956 +2260/20000 train_loss: 2.5117 train_time: 3.6m tok/s: 8193188 +2261/20000 train_loss: 2.6111 train_time: 3.6m tok/s: 8191543 +2262/20000 train_loss: 2.7139 train_time: 3.6m tok/s: 8189871 +2263/20000 train_loss: 2.7038 train_time: 3.6m tok/s: 8188206 +2264/20000 train_loss: 2.4706 train_time: 3.6m tok/s: 8186463 +2265/20000 train_loss: 2.5563 train_time: 3.6m tok/s: 8184850 +2266/20000 train_loss: 2.5821 train_time: 3.6m tok/s: 8183221 +2267/20000 train_loss: 2.6031 train_time: 3.6m tok/s: 8181610 +2268/20000 train_loss: 2.6942 train_time: 3.6m tok/s: 8180015 +2269/20000 train_loss: 2.7763 train_time: 3.6m tok/s: 8178271 +2270/20000 train_loss: 2.5271 train_time: 3.6m tok/s: 8176580 +2271/20000 train_loss: 2.5838 train_time: 3.6m tok/s: 8174907 +2272/20000 train_loss: 2.5159 train_time: 3.6m tok/s: 8173312 +2273/20000 train_loss: 2.5772 train_time: 3.6m tok/s: 8171705 +2274/20000 train_loss: 3.2794 train_time: 3.6m tok/s: 8170014 +2275/20000 train_loss: 2.4192 train_time: 3.7m tok/s: 8168326 +2276/20000 train_loss: 2.5269 train_time: 3.7m tok/s: 8166678 +2277/20000 train_loss: 2.7813 train_time: 3.7m tok/s: 8165018 +2278/20000 train_loss: 2.7231 train_time: 3.7m tok/s: 8163421 +2279/20000 train_loss: 2.6438 train_time: 3.7m tok/s: 8161773 +2280/20000 train_loss: 2.7333 train_time: 3.7m tok/s: 8160174 +2281/20000 train_loss: 2.5468 train_time: 3.7m tok/s: 8158631 +2282/20000 train_loss: 2.7350 train_time: 3.7m tok/s: 8157035 +2283/20000 train_loss: 2.9784 train_time: 3.7m tok/s: 8155391 +2284/20000 train_loss: 2.4906 train_time: 3.7m tok/s: 8153770 +2285/20000 train_loss: 2.5307 train_time: 3.7m tok/s: 8152161 +2286/20000 train_loss: 2.4515 train_time: 3.7m tok/s: 8150466 +2287/20000 
train_loss: 2.4564 train_time: 3.7m tok/s: 8148872 +2288/20000 train_loss: 2.6521 train_time: 3.7m tok/s: 8147245 +2289/20000 train_loss: 2.6662 train_time: 3.7m tok/s: 8145673 +2290/20000 train_loss: 2.6462 train_time: 3.7m tok/s: 8144031 +2291/20000 train_loss: 2.5393 train_time: 3.7m tok/s: 8142465 +2292/20000 train_loss: 2.4847 train_time: 3.7m tok/s: 8140903 +2293/20000 train_loss: 2.5571 train_time: 3.7m tok/s: 8139344 +2294/20000 train_loss: 2.4627 train_time: 3.7m tok/s: 8137785 +2295/20000 train_loss: 2.6678 train_time: 3.7m tok/s: 8136254 +2296/20000 train_loss: 2.5201 train_time: 3.7m tok/s: 8134717 +2297/20000 train_loss: 2.6251 train_time: 3.7m tok/s: 8133111 +2298/20000 train_loss: 2.5025 train_time: 3.7m tok/s: 8131529 +2299/20000 train_loss: 2.6041 train_time: 3.7m tok/s: 8129990 +2300/20000 train_loss: 2.6289 train_time: 3.7m tok/s: 8128370 +2301/20000 train_loss: 2.3991 train_time: 3.7m tok/s: 8126795 +2302/20000 train_loss: 2.5777 train_time: 3.7m tok/s: 8125223 +2303/20000 train_loss: 2.5044 train_time: 3.7m tok/s: 8123683 +2304/20000 train_loss: 2.5927 train_time: 3.7m tok/s: 8122079 +2305/20000 train_loss: 2.4465 train_time: 3.7m tok/s: 8120399 +2306/20000 train_loss: 2.6724 train_time: 3.7m tok/s: 8118876 +2307/20000 train_loss: 2.6499 train_time: 3.7m tok/s: 8117378 +2308/20000 train_loss: 2.5860 train_time: 3.7m tok/s: 8115881 +2309/20000 train_loss: 2.5462 train_time: 3.7m tok/s: 8114276 +2310/20000 train_loss: 2.6255 train_time: 3.7m tok/s: 8112748 +2311/20000 train_loss: 2.6200 train_time: 3.7m tok/s: 8111209 +2312/20000 train_loss: 2.6413 train_time: 3.7m tok/s: 8109619 +2313/20000 train_loss: 2.6206 train_time: 3.7m tok/s: 8108100 +2314/20000 train_loss: 2.4431 train_time: 3.7m tok/s: 8106568 +2315/20000 train_loss: 2.4094 train_time: 3.7m tok/s: 8105011 +2316/20000 train_loss: 2.3702 train_time: 3.7m tok/s: 8103457 +2317/20000 train_loss: 2.7526 train_time: 3.7m tok/s: 8101770 +2318/20000 train_loss: 2.6424 train_time: 3.8m tok/s: 
8100269 +2319/20000 train_loss: 2.4550 train_time: 3.8m tok/s: 8098656 +2320/20000 train_loss: 2.6224 train_time: 3.8m tok/s: 8097146 +2321/20000 train_loss: 2.6036 train_time: 3.8m tok/s: 8095582 +2322/20000 train_loss: 2.4649 train_time: 3.8m tok/s: 8094085 +2323/20000 train_loss: 2.6379 train_time: 3.8m tok/s: 8092578 +2324/20000 train_loss: 2.5787 train_time: 3.8m tok/s: 8090992 +2325/20000 train_loss: 2.5977 train_time: 3.8m tok/s: 8089396 +2326/20000 train_loss: 2.6048 train_time: 3.8m tok/s: 8087868 +2327/20000 train_loss: 2.5806 train_time: 3.8m tok/s: 8086352 +2328/20000 train_loss: 2.5341 train_time: 3.8m tok/s: 8084814 +2329/20000 train_loss: 2.4154 train_time: 3.8m tok/s: 8083335 +2330/20000 train_loss: 2.6747 train_time: 3.8m tok/s: 8081828 +2331/20000 train_loss: 2.5815 train_time: 3.8m tok/s: 8080192 +2332/20000 train_loss: 2.4040 train_time: 3.8m tok/s: 8078678 +2333/20000 train_loss: 2.6210 train_time: 3.8m tok/s: 8077147 +2334/20000 train_loss: 2.2811 train_time: 3.8m tok/s: 8075588 +2335/20000 train_loss: 2.5501 train_time: 3.8m tok/s: 8074053 +2336/20000 train_loss: 2.6287 train_time: 3.8m tok/s: 8072553 +2337/20000 train_loss: 2.6640 train_time: 3.8m tok/s: 8071086 +2338/20000 train_loss: 2.5381 train_time: 3.8m tok/s: 8069622 +2339/20000 train_loss: 2.6183 train_time: 3.8m tok/s: 8068135 +2340/20000 train_loss: 2.5772 train_time: 3.8m tok/s: 8066673 +2341/20000 train_loss: 2.5597 train_time: 3.8m tok/s: 8065231 +2342/20000 train_loss: 2.5036 train_time: 3.8m tok/s: 8063737 +2343/20000 train_loss: 2.4367 train_time: 3.8m tok/s: 8062273 +2344/20000 train_loss: 2.7079 train_time: 3.8m tok/s: 8060789 +2345/20000 train_loss: 3.0479 train_time: 3.8m tok/s: 8059290 +2346/20000 train_loss: 2.5350 train_time: 3.8m tok/s: 8057751 +2347/20000 train_loss: 2.5320 train_time: 3.8m tok/s: 8056284 +2348/20000 train_loss: 2.7238 train_time: 3.8m tok/s: 8054802 +2349/20000 train_loss: 2.5967 train_time: 3.8m tok/s: 8053331 +2350/20000 train_loss: 2.5860 
train_time: 3.8m tok/s: 8051884 +2351/20000 train_loss: 2.5691 train_time: 3.8m tok/s: 8050422 +2352/20000 train_loss: 2.6004 train_time: 3.8m tok/s: 8048972 +2353/20000 train_loss: 2.4982 train_time: 3.8m tok/s: 8047410 +2354/20000 train_loss: 2.5507 train_time: 3.8m tok/s: 8045898 +2355/20000 train_loss: 2.5140 train_time: 3.8m tok/s: 8044468 +2356/20000 train_loss: 2.5698 train_time: 3.8m tok/s: 8042988 +2357/20000 train_loss: 2.5216 train_time: 3.8m tok/s: 8041521 +2358/20000 train_loss: 2.5294 train_time: 3.8m tok/s: 8040106 +2359/20000 train_loss: 2.5072 train_time: 3.8m tok/s: 8038636 +2360/20000 train_loss: 2.5844 train_time: 3.8m tok/s: 8037184 +2361/20000 train_loss: 2.5984 train_time: 3.9m tok/s: 8035713 +2362/20000 train_loss: 2.5276 train_time: 3.9m tok/s: 8034178 +2363/20000 train_loss: 2.5469 train_time: 3.9m tok/s: 8032757 +2364/20000 train_loss: 2.6624 train_time: 3.9m tok/s: 8031266 +2365/20000 train_loss: 2.5611 train_time: 3.9m tok/s: 8029785 +2366/20000 train_loss: 2.6251 train_time: 3.9m tok/s: 8028310 +2367/20000 train_loss: 2.5354 train_time: 3.9m tok/s: 8026895 +2368/20000 train_loss: 2.6896 train_time: 3.9m tok/s: 8025507 +2369/20000 train_loss: 2.5222 train_time: 3.9m tok/s: 8024049 +2370/20000 train_loss: 2.6102 train_time: 3.9m tok/s: 8022629 +2371/20000 train_loss: 2.5931 train_time: 3.9m tok/s: 8021192 +2372/20000 train_loss: 2.6300 train_time: 3.9m tok/s: 8019726 +2373/20000 train_loss: 2.4970 train_time: 3.9m tok/s: 8018281 +2374/20000 train_loss: 2.5630 train_time: 3.9m tok/s: 8016826 +2375/20000 train_loss: 2.5555 train_time: 3.9m tok/s: 8015368 +2376/20000 train_loss: 2.4994 train_time: 3.9m tok/s: 8013974 +2377/20000 train_loss: 2.4223 train_time: 3.9m tok/s: 8012518 +2378/20000 train_loss: 2.5156 train_time: 3.9m tok/s: 8011093 +2379/20000 train_loss: 2.8998 train_time: 3.9m tok/s: 8009619 +2380/20000 train_loss: 2.5050 train_time: 3.9m tok/s: 8008181 +2381/20000 train_loss: 2.6668 train_time: 3.9m tok/s: 8006777 +2382/20000 
train_loss: 2.4625 train_time: 3.9m tok/s: 8005359 +2383/20000 train_loss: 2.6508 train_time: 3.9m tok/s: 8003958 +2384/20000 train_loss: 2.6595 train_time: 3.9m tok/s: 8002557 +2385/20000 train_loss: 2.6759 train_time: 3.9m tok/s: 8001065 +2386/20000 train_loss: 2.6363 train_time: 3.9m tok/s: 7999617 +2387/20000 train_loss: 2.4936 train_time: 3.9m tok/s: 7998166 +2388/20000 train_loss: 2.9512 train_time: 3.9m tok/s: 7996627 +2389/20000 train_loss: 2.3638 train_time: 3.9m tok/s: 7995065 +2390/20000 train_loss: 2.5952 train_time: 3.9m tok/s: 7993670 +2391/20000 train_loss: 2.5211 train_time: 3.9m tok/s: 7992243 +2392/20000 train_loss: 2.6423 train_time: 3.9m tok/s: 7990858 +2393/20000 train_loss: 2.5467 train_time: 3.9m tok/s: 7989511 +2394/20000 train_loss: 2.6349 train_time: 3.9m tok/s: 7988099 +2395/20000 train_loss: 2.6385 train_time: 3.9m tok/s: 7986719 +2396/20000 train_loss: 2.6105 train_time: 3.9m tok/s: 7985369 +2397/20000 train_loss: 2.6901 train_time: 3.9m tok/s: 7983946 +2398/20000 train_loss: 2.5173 train_time: 3.9m tok/s: 7982533 +2399/20000 train_loss: 2.5248 train_time: 3.9m tok/s: 7981184 +2400/20000 train_loss: 2.5277 train_time: 3.9m tok/s: 7979726 +2401/20000 train_loss: 2.6045 train_time: 3.9m tok/s: 7978372 +2402/20000 train_loss: 2.5078 train_time: 3.9m tok/s: 7977001 +2403/20000 train_loss: 2.8803 train_time: 3.9m tok/s: 7975528 +2404/20000 train_loss: 2.5463 train_time: 4.0m tok/s: 7974198 +2405/20000 train_loss: 2.4864 train_time: 4.0m tok/s: 7972762 +2406/20000 train_loss: 2.5775 train_time: 4.0m tok/s: 7971415 +2407/20000 train_loss: 2.5799 train_time: 4.0m tok/s: 7970102 +2408/20000 train_loss: 2.6620 train_time: 4.0m tok/s: 7968760 +2409/20000 train_loss: 2.5926 train_time: 4.0m tok/s: 7967374 +2410/20000 train_loss: 2.5749 train_time: 4.0m tok/s: 7965976 +2411/20000 train_loss: 2.5366 train_time: 4.0m tok/s: 7964628 +2412/20000 train_loss: 2.6524 train_time: 4.0m tok/s: 7963201 +2413/20000 train_loss: 2.4946 train_time: 4.0m tok/s: 
7961809 +2414/20000 train_loss: 2.5640 train_time: 4.0m tok/s: 7960464 +2415/20000 train_loss: 2.5511 train_time: 4.0m tok/s: 7959100 +2416/20000 train_loss: 2.5799 train_time: 4.0m tok/s: 7957711 +2417/20000 train_loss: 2.5487 train_time: 4.0m tok/s: 7956315 +2418/20000 train_loss: 2.5110 train_time: 4.0m tok/s: 7954974 +2419/20000 train_loss: 2.5565 train_time: 4.0m tok/s: 7953591 +2420/20000 train_loss: 2.5953 train_time: 4.0m tok/s: 7952203 +2421/20000 train_loss: 2.6255 train_time: 4.0m tok/s: 7950864 +2422/20000 train_loss: 2.6074 train_time: 4.0m tok/s: 7949551 +2423/20000 train_loss: 2.5042 train_time: 4.0m tok/s: 7948226 +2424/20000 train_loss: 2.6236 train_time: 4.0m tok/s: 7946825 +2425/20000 train_loss: 2.6082 train_time: 4.0m tok/s: 7945420 +2426/20000 train_loss: 2.5413 train_time: 4.0m tok/s: 7944075 +2427/20000 train_loss: 2.4195 train_time: 4.0m tok/s: 7942721 +2428/20000 train_loss: 2.5229 train_time: 4.0m tok/s: 7941302 +2429/20000 train_loss: 2.4959 train_time: 4.0m tok/s: 7939993 +2430/20000 train_loss: 2.4671 train_time: 4.0m tok/s: 7938644 +2431/20000 train_loss: 2.5879 train_time: 4.0m tok/s: 7937203 +2432/20000 train_loss: 2.5449 train_time: 4.0m tok/s: 7935898 +2433/20000 train_loss: 2.6800 train_time: 4.0m tok/s: 7934552 +2434/20000 train_loss: 2.4952 train_time: 4.0m tok/s: 7933225 +2435/20000 train_loss: 2.6794 train_time: 4.0m tok/s: 7931831 +2436/20000 train_loss: 2.5359 train_time: 4.0m tok/s: 7930499 +2437/20000 train_loss: 2.5814 train_time: 4.0m tok/s: 7929182 +2438/20000 train_loss: 2.5341 train_time: 4.0m tok/s: 7927837 +2439/20000 train_loss: 2.5293 train_time: 4.0m tok/s: 7926504 +2440/20000 train_loss: 2.5030 train_time: 4.0m tok/s: 7925190 +2441/20000 train_loss: 2.5396 train_time: 4.0m tok/s: 7923862 +2442/20000 train_loss: 2.5205 train_time: 4.0m tok/s: 7922566 +2443/20000 train_loss: 2.6422 train_time: 4.0m tok/s: 7921229 +2444/20000 train_loss: 2.5891 train_time: 4.0m tok/s: 7919900 +2445/20000 train_loss: 2.5032 
train_time: 4.0m tok/s: 7918622 +2446/20000 train_loss: 2.7118 train_time: 4.0m tok/s: 7917299 +2447/20000 train_loss: 2.7300 train_time: 4.1m tok/s: 7916000 +2448/20000 train_loss: 2.5911 train_time: 4.1m tok/s: 7914676 +2449/20000 train_loss: 2.5031 train_time: 4.1m tok/s: 7913332 +2450/20000 train_loss: 2.5583 train_time: 4.1m tok/s: 7912003 +2451/20000 train_loss: 2.5457 train_time: 4.1m tok/s: 7910647 +2452/20000 train_loss: 2.5731 train_time: 4.1m tok/s: 7909303 +2453/20000 train_loss: 2.4383 train_time: 4.1m tok/s: 7908020 +2454/20000 train_loss: 2.4550 train_time: 4.1m tok/s: 7906729 +2455/20000 train_loss: 2.5527 train_time: 4.1m tok/s: 7905383 +2456/20000 train_loss: 2.5561 train_time: 4.1m tok/s: 7904068 +2457/20000 train_loss: 2.6250 train_time: 4.1m tok/s: 7902808 +2458/20000 train_loss: 2.7191 train_time: 4.1m tok/s: 7901465 +2459/20000 train_loss: 2.5697 train_time: 4.1m tok/s: 7900176 +2460/20000 train_loss: 2.5892 train_time: 4.1m tok/s: 7898897 +2461/20000 train_loss: 2.6683 train_time: 4.1m tok/s: 7897602 +2462/20000 train_loss: 2.5413 train_time: 4.1m tok/s: 7896301 +2463/20000 train_loss: 2.6014 train_time: 4.1m tok/s: 7895027 +2464/20000 train_loss: 2.5239 train_time: 4.1m tok/s: 7893666 +2465/20000 train_loss: 2.6176 train_time: 4.1m tok/s: 7892415 +2466/20000 train_loss: 2.3301 train_time: 4.1m tok/s: 7891132 +2467/20000 train_loss: 2.5976 train_time: 4.1m tok/s: 7889808 +2468/20000 train_loss: 2.4819 train_time: 4.1m tok/s: 7888520 +2469/20000 train_loss: 2.5827 train_time: 4.1m tok/s: 7887190 +2470/20000 train_loss: 2.6107 train_time: 4.1m tok/s: 7885944 +2471/20000 train_loss: 2.5842 train_time: 4.1m tok/s: 7884639 +2472/20000 train_loss: 2.6800 train_time: 4.1m tok/s: 7883343 +2473/20000 train_loss: 2.5548 train_time: 4.1m tok/s: 7882083 +2474/20000 train_loss: 2.8354 train_time: 4.1m tok/s: 7880678 +2475/20000 train_loss: 2.6920 train_time: 4.1m tok/s: 7879361 +2476/20000 train_loss: 2.5740 train_time: 4.1m tok/s: 7878111 +2477/20000 
train_loss: 2.5005 train_time: 4.1m tok/s: 7876875 +2478/20000 train_loss: 2.5327 train_time: 4.1m tok/s: 7875617 +2479/20000 train_loss: 2.6428 train_time: 4.1m tok/s: 7874382 +2480/20000 train_loss: 2.5813 train_time: 4.1m tok/s: 7873091 +2481/20000 train_loss: 2.4675 train_time: 4.1m tok/s: 7871795 +2482/20000 train_loss: 2.6242 train_time: 4.1m tok/s: 7870519 +2483/20000 train_loss: 2.5796 train_time: 4.1m tok/s: 7869317 +2484/20000 train_loss: 2.5457 train_time: 4.1m tok/s: 7868027 +2485/20000 train_loss: 2.4922 train_time: 4.1m tok/s: 7866773 +2486/20000 train_loss: 2.5805 train_time: 4.1m tok/s: 7865475 +2487/20000 train_loss: 2.6040 train_time: 4.1m tok/s: 7864200 +2488/20000 train_loss: 2.5716 train_time: 4.1m tok/s: 7862951 +2489/20000 train_loss: 2.4657 train_time: 4.1m tok/s: 7861631 +2490/20000 train_loss: 2.6372 train_time: 4.2m tok/s: 7860395 +2491/20000 train_loss: 2.5811 train_time: 4.2m tok/s: 7859129 +2492/20000 train_loss: 2.5440 train_time: 4.2m tok/s: 7857855 +2493/20000 train_loss: 2.5750 train_time: 4.2m tok/s: 7856613 +2494/20000 train_loss: 2.4504 train_time: 4.2m tok/s: 7855310 +2495/20000 train_loss: 2.4983 train_time: 4.2m tok/s: 7854066 +2496/20000 train_loss: 2.6114 train_time: 4.2m tok/s: 7852843 +2497/20000 train_loss: 2.5660 train_time: 4.2m tok/s: 7851585 +2498/20000 train_loss: 2.5641 train_time: 4.2m tok/s: 7850386 +2499/20000 train_loss: 2.6091 train_time: 4.2m tok/s: 7849160 +2500/20000 train_loss: 2.6721 train_time: 4.2m tok/s: 7847881 +2501/20000 train_loss: 2.5633 train_time: 4.2m tok/s: 7846634 +2502/20000 train_loss: 2.4149 train_time: 4.2m tok/s: 7845405 +2503/20000 train_loss: 2.5237 train_time: 4.2m tok/s: 7844176 +2504/20000 train_loss: 2.6151 train_time: 4.2m tok/s: 7842898 +2505/20000 train_loss: 2.5276 train_time: 4.2m tok/s: 7841612 +2506/20000 train_loss: 2.5769 train_time: 4.2m tok/s: 7840335 +2507/20000 train_loss: 2.4355 train_time: 4.2m tok/s: 7839128 +2508/20000 train_loss: 2.5869 train_time: 4.2m tok/s: 
7837917 +2509/20000 train_loss: 2.6272 train_time: 4.2m tok/s: 7836711 +2510/20000 train_loss: 2.5320 train_time: 4.2m tok/s: 7835403 +2511/20000 train_loss: 2.5835 train_time: 4.2m tok/s: 7834072 +2512/20000 train_loss: 2.6223 train_time: 4.2m tok/s: 7832841 +2513/20000 train_loss: 2.4795 train_time: 4.2m tok/s: 7831628 +2514/20000 train_loss: 2.5734 train_time: 4.2m tok/s: 7830423 +2515/20000 train_loss: 2.6093 train_time: 4.2m tok/s: 7829249 +2516/20000 train_loss: 2.5989 train_time: 4.2m tok/s: 7828008 +2517/20000 train_loss: 2.4406 train_time: 4.2m tok/s: 7826777 +2518/20000 train_loss: 2.5424 train_time: 4.2m tok/s: 7825542 +2519/20000 train_loss: 2.5843 train_time: 4.2m tok/s: 7824319 +2520/20000 train_loss: 2.5429 train_time: 4.2m tok/s: 7823136 +2521/20000 train_loss: 2.6206 train_time: 4.2m tok/s: 7821978 +2522/20000 train_loss: 2.6235 train_time: 4.2m tok/s: 7820798 +2523/20000 train_loss: 2.5596 train_time: 4.2m tok/s: 7819588 +2524/20000 train_loss: 2.5436 train_time: 4.2m tok/s: 7818368 +2525/20000 train_loss: 2.4467 train_time: 4.2m tok/s: 7817149 +2526/20000 train_loss: 2.5403 train_time: 4.2m tok/s: 7815933 +2527/20000 train_loss: 2.5545 train_time: 4.2m tok/s: 7814650 +2528/20000 train_loss: 2.5993 train_time: 4.2m tok/s: 7813458 +2529/20000 train_loss: 2.5689 train_time: 4.2m tok/s: 7812279 +2530/20000 train_loss: 2.4891 train_time: 4.2m tok/s: 7811067 +2531/20000 train_loss: 2.4053 train_time: 4.2m tok/s: 7809824 +2532/20000 train_loss: 2.5252 train_time: 4.3m tok/s: 7808512 +2533/20000 train_loss: 2.5445 train_time: 4.3m tok/s: 7807325 +2534/20000 train_loss: 2.4602 train_time: 4.3m tok/s: 7806186 +2535/20000 train_loss: 2.5744 train_time: 4.3m tok/s: 7805018 +2536/20000 train_loss: 2.5687 train_time: 4.3m tok/s: 7803791 +2537/20000 train_loss: 2.4723 train_time: 4.3m tok/s: 7802625 +2538/20000 train_loss: 2.5922 train_time: 4.3m tok/s: 7801427 +2539/20000 train_loss: 2.7938 train_time: 4.3m tok/s: 7800237 +2540/20000 train_loss: 2.5502 
train_time: 4.3m tok/s: 7799060 +2541/20000 train_loss: 2.5343 train_time: 4.3m tok/s: 7797938 +2542/20000 train_loss: 2.5524 train_time: 4.3m tok/s: 7796661 +2543/20000 train_loss: 2.6376 train_time: 4.3m tok/s: 7795481 +2544/20000 train_loss: 2.6612 train_time: 4.3m tok/s: 7794277 +2545/20000 train_loss: 2.4944 train_time: 4.3m tok/s: 7793060 +2546/20000 train_loss: 2.5390 train_time: 4.3m tok/s: 7791905 +2547/20000 train_loss: 2.5139 train_time: 4.3m tok/s: 7790699 +2548/20000 train_loss: 2.7853 train_time: 4.3m tok/s: 7789508 +2549/20000 train_loss: 2.5559 train_time: 4.3m tok/s: 7788323 +2550/20000 train_loss: 2.8280 train_time: 4.3m tok/s: 7787171 +2551/20000 train_loss: 2.5324 train_time: 4.3m tok/s: 7786037 +2552/20000 train_loss: 2.7464 train_time: 4.3m tok/s: 7784887 +2553/20000 train_loss: 2.5804 train_time: 4.3m tok/s: 7783644 +2554/20000 train_loss: 2.4682 train_time: 4.3m tok/s: 7782484 +2555/20000 train_loss: 2.5551 train_time: 4.3m tok/s: 7781368 +2556/20000 train_loss: 2.5560 train_time: 4.3m tok/s: 7780129 +2557/20000 train_loss: 2.5085 train_time: 4.3m tok/s: 7778981 +2558/20000 train_loss: 2.6169 train_time: 4.3m tok/s: 7777829 +2559/20000 train_loss: 2.4341 train_time: 4.3m tok/s: 7776636 +2560/20000 train_loss: 2.4488 train_time: 4.3m tok/s: 7775387 +2561/20000 train_loss: 2.5331 train_time: 4.3m tok/s: 7774229 +2562/20000 train_loss: 2.4744 train_time: 4.3m tok/s: 7773144 +2563/20000 train_loss: 2.4462 train_time: 4.3m tok/s: 7771994 +2564/20000 train_loss: 2.4335 train_time: 4.3m tok/s: 7770820 +2565/20000 train_loss: 2.5432 train_time: 4.3m tok/s: 7769668 +2566/20000 train_loss: 2.5634 train_time: 4.3m tok/s: 7768503 +2567/20000 train_loss: 2.5830 train_time: 4.3m tok/s: 7767372 +2568/20000 train_loss: 2.6175 train_time: 4.3m tok/s: 7766219 +2569/20000 train_loss: 2.6583 train_time: 4.3m tok/s: 7765084 +2570/20000 train_loss: 2.5526 train_time: 4.3m tok/s: 7763920 +2571/20000 train_loss: 2.6068 train_time: 4.3m tok/s: 7762797 +2572/20000 
train_loss: 2.4349 train_time: 4.3m tok/s: 7761626 +2573/20000 train_loss: 2.4984 train_time: 4.3m tok/s: 7760433 +2574/20000 train_loss: 2.6835 train_time: 4.3m tok/s: 7759268 +2575/20000 train_loss: 2.5855 train_time: 4.4m tok/s: 7758142 +2576/20000 train_loss: 2.5133 train_time: 4.4m tok/s: 7756977 +2577/20000 train_loss: 2.4820 train_time: 4.4m tok/s: 7755834 +2578/20000 train_loss: 2.4471 train_time: 4.4m tok/s: 7754730 +2579/20000 train_loss: 2.5388 train_time: 4.4m tok/s: 7753611 +2580/20000 train_loss: 2.4652 train_time: 4.4m tok/s: 7752467 +2581/20000 train_loss: 2.3450 train_time: 4.4m tok/s: 7751237 +2582/20000 train_loss: 2.5650 train_time: 4.4m tok/s: 7750095 +2583/20000 train_loss: 2.5925 train_time: 4.4m tok/s: 7748977 +2584/20000 train_loss: 2.5932 train_time: 4.4m tok/s: 7747899 +2585/20000 train_loss: 2.4862 train_time: 4.4m tok/s: 7746736 +2586/20000 train_loss: 2.5117 train_time: 4.4m tok/s: 7745571 +2587/20000 train_loss: 2.5485 train_time: 4.4m tok/s: 7744470 +2588/20000 train_loss: 2.6063 train_time: 4.4m tok/s: 7743335 +2589/20000 train_loss: 2.4818 train_time: 4.4m tok/s: 7742173 +2590/20000 train_loss: 2.5146 train_time: 4.4m tok/s: 7741057 +2591/20000 train_loss: 2.4733 train_time: 4.4m tok/s: 7739969 +2592/20000 train_loss: 2.4318 train_time: 4.4m tok/s: 7738856 +2593/20000 train_loss: 2.5097 train_time: 4.4m tok/s: 7737721 +2594/20000 train_loss: 2.4319 train_time: 4.4m tok/s: 7736603 +2595/20000 train_loss: 2.6088 train_time: 4.4m tok/s: 7735381 +2596/20000 train_loss: 3.0920 train_time: 4.4m tok/s: 7734263 +2597/20000 train_loss: 2.4101 train_time: 4.4m tok/s: 7733180 +2598/20000 train_loss: 2.5104 train_time: 4.4m tok/s: 7732073 +2599/20000 train_loss: 2.6211 train_time: 4.4m tok/s: 7730991 +2600/20000 train_loss: 2.5808 train_time: 4.4m tok/s: 7729778 +2601/20000 train_loss: 2.4974 train_time: 4.4m tok/s: 7728707 +2602/20000 train_loss: 2.7646 train_time: 4.4m tok/s: 7727565 +2603/20000 train_loss: 2.5218 train_time: 4.4m tok/s: 
7726412 +2604/20000 train_loss: 2.5253 train_time: 4.4m tok/s: 7725311 +2605/20000 train_loss: 2.6715 train_time: 4.4m tok/s: 7724151 +2606/20000 train_loss: 2.4213 train_time: 4.4m tok/s: 7723055 +2607/20000 train_loss: 2.4727 train_time: 4.4m tok/s: 7721895 +2608/20000 train_loss: 2.5277 train_time: 4.4m tok/s: 7720809 +2609/20000 train_loss: 2.4881 train_time: 4.4m tok/s: 7719707 +2610/20000 train_loss: 2.4760 train_time: 4.4m tok/s: 7718612 +2611/20000 train_loss: 2.6206 train_time: 4.4m tok/s: 7717506 +2612/20000 train_loss: 2.6816 train_time: 4.4m tok/s: 7716350 +2613/20000 train_loss: 2.5571 train_time: 4.4m tok/s: 7715281 +2614/20000 train_loss: 2.5407 train_time: 4.4m tok/s: 7714255 +2615/20000 train_loss: 2.6715 train_time: 4.4m tok/s: 7713197 +2616/20000 train_loss: 2.5835 train_time: 4.4m tok/s: 7712092 +2617/20000 train_loss: 2.5656 train_time: 4.4m tok/s: 7710988 +2618/20000 train_loss: 2.5697 train_time: 4.5m tok/s: 7709904 +2619/20000 train_loss: 2.4588 train_time: 4.5m tok/s: 7708798 +2620/20000 train_loss: 2.4258 train_time: 4.5m tok/s: 7707667 +2621/20000 train_loss: 2.5021 train_time: 4.5m tok/s: 7706592 +2622/20000 train_loss: 2.4932 train_time: 4.5m tok/s: 7705522 +2623/20000 train_loss: 2.5064 train_time: 4.5m tok/s: 7704392 +2624/20000 train_loss: 2.3494 train_time: 4.5m tok/s: 7703312 +2625/20000 train_loss: 2.6324 train_time: 4.5m tok/s: 7702223 +2626/20000 train_loss: 2.3554 train_time: 4.5m tok/s: 7701133 +2627/20000 train_loss: 2.4657 train_time: 4.5m tok/s: 7700054 +2628/20000 train_loss: 2.6628 train_time: 4.5m tok/s: 7698971 +2629/20000 train_loss: 2.5600 train_time: 4.5m tok/s: 7697935 +2630/20000 train_loss: 2.6181 train_time: 4.5m tok/s: 7696814 +2631/20000 train_loss: 2.5922 train_time: 4.5m tok/s: 7695708 +2632/20000 train_loss: 2.6308 train_time: 4.5m tok/s: 7694665 +2633/20000 train_loss: 2.4802 train_time: 4.5m tok/s: 7693555 +2634/20000 train_loss: 2.5659 train_time: 4.5m tok/s: 7692456 +2635/20000 train_loss: 2.4958 
train_time: 4.5m tok/s: 7691382 +2636/20000 train_loss: 2.5453 train_time: 4.5m tok/s: 7690362 +2637/20000 train_loss: 2.4571 train_time: 4.5m tok/s: 7689283 +2638/20000 train_loss: 2.5383 train_time: 4.5m tok/s: 7688226 +2639/20000 train_loss: 2.2712 train_time: 4.5m tok/s: 7687110 +2640/20000 train_loss: 2.5467 train_time: 4.5m tok/s: 7686050 +2641/20000 train_loss: 2.5931 train_time: 4.5m tok/s: 7684913 +2642/20000 train_loss: 2.6449 train_time: 4.5m tok/s: 7683888 +2643/20000 train_loss: 2.5624 train_time: 4.5m tok/s: 7682768 +2644/20000 train_loss: 2.5861 train_time: 4.5m tok/s: 7681743 +2645/20000 train_loss: 2.5328 train_time: 4.5m tok/s: 7680638 +2646/20000 train_loss: 2.5688 train_time: 4.5m tok/s: 7679575 +2647/20000 train_loss: 2.6612 train_time: 4.5m tok/s: 7678507 +2648/20000 train_loss: 2.5220 train_time: 4.5m tok/s: 7677448 +2649/20000 train_loss: 2.5466 train_time: 4.5m tok/s: 7676361 +2650/20000 train_loss: 2.4688 train_time: 4.5m tok/s: 7675313 +2651/20000 train_loss: 2.4383 train_time: 4.5m tok/s: 7674239 +2652/20000 train_loss: 2.3579 train_time: 4.5m tok/s: 7673171 +2653/20000 train_loss: 2.6546 train_time: 4.5m tok/s: 7672095 +2654/20000 train_loss: 2.2730 train_time: 4.5m tok/s: 7670984 +2655/20000 train_loss: 2.9495 train_time: 4.5m tok/s: 7669820 +2656/20000 train_loss: 2.4532 train_time: 4.5m tok/s: 7668773 +2657/20000 train_loss: 2.4474 train_time: 4.5m tok/s: 7667742 +2658/20000 train_loss: 2.6203 train_time: 4.5m tok/s: 7666681 +2659/20000 train_loss: 2.5341 train_time: 4.5m tok/s: 7665617 +2660/20000 train_loss: 2.5962 train_time: 4.5m tok/s: 7664566 +2661/20000 train_loss: 2.5478 train_time: 4.6m tok/s: 7663557 +2662/20000 train_loss: 2.3368 train_time: 4.6m tok/s: 7662509 +2663/20000 train_loss: 2.7269 train_time: 4.6m tok/s: 7661442 +2664/20000 train_loss: 2.5373 train_time: 4.6m tok/s: 7660347 +2665/20000 train_loss: 2.4948 train_time: 4.6m tok/s: 7659296 +2666/20000 train_loss: 2.4076 train_time: 4.6m tok/s: 7658272 +2667/20000 
train_loss: 2.2946 train_time: 4.6m tok/s: 7657251 +2668/20000 train_loss: 2.5781 train_time: 4.6m tok/s: 7656211 +2669/20000 train_loss: 2.4581 train_time: 4.6m tok/s: 7655221 +2670/20000 train_loss: 2.5704 train_time: 4.6m tok/s: 7654225 +2671/20000 train_loss: 2.6729 train_time: 4.6m tok/s: 7653237 +2672/20000 train_loss: 2.6146 train_time: 4.6m tok/s: 7652256 +2673/20000 train_loss: 2.5676 train_time: 4.6m tok/s: 7651266 +2674/20000 train_loss: 2.6272 train_time: 4.6m tok/s: 7650230 +2675/20000 train_loss: 2.5346 train_time: 4.6m tok/s: 7649206 +2676/20000 train_loss: 2.5240 train_time: 4.6m tok/s: 7648157 +2677/20000 train_loss: 2.4398 train_time: 4.6m tok/s: 7647111 +2678/20000 train_loss: 2.4773 train_time: 4.6m tok/s: 7646095 +2679/20000 train_loss: 2.3259 train_time: 4.6m tok/s: 7645070 +2680/20000 train_loss: 2.4385 train_time: 4.6m tok/s: 7644010 +2681/20000 train_loss: 2.4505 train_time: 4.6m tok/s: 7642992 +2682/20000 train_loss: 2.5416 train_time: 4.6m tok/s: 7642000 +2683/20000 train_loss: 2.4680 train_time: 4.6m tok/s: 7640995 +2684/20000 train_loss: 2.4799 train_time: 4.6m tok/s: 7639915 +2685/20000 train_loss: 2.8149 train_time: 4.6m tok/s: 7638870 +2686/20000 train_loss: 2.5223 train_time: 4.6m tok/s: 7637884 +2687/20000 train_loss: 2.5809 train_time: 4.6m tok/s: 7636870 +2688/20000 train_loss: 2.4575 train_time: 4.6m tok/s: 7635861 +2689/20000 train_loss: 2.5513 train_time: 4.6m tok/s: 7634833 +2690/20000 train_loss: 2.5512 train_time: 4.6m tok/s: 7633782 +2691/20000 train_loss: 2.5320 train_time: 4.6m tok/s: 7632766 +2692/20000 train_loss: 2.4669 train_time: 4.6m tok/s: 7631735 +2693/20000 train_loss: 2.4542 train_time: 4.6m tok/s: 7630685 +2694/20000 train_loss: 2.4774 train_time: 4.6m tok/s: 7629670 +2695/20000 train_loss: 2.5802 train_time: 4.6m tok/s: 7628625 +2696/20000 train_loss: 2.4791 train_time: 4.6m tok/s: 7627603 +2697/20000 train_loss: 2.5606 train_time: 4.6m tok/s: 7626609 +2698/20000 train_loss: 2.5760 train_time: 4.6m tok/s: 
7625627 +2699/20000 train_loss: 2.4795 train_time: 4.6m tok/s: 7624658 +2700/20000 train_loss: 2.4635 train_time: 4.6m tok/s: 7623620 +2701/20000 train_loss: 2.5857 train_time: 4.6m tok/s: 7622620 +2702/20000 train_loss: 2.3709 train_time: 4.6m tok/s: 7621605 +2703/20000 train_loss: 2.5566 train_time: 4.6m tok/s: 7620662 +2704/20000 train_loss: 2.4531 train_time: 4.7m tok/s: 7619597 +2705/20000 train_loss: 2.5107 train_time: 4.7m tok/s: 7618610 +2706/20000 train_loss: 2.5173 train_time: 4.7m tok/s: 7617654 +2707/20000 train_loss: 2.6233 train_time: 4.7m tok/s: 7616619 +2708/20000 train_loss: 2.6539 train_time: 4.7m tok/s: 7615519 +2709/20000 train_loss: 2.5731 train_time: 4.7m tok/s: 7614559 +2710/20000 train_loss: 2.6276 train_time: 4.7m tok/s: 7613514 +2711/20000 train_loss: 2.7054 train_time: 4.7m tok/s: 7612546 +2712/20000 train_loss: 2.4933 train_time: 4.7m tok/s: 7611559 +2713/20000 train_loss: 2.5712 train_time: 4.7m tok/s: 7610559 +2714/20000 train_loss: 2.6236 train_time: 4.7m tok/s: 7609572 +2715/20000 train_loss: 2.4575 train_time: 4.7m tok/s: 7608583 +2716/20000 train_loss: 2.4222 train_time: 4.7m tok/s: 7607607 +2717/20000 train_loss: 2.5256 train_time: 4.7m tok/s: 7606615 +2718/20000 train_loss: 2.4507 train_time: 4.7m tok/s: 7605645 +2719/20000 train_loss: 2.4390 train_time: 4.7m tok/s: 7604699 +2720/20000 train_loss: 2.5693 train_time: 4.7m tok/s: 7603599 +2721/20000 train_loss: 2.4093 train_time: 4.7m tok/s: 7602569 +2722/20000 train_loss: 2.4874 train_time: 4.7m tok/s: 7601582 +2723/20000 train_loss: 2.4600 train_time: 4.7m tok/s: 7600649 +2724/20000 train_loss: 2.5234 train_time: 4.7m tok/s: 7599617 +2725/20000 train_loss: 2.6398 train_time: 4.7m tok/s: 7598625 +2726/20000 train_loss: 2.5041 train_time: 4.7m tok/s: 7597714 +2727/20000 train_loss: 2.5651 train_time: 4.7m tok/s: 7596793 +2728/20000 train_loss: 2.8970 train_time: 4.7m tok/s: 7595810 +2729/20000 train_loss: 2.7025 train_time: 4.7m tok/s: 7594785 +2730/20000 train_loss: 2.5376 
train_time: 4.7m tok/s: 7593820 +2731/20000 train_loss: 2.6119 train_time: 4.7m tok/s: 7592880 +2732/20000 train_loss: 2.6286 train_time: 4.7m tok/s: 7591845 +2733/20000 train_loss: 2.5497 train_time: 4.7m tok/s: 7590861 +2734/20000 train_loss: 2.6255 train_time: 4.7m tok/s: 7589908 +2735/20000 train_loss: 2.4312 train_time: 4.7m tok/s: 7588970 +2736/20000 train_loss: 2.5528 train_time: 4.7m tok/s: 7587997 +2737/20000 train_loss: 2.4805 train_time: 4.7m tok/s: 7587006 +2738/20000 train_loss: 2.4185 train_time: 4.7m tok/s: 7586063 +2739/20000 train_loss: 2.5443 train_time: 4.7m tok/s: 7585069 +2740/20000 train_loss: 2.5599 train_time: 4.7m tok/s: 7584082 +2741/20000 train_loss: 2.5245 train_time: 4.7m tok/s: 7583136 +2742/20000 train_loss: 2.4811 train_time: 4.7m tok/s: 7582169 +2743/20000 train_loss: 2.5859 train_time: 4.7m tok/s: 7581194 +2744/20000 train_loss: 2.5912 train_time: 4.7m tok/s: 7580257 +2745/20000 train_loss: 2.6467 train_time: 4.7m tok/s: 7579285 +2746/20000 train_loss: 2.6209 train_time: 4.7m tok/s: 7578305 +2747/20000 train_loss: 2.4445 train_time: 4.8m tok/s: 7577366 +2748/20000 train_loss: 2.5163 train_time: 4.8m tok/s: 7576403 +2749/20000 train_loss: 2.5826 train_time: 4.8m tok/s: 7575443 +2750/20000 train_loss: 2.6471 train_time: 4.8m tok/s: 7574498 +2751/20000 train_loss: 2.6347 train_time: 4.8m tok/s: 7573499 +2752/20000 train_loss: 2.5050 train_time: 4.8m tok/s: 7572513 +2753/20000 train_loss: 2.4909 train_time: 4.8m tok/s: 7571571 +2754/20000 train_loss: 2.4583 train_time: 4.8m tok/s: 7570620 +2755/20000 train_loss: 2.5011 train_time: 4.8m tok/s: 7569667 +2756/20000 train_loss: 2.4799 train_time: 4.8m tok/s: 7568764 +2757/20000 train_loss: 2.4512 train_time: 4.8m tok/s: 7567794 +2758/20000 train_loss: 2.5955 train_time: 4.8m tok/s: 7566826 +2759/20000 train_loss: 2.4786 train_time: 4.8m tok/s: 7565853 +2760/20000 train_loss: 2.4133 train_time: 4.8m tok/s: 7564906 +2761/20000 train_loss: 2.6421 train_time: 4.8m tok/s: 7563920 +2762/20000 
train_loss: 2.5430 train_time: 4.8m tok/s: 7562961 +2763/20000 train_loss: 2.5971 train_time: 4.8m tok/s: 7562063 +2764/20000 train_loss: 2.5691 train_time: 4.8m tok/s: 7561095 +2765/20000 train_loss: 2.5098 train_time: 4.8m tok/s: 7560134 +2766/20000 train_loss: 2.4626 train_time: 4.8m tok/s: 7559207 +2767/20000 train_loss: 2.5082 train_time: 4.8m tok/s: 7558243 +2768/20000 train_loss: 2.6431 train_time: 4.8m tok/s: 7557313 +2769/20000 train_loss: 2.5509 train_time: 4.8m tok/s: 7556323 +2770/20000 train_loss: 2.7006 train_time: 4.8m tok/s: 7555402 +2771/20000 train_loss: 2.5085 train_time: 4.8m tok/s: 7554455 +2772/20000 train_loss: 2.5344 train_time: 4.8m tok/s: 7553552 +2773/20000 train_loss: 2.5133 train_time: 4.8m tok/s: 7552596 +2774/20000 train_loss: 2.4703 train_time: 4.8m tok/s: 7551649 +2775/20000 train_loss: 2.4719 train_time: 4.8m tok/s: 7550750 +2776/20000 train_loss: 2.4181 train_time: 4.8m tok/s: 7549852 +2777/20000 train_loss: 2.4830 train_time: 4.8m tok/s: 7548897 +2778/20000 train_loss: 2.5296 train_time: 4.8m tok/s: 7547938 +2779/20000 train_loss: 2.4021 train_time: 4.8m tok/s: 7546973 +2780/20000 train_loss: 2.6585 train_time: 4.8m tok/s: 7546015 +2781/20000 train_loss: 2.6381 train_time: 4.8m tok/s: 7545068 +2782/20000 train_loss: 2.4623 train_time: 4.8m tok/s: 7544160 +2783/20000 train_loss: 2.6268 train_time: 4.8m tok/s: 7543249 +2784/20000 train_loss: 2.6523 train_time: 4.8m tok/s: 7542347 +2785/20000 train_loss: 2.5230 train_time: 4.8m tok/s: 7541431 +2786/20000 train_loss: 2.5267 train_time: 4.8m tok/s: 7540514 +2787/20000 train_loss: 2.5843 train_time: 4.8m tok/s: 7539613 +2788/20000 train_loss: 2.4281 train_time: 4.8m tok/s: 7538669 +2789/20000 train_loss: 2.5450 train_time: 4.8m tok/s: 7537776 +2790/20000 train_loss: 2.6076 train_time: 4.9m tok/s: 7536810 +2791/20000 train_loss: 2.3673 train_time: 4.9m tok/s: 7535817 +2792/20000 train_loss: 2.5218 train_time: 4.9m tok/s: 7534904 +2793/20000 train_loss: 2.5441 train_time: 4.9m tok/s: 
7533992 +2794/20000 train_loss: 2.5052 train_time: 4.9m tok/s: 7533073 +2795/20000 train_loss: 2.3813 train_time: 4.9m tok/s: 7532158 +2796/20000 train_loss: 2.5163 train_time: 4.9m tok/s: 7531254 +2797/20000 train_loss: 2.5514 train_time: 4.9m tok/s: 7530337 +2798/20000 train_loss: 2.5183 train_time: 4.9m tok/s: 7529417 +2799/20000 train_loss: 2.7850 train_time: 4.9m tok/s: 7528504 +2800/20000 train_loss: 2.6321 train_time: 4.9m tok/s: 7527603 +2801/20000 train_loss: 2.5022 train_time: 4.9m tok/s: 7526711 +2802/20000 train_loss: 2.5169 train_time: 4.9m tok/s: 7525809 +2803/20000 train_loss: 2.5934 train_time: 4.9m tok/s: 7524852 +2804/20000 train_loss: 2.6347 train_time: 4.9m tok/s: 7523917 +2805/20000 train_loss: 2.4800 train_time: 4.9m tok/s: 7523050 +2806/20000 train_loss: 2.5071 train_time: 4.9m tok/s: 7522107 +2807/20000 train_loss: 2.6591 train_time: 4.9m tok/s: 7521185 +2808/20000 train_loss: 2.5892 train_time: 4.9m tok/s: 7520203 +2809/20000 train_loss: 2.4445 train_time: 4.9m tok/s: 7519285 +2810/20000 train_loss: 2.5282 train_time: 4.9m tok/s: 7518417 +2811/20000 train_loss: 2.5964 train_time: 4.9m tok/s: 7517494 +2812/20000 train_loss: 2.5962 train_time: 4.9m tok/s: 7516603 +2813/20000 train_loss: 2.3920 train_time: 4.9m tok/s: 7515706 +2814/20000 train_loss: 2.5384 train_time: 4.9m tok/s: 7514828 +2815/20000 train_loss: 2.6735 train_time: 4.9m tok/s: 7513949 +2816/20000 train_loss: 2.6361 train_time: 4.9m tok/s: 7513037 +2817/20000 train_loss: 2.5459 train_time: 4.9m tok/s: 7512118 +2818/20000 train_loss: 2.5633 train_time: 4.9m tok/s: 7511232 +2819/20000 train_loss: 2.4764 train_time: 4.9m tok/s: 7510377 +2820/20000 train_loss: 2.5001 train_time: 4.9m tok/s: 7509481 +2821/20000 train_loss: 2.4404 train_time: 4.9m tok/s: 7508569 +2822/20000 train_loss: 2.7942 train_time: 4.9m tok/s: 7507610 +2823/20000 train_loss: 2.6234 train_time: 4.9m tok/s: 7506711 +2824/20000 train_loss: 2.6702 train_time: 4.9m tok/s: 7505815 +2825/20000 train_loss: 2.4817 
train_time: 4.9m tok/s: 7504922 +2826/20000 train_loss: 2.5658 train_time: 4.9m tok/s: 7504060 +2827/20000 train_loss: 2.4489 train_time: 4.9m tok/s: 7503198 +2828/20000 train_loss: 2.6232 train_time: 4.9m tok/s: 7502314 +2829/20000 train_loss: 2.4140 train_time: 4.9m tok/s: 7501448 +2830/20000 train_loss: 2.5097 train_time: 4.9m tok/s: 7500549 +2831/20000 train_loss: 2.7177 train_time: 4.9m tok/s: 7499659 +2832/20000 train_loss: 2.5260 train_time: 5.0m tok/s: 7498816 +2833/20000 train_loss: 2.6904 train_time: 5.0m tok/s: 7497945 +2834/20000 train_loss: 2.6334 train_time: 5.0m tok/s: 7497045 +2835/20000 train_loss: 2.5948 train_time: 5.0m tok/s: 7496146 +2836/20000 train_loss: 2.5005 train_time: 5.0m tok/s: 7495266 +2837/20000 train_loss: 2.5803 train_time: 5.0m tok/s: 7494384 +2838/20000 train_loss: 2.5482 train_time: 5.0m tok/s: 7493499 +2839/20000 train_loss: 2.5278 train_time: 5.0m tok/s: 7492612 +2840/20000 train_loss: 2.6128 train_time: 5.0m tok/s: 7491711 +2841/20000 train_loss: 2.5501 train_time: 5.0m tok/s: 7490856 +2842/20000 train_loss: 2.6271 train_time: 5.0m tok/s: 7489950 +2843/20000 train_loss: 2.4480 train_time: 5.0m tok/s: 7489076 +2844/20000 train_loss: 2.4383 train_time: 5.0m tok/s: 7488199 +2845/20000 train_loss: 2.5225 train_time: 5.0m tok/s: 7487359 +2846/20000 train_loss: 2.3911 train_time: 5.0m tok/s: 7486504 +2847/20000 train_loss: 2.4269 train_time: 5.0m tok/s: 7485588 +2848/20000 train_loss: 2.5601 train_time: 5.0m tok/s: 7484731 +2849/20000 train_loss: 2.6292 train_time: 5.0m tok/s: 7483833 +2850/20000 train_loss: 2.5697 train_time: 5.0m tok/s: 7482968 +2851/20000 train_loss: 2.7773 train_time: 5.0m tok/s: 7482099 +2852/20000 train_loss: 2.4619 train_time: 5.0m tok/s: 7481196 +2853/20000 train_loss: 2.5345 train_time: 5.0m tok/s: 7480332 +2854/20000 train_loss: 2.4549 train_time: 5.0m tok/s: 7479467 +2855/20000 train_loss: 2.6358 train_time: 5.0m tok/s: 7478592 +2856/20000 train_loss: 2.4699 train_time: 5.0m tok/s: 7477733 +2857/20000 
train_loss: 2.5387 train_time: 5.0m tok/s: 7476907 +2858/20000 train_loss: 2.5087 train_time: 5.0m tok/s: 7476045 +2859/20000 train_loss: 3.1483 train_time: 5.0m tok/s: 7475089 +2860/20000 train_loss: 2.4926 train_time: 5.0m tok/s: 7474190 +2861/20000 train_loss: 2.4970 train_time: 5.0m tok/s: 7473352 +2862/20000 train_loss: 2.5140 train_time: 5.0m tok/s: 7472533 +2863/20000 train_loss: 2.3535 train_time: 5.0m tok/s: 7471671 +2864/20000 train_loss: 2.3840 train_time: 5.0m tok/s: 7470838 +2865/20000 train_loss: 2.6179 train_time: 5.0m tok/s: 7469988 +2866/20000 train_loss: 2.5247 train_time: 5.0m tok/s: 7469158 +2867/20000 train_loss: 2.3925 train_time: 5.0m tok/s: 7468257 +2868/20000 train_loss: 2.4325 train_time: 5.0m tok/s: 7467412 +2869/20000 train_loss: 2.5621 train_time: 5.0m tok/s: 7466550 +2870/20000 train_loss: 2.6573 train_time: 5.0m tok/s: 7465687 +2871/20000 train_loss: 2.4583 train_time: 5.0m tok/s: 7464885 +2872/20000 train_loss: 3.0349 train_time: 5.0m tok/s: 7463969 +2873/20000 train_loss: 2.4699 train_time: 5.0m tok/s: 7463021 +2874/20000 train_loss: 2.6147 train_time: 5.0m tok/s: 7462181 +2875/20000 train_loss: 2.5329 train_time: 5.1m tok/s: 7461336 +2876/20000 train_loss: 2.5894 train_time: 5.1m tok/s: 7460509 +2877/20000 train_loss: 2.5158 train_time: 5.1m tok/s: 7459627 +2878/20000 train_loss: 2.4988 train_time: 5.1m tok/s: 7458812 +2879/20000 train_loss: 2.5379 train_time: 5.1m tok/s: 7457974 +2880/20000 train_loss: 2.5457 train_time: 5.1m tok/s: 7457151 +2881/20000 train_loss: 2.6032 train_time: 5.1m tok/s: 7456342 +2882/20000 train_loss: 2.6876 train_time: 5.1m tok/s: 7455483 +2883/20000 train_loss: 2.6427 train_time: 5.1m tok/s: 7454572 +2884/20000 train_loss: 2.6015 train_time: 5.1m tok/s: 7453727 +2885/20000 train_loss: 2.5577 train_time: 5.1m tok/s: 7452865 +2886/20000 train_loss: 2.5555 train_time: 5.1m tok/s: 7452066 +2887/20000 train_loss: 2.5227 train_time: 5.1m tok/s: 7451218 +2888/20000 train_loss: 2.6214 train_time: 5.1m tok/s: 
7450368 +2889/20000 train_loss: 2.5733 train_time: 5.1m tok/s: 7449526 +2890/20000 train_loss: 2.5909 train_time: 5.1m tok/s: 7448682 +2891/20000 train_loss: 2.5321 train_time: 5.1m tok/s: 7447883 +2892/20000 train_loss: 2.4688 train_time: 5.1m tok/s: 7447052 +2893/20000 train_loss: 2.3155 train_time: 5.1m tok/s: 7446225 +2894/20000 train_loss: 2.5542 train_time: 5.1m tok/s: 7445369 +2895/20000 train_loss: 2.5221 train_time: 5.1m tok/s: 7444543 +2896/20000 train_loss: 2.4936 train_time: 5.1m tok/s: 7443706 +2897/20000 train_loss: 2.5728 train_time: 5.1m tok/s: 7442888 +2898/20000 train_loss: 2.5634 train_time: 5.1m tok/s: 7442091 +2899/20000 train_loss: 2.5656 train_time: 5.1m tok/s: 7441253 +2900/20000 train_loss: 2.6114 train_time: 5.1m tok/s: 7440416 +2901/20000 train_loss: 2.4223 train_time: 5.1m tok/s: 7439564 +2902/20000 train_loss: 2.5372 train_time: 5.1m tok/s: 7438700 +2903/20000 train_loss: 2.4670 train_time: 5.1m tok/s: 7437844 +2904/20000 train_loss: 2.4634 train_time: 5.1m tok/s: 7437028 +2905/20000 train_loss: 2.6146 train_time: 5.1m tok/s: 7436191 +2906/20000 train_loss: 2.4180 train_time: 5.1m tok/s: 7435381 +2907/20000 train_loss: 2.4593 train_time: 5.1m tok/s: 7434572 +2908/20000 train_loss: 2.5191 train_time: 5.1m tok/s: 7433770 +2909/20000 train_loss: 2.5161 train_time: 5.1m tok/s: 7432963 +2910/20000 train_loss: 2.3910 train_time: 5.1m tok/s: 7432121 +2911/20000 train_loss: 2.5953 train_time: 5.1m tok/s: 7431292 +2912/20000 train_loss: 2.5398 train_time: 5.1m tok/s: 7430425 +2913/20000 train_loss: 2.5704 train_time: 5.1m tok/s: 7429619 +2914/20000 train_loss: 2.6247 train_time: 5.1m tok/s: 7428801 +2915/20000 train_loss: 2.5378 train_time: 5.1m tok/s: 7427980 +2916/20000 train_loss: 2.4730 train_time: 5.1m tok/s: 7427144 +2917/20000 train_loss: 2.5247 train_time: 5.1m tok/s: 7426335 +2918/20000 train_loss: 2.8612 train_time: 5.2m tok/s: 7425527 +2919/20000 train_loss: 2.6980 train_time: 5.2m tok/s: 7424688 +2920/20000 train_loss: 2.4797 
train_time: 5.2m tok/s: 7423857 +2921/20000 train_loss: 2.4286 train_time: 5.2m tok/s: 7423049 +2922/20000 train_loss: 2.4657 train_time: 5.2m tok/s: 7422237 +2923/20000 train_loss: 2.4552 train_time: 5.2m tok/s: 7421441 +2924/20000 train_loss: 2.5767 train_time: 5.2m tok/s: 7420611 +2925/20000 train_loss: 2.6226 train_time: 5.2m tok/s: 7419776 +2926/20000 train_loss: 2.5006 train_time: 5.2m tok/s: 7418934 +2927/20000 train_loss: 2.5176 train_time: 5.2m tok/s: 7418125 +2928/20000 train_loss: 2.3554 train_time: 5.2m tok/s: 7417341 +2929/20000 train_loss: 2.5624 train_time: 5.2m tok/s: 7416526 +2930/20000 train_loss: 2.4513 train_time: 5.2m tok/s: 7415728 +2931/20000 train_loss: 2.4476 train_time: 5.2m tok/s: 7414924 +2932/20000 train_loss: 2.4269 train_time: 5.2m tok/s: 7414114 +2933/20000 train_loss: 2.5253 train_time: 5.2m tok/s: 7413312 +2934/20000 train_loss: 2.5403 train_time: 5.2m tok/s: 7412497 +2935/20000 train_loss: 2.5275 train_time: 5.2m tok/s: 7411697 +2936/20000 train_loss: 2.5200 train_time: 5.2m tok/s: 7410891 +2937/20000 train_loss: 2.6509 train_time: 5.2m tok/s: 7410096 +2938/20000 train_loss: 2.5963 train_time: 5.2m tok/s: 7409313 +2939/20000 train_loss: 2.5735 train_time: 5.2m tok/s: 7408511 +2940/20000 train_loss: 2.3908 train_time: 5.2m tok/s: 7407684 +2941/20000 train_loss: 2.6243 train_time: 5.2m tok/s: 7406874 +2942/20000 train_loss: 2.5333 train_time: 5.2m tok/s: 7406056 +2943/20000 train_loss: 2.4281 train_time: 5.2m tok/s: 7405239 +2944/20000 train_loss: 2.4187 train_time: 5.2m tok/s: 7404433 +2945/20000 train_loss: 2.5183 train_time: 5.2m tok/s: 7403622 +2946/20000 train_loss: 2.4905 train_time: 5.2m tok/s: 7402858 +2947/20000 train_loss: 2.4926 train_time: 5.2m tok/s: 7402000 +2948/20000 train_loss: 2.4752 train_time: 5.2m tok/s: 7401234 +2949/20000 train_loss: 2.5488 train_time: 5.2m tok/s: 7400466 +2950/20000 train_loss: 2.6375 train_time: 5.2m tok/s: 7399651 +2951/20000 train_loss: 2.7267 train_time: 5.2m tok/s: 7398852 +2952/20000 
train_loss: 2.5537 train_time: 5.2m tok/s: 7398047 +2953/20000 train_loss: 2.5061 train_time: 5.2m tok/s: 7397271 +2954/20000 train_loss: 2.4781 train_time: 5.2m tok/s: 7396467 +2955/20000 train_loss: 2.4688 train_time: 5.2m tok/s: 7395679 +2956/20000 train_loss: 2.4647 train_time: 5.2m tok/s: 7394898 +2957/20000 train_loss: 2.7222 train_time: 5.2m tok/s: 7394082 +2958/20000 train_loss: 2.4373 train_time: 5.2m tok/s: 7393314 +2959/20000 train_loss: 2.4376 train_time: 5.2m tok/s: 7392559 +2960/20000 train_loss: 2.4290 train_time: 5.2m tok/s: 7391765 +2961/20000 train_loss: 2.5371 train_time: 5.3m tok/s: 7390973 +2962/20000 train_loss: 2.4780 train_time: 5.3m tok/s: 7390220 +2963/20000 train_loss: 2.6915 train_time: 5.3m tok/s: 7389415 +2964/20000 train_loss: 2.5893 train_time: 5.3m tok/s: 7388626 +2965/20000 train_loss: 2.6999 train_time: 5.3m tok/s: 7387833 +2966/20000 train_loss: 2.5813 train_time: 5.3m tok/s: 7387061 +2967/20000 train_loss: 2.5507 train_time: 5.3m tok/s: 7386270 +2968/20000 train_loss: 2.4840 train_time: 5.3m tok/s: 7385511 +2969/20000 train_loss: 2.6636 train_time: 5.3m tok/s: 7384699 +2970/20000 train_loss: 2.4755 train_time: 5.3m tok/s: 7383896 +2971/20000 train_loss: 2.4880 train_time: 5.3m tok/s: 7383111 +2972/20000 train_loss: 2.5237 train_time: 5.3m tok/s: 7382357 +2973/20000 train_loss: 2.4493 train_time: 5.3m tok/s: 7381575 +2974/20000 train_loss: 2.5862 train_time: 5.3m tok/s: 7380780 +2975/20000 train_loss: 2.5886 train_time: 5.3m tok/s: 7379997 +2976/20000 train_loss: 2.5513 train_time: 5.3m tok/s: 7379222 +2977/20000 train_loss: 2.4867 train_time: 5.3m tok/s: 7378453 +2978/20000 train_loss: 2.5806 train_time: 5.3m tok/s: 7377655 +2979/20000 train_loss: 2.4800 train_time: 5.3m tok/s: 7376888 +2980/20000 train_loss: 2.5836 train_time: 5.3m tok/s: 7376110 +2981/20000 train_loss: 2.3958 train_time: 5.3m tok/s: 7375305 +2982/20000 train_loss: 2.6258 train_time: 5.3m tok/s: 7374508 +2983/20000 train_loss: 2.4155 train_time: 5.3m tok/s: 
7373774 +2984/20000 train_loss: 2.4495 train_time: 5.3m tok/s: 7372990 +2985/20000 train_loss: 2.4490 train_time: 5.3m tok/s: 7372230 +2986/20000 train_loss: 2.5737 train_time: 5.3m tok/s: 7371468 +2987/20000 train_loss: 2.4999 train_time: 5.3m tok/s: 7370646 +2988/20000 train_loss: 2.5546 train_time: 5.3m tok/s: 7369929 +2989/20000 train_loss: 2.4322 train_time: 5.3m tok/s: 7369150 +2990/20000 train_loss: 2.7025 train_time: 5.3m tok/s: 7368363 +2991/20000 train_loss: 2.4831 train_time: 5.3m tok/s: 7367619 +2992/20000 train_loss: 2.6001 train_time: 5.3m tok/s: 7366883 +2993/20000 train_loss: 2.6531 train_time: 5.3m tok/s: 7366052 +2994/20000 train_loss: 2.4351 train_time: 5.3m tok/s: 7365286 +2995/20000 train_loss: 2.6427 train_time: 5.3m tok/s: 7364499 +2996/20000 train_loss: 2.4320 train_time: 5.3m tok/s: 7363734 +2997/20000 train_loss: 2.5787 train_time: 5.3m tok/s: 7362975 +2998/20000 train_loss: 2.4854 train_time: 5.3m tok/s: 7362222 +2999/20000 train_loss: 2.4685 train_time: 5.3m tok/s: 7361471 +3000/20000 train_loss: 2.4860 train_time: 5.3m tok/s: 7360666 +3001/20000 train_loss: 2.5500 train_time: 5.3m tok/s: 7359888 +3002/20000 train_loss: 2.5080 train_time: 5.3m tok/s: 7359159 +3003/20000 train_loss: 2.5388 train_time: 5.3m tok/s: 7358409 +3004/20000 train_loss: 2.4660 train_time: 5.4m tok/s: 7357668 +3005/20000 train_loss: 2.5411 train_time: 5.4m tok/s: 7356924 +3006/20000 train_loss: 2.7568 train_time: 5.4m tok/s: 7356159 +3007/20000 train_loss: 2.5046 train_time: 5.4m tok/s: 7355408 +3008/20000 train_loss: 2.4979 train_time: 5.4m tok/s: 7354668 +3009/20000 train_loss: 2.4757 train_time: 5.4m tok/s: 7353895 +3010/20000 train_loss: 2.4119 train_time: 5.4m tok/s: 7353086 +3011/20000 train_loss: 2.5777 train_time: 5.4m tok/s: 7352307 +3012/20000 train_loss: 2.5513 train_time: 5.4m tok/s: 7351591 +3013/20000 train_loss: 2.5197 train_time: 5.4m tok/s: 7350853 +3014/20000 train_loss: 2.5639 train_time: 5.4m tok/s: 7350148 +3015/20000 train_loss: 2.6657 
train_time: 5.4m tok/s: 7349393 +3016/20000 train_loss: 2.6889 train_time: 5.4m tok/s: 7348621 +3017/20000 train_loss: 2.4739 train_time: 5.4m tok/s: 7347854 +3018/20000 train_loss: 2.6194 train_time: 5.4m tok/s: 7347142 +3019/20000 train_loss: 2.4570 train_time: 5.4m tok/s: 7346393 +3020/20000 train_loss: 3.1212 train_time: 5.4m tok/s: 7345579 +3021/20000 train_loss: 2.4699 train_time: 5.4m tok/s: 7344833 +3022/20000 train_loss: 2.4561 train_time: 5.4m tok/s: 7344113 +3023/20000 train_loss: 2.5763 train_time: 5.4m tok/s: 7343318 +3024/20000 train_loss: 3.4371 train_time: 5.4m tok/s: 7342457 +3025/20000 train_loss: 2.4439 train_time: 5.4m tok/s: 7341699 +3026/20000 train_loss: 2.5290 train_time: 5.4m tok/s: 7340896 +3027/20000 train_loss: 2.5665 train_time: 5.4m tok/s: 7340159 +3028/20000 train_loss: 2.6507 train_time: 5.4m tok/s: 7339423 +3029/20000 train_loss: 2.7223 train_time: 5.4m tok/s: 7338729 +3030/20000 train_loss: 2.5703 train_time: 5.4m tok/s: 7338021 +3031/20000 train_loss: 2.5134 train_time: 5.4m tok/s: 7337290 +3032/20000 train_loss: 2.5435 train_time: 5.4m tok/s: 7336538 +3033/20000 train_loss: 2.5709 train_time: 5.4m tok/s: 7335844 +3034/20000 train_loss: 2.4863 train_time: 5.4m tok/s: 7335149 +3035/20000 train_loss: 2.3081 train_time: 5.4m tok/s: 7334416 +3036/20000 train_loss: 2.5762 train_time: 5.4m tok/s: 7333684 +3037/20000 train_loss: 2.4894 train_time: 5.4m tok/s: 7332966 +3038/20000 train_loss: 2.5284 train_time: 5.4m tok/s: 7332246 +3039/20000 train_loss: 2.5183 train_time: 5.4m tok/s: 7331522 +3040/20000 train_loss: 2.4118 train_time: 5.4m tok/s: 7330792 +3041/20000 train_loss: 2.6325 train_time: 5.4m tok/s: 7330055 +3042/20000 train_loss: 2.5827 train_time: 5.4m tok/s: 7329348 +3043/20000 train_loss: 2.6451 train_time: 5.4m tok/s: 7328556 +3044/20000 train_loss: 2.5934 train_time: 5.4m tok/s: 7327868 +3045/20000 train_loss: 2.5812 train_time: 5.4m tok/s: 7327130 +3046/20000 train_loss: 2.5608 train_time: 5.4m tok/s: 7326408 +3047/20000 
train_loss: 2.3915 train_time: 5.5m tok/s: 7325654 +3048/20000 train_loss: 2.3591 train_time: 5.5m tok/s: 7324959 +3049/20000 train_loss: 2.5097 train_time: 5.5m tok/s: 7324211 +3050/20000 train_loss: 2.6015 train_time: 5.5m tok/s: 7323473 +3051/20000 train_loss: 2.3650 train_time: 5.5m tok/s: 7322774 +3052/20000 train_loss: 2.4220 train_time: 5.5m tok/s: 7322039 +3053/20000 train_loss: 2.6089 train_time: 5.5m tok/s: 7321270 +3054/20000 train_loss: 2.4221 train_time: 5.5m tok/s: 7320541 +3055/20000 train_loss: 2.5340 train_time: 5.5m tok/s: 7319819 +3056/20000 train_loss: 2.5458 train_time: 5.5m tok/s: 7319099 +3057/20000 train_loss: 2.5058 train_time: 5.5m tok/s: 7318381 +3058/20000 train_loss: 2.5496 train_time: 5.5m tok/s: 7317668 +3059/20000 train_loss: 2.4637 train_time: 5.5m tok/s: 7316955 +3060/20000 train_loss: 2.5736 train_time: 5.5m tok/s: 7316231 +3061/20000 train_loss: 2.4445 train_time: 5.5m tok/s: 7315500 +3062/20000 train_loss: 2.5938 train_time: 5.5m tok/s: 7314773 +3063/20000 train_loss: 2.5459 train_time: 5.5m tok/s: 7314053 +3064/20000 train_loss: 2.4400 train_time: 5.5m tok/s: 7313310 +3065/20000 train_loss: 2.4171 train_time: 5.5m tok/s: 7312612 +3066/20000 train_loss: 2.6747 train_time: 5.5m tok/s: 7311839 +3067/20000 train_loss: 2.4599 train_time: 5.5m tok/s: 7311147 +3068/20000 train_loss: 2.6009 train_time: 5.5m tok/s: 7310415 +3069/20000 train_loss: 2.3988 train_time: 5.5m tok/s: 7309493 +3070/20000 train_loss: 2.5602 train_time: 5.5m tok/s: 7308683 +3071/20000 train_loss: 2.5112 train_time: 5.5m tok/s: 7307923 +3072/20000 train_loss: 2.5408 train_time: 5.5m tok/s: 7307064 +3073/20000 train_loss: 2.6066 train_time: 5.5m tok/s: 7306394 +3074/20000 train_loss: 2.4282 train_time: 5.5m tok/s: 7305474 +3075/20000 train_loss: 2.4153 train_time: 5.5m tok/s: 7304774 +3076/20000 train_loss: 2.4770 train_time: 5.5m tok/s: 7303958 +3077/20000 train_loss: 2.5302 train_time: 5.5m tok/s: 7303292 +3078/20000 train_loss: 2.3932 train_time: 5.5m tok/s: 
7302540 +3079/20000 train_loss: 2.4346 train_time: 5.5m tok/s: 7301733 +3080/20000 train_loss: 3.1981 train_time: 5.5m tok/s: 7300929 +3081/20000 train_loss: 2.3831 train_time: 5.5m tok/s: 7300099 +3082/20000 train_loss: 2.4426 train_time: 5.5m tok/s: 7299423 +3083/20000 train_loss: 2.4693 train_time: 5.5m tok/s: 7298593 +3084/20000 train_loss: 2.4718 train_time: 5.5m tok/s: 7297889 +3085/20000 train_loss: 2.5361 train_time: 5.5m tok/s: 7297004 +3086/20000 train_loss: 2.5748 train_time: 5.5m tok/s: 7296305 +3087/20000 train_loss: 2.6139 train_time: 5.5m tok/s: 7295600 +3088/20000 train_loss: 2.5760 train_time: 5.5m tok/s: 7294913 +3089/20000 train_loss: 2.4508 train_time: 5.6m tok/s: 7294205 +3090/20000 train_loss: 2.6975 train_time: 5.6m tok/s: 7293534 +3091/20000 train_loss: 2.4464 train_time: 5.6m tok/s: 7292854 +3092/20000 train_loss: 2.4826 train_time: 5.6m tok/s: 7292123 +3093/20000 train_loss: 2.5253 train_time: 5.6m tok/s: 7291405 +3094/20000 train_loss: 2.4785 train_time: 5.6m tok/s: 7290678 +3095/20000 train_loss: 2.3325 train_time: 5.6m tok/s: 7289971 +3096/20000 train_loss: 2.5155 train_time: 5.6m tok/s: 7289256 +3097/20000 train_loss: 2.5556 train_time: 5.6m tok/s: 7288580 +3098/20000 train_loss: 2.4706 train_time: 5.6m tok/s: 7287893 +3099/20000 train_loss: 2.3260 train_time: 5.6m tok/s: 7287172 +3100/20000 train_loss: 2.4126 train_time: 5.6m tok/s: 7286466 +3101/20000 train_loss: 2.6686 train_time: 5.6m tok/s: 7285794 +3102/20000 train_loss: 2.6482 train_time: 5.6m tok/s: 7285078 +3103/20000 train_loss: 2.4882 train_time: 5.6m tok/s: 7284400 +3104/20000 train_loss: 2.5867 train_time: 5.6m tok/s: 7283720 +3105/20000 train_loss: 2.3819 train_time: 5.6m tok/s: 7283031 +3106/20000 train_loss: 2.5882 train_time: 5.6m tok/s: 7282332 +3107/20000 train_loss: 2.3385 train_time: 5.6m tok/s: 7281590 +3108/20000 train_loss: 2.4734 train_time: 5.6m tok/s: 7280895 +3109/20000 train_loss: 2.5577 train_time: 5.6m tok/s: 7280224 +3110/20000 train_loss: 2.4209 
train_time: 5.6m tok/s: 7279562 +3111/20000 train_loss: 2.4370 train_time: 5.6m tok/s: 7278844 +3112/20000 train_loss: 2.3920 train_time: 5.6m tok/s: 7278157 +3113/20000 train_loss: 2.5282 train_time: 5.6m tok/s: 7277463 +3114/20000 train_loss: 2.5434 train_time: 5.6m tok/s: 7276765 +3115/20000 train_loss: 2.5668 train_time: 5.6m tok/s: 7276081 +3116/20000 train_loss: 2.5805 train_time: 5.6m tok/s: 7275391 +3117/20000 train_loss: 2.6045 train_time: 5.6m tok/s: 7274723 +3118/20000 train_loss: 2.5397 train_time: 5.6m tok/s: 7274036 +3119/20000 train_loss: 2.5678 train_time: 5.6m tok/s: 7273336 +3120/20000 train_loss: 2.5547 train_time: 5.6m tok/s: 7272662 +3121/20000 train_loss: 2.5356 train_time: 5.6m tok/s: 7271965 +3122/20000 train_loss: 2.5542 train_time: 5.6m tok/s: 7271274 +3123/20000 train_loss: 2.4328 train_time: 5.6m tok/s: 7270607 +3124/20000 train_loss: 2.5316 train_time: 5.6m tok/s: 7269923 +3125/20000 train_loss: 2.4327 train_time: 5.6m tok/s: 7269220 +3126/20000 train_loss: 2.1292 train_time: 5.6m tok/s: 7268500 +3127/20000 train_loss: 2.5990 train_time: 5.6m tok/s: 7267801 +3128/20000 train_loss: 2.5021 train_time: 5.6m tok/s: 7267105 +3129/20000 train_loss: 2.5328 train_time: 5.6m tok/s: 7266445 +3130/20000 train_loss: 2.4766 train_time: 5.6m tok/s: 7265757 +3131/20000 train_loss: 2.4455 train_time: 5.6m tok/s: 7265071 +3132/20000 train_loss: 2.5702 train_time: 5.7m tok/s: 7264396 +3133/20000 train_loss: 2.5178 train_time: 5.7m tok/s: 7263744 +3134/20000 train_loss: 2.5491 train_time: 5.7m tok/s: 7263066 +3135/20000 train_loss: 2.5047 train_time: 5.7m tok/s: 7262384 +3136/20000 train_loss: 2.5704 train_time: 5.7m tok/s: 7261718 +3137/20000 train_loss: 2.6041 train_time: 5.7m tok/s: 7261028 +3138/20000 train_loss: 2.4472 train_time: 5.7m tok/s: 7260343 +3139/20000 train_loss: 2.4493 train_time: 5.7m tok/s: 7259688 +3140/20000 train_loss: 2.4276 train_time: 5.7m tok/s: 7258999 +3141/20000 train_loss: 2.5741 train_time: 5.7m tok/s: 7258288 +3142/20000 
train_loss: 2.5461 train_time: 5.7m tok/s: 7257624 +3143/20000 train_loss: 2.5506 train_time: 5.7m tok/s: 7256941 +3144/20000 train_loss: 2.5437 train_time: 5.7m tok/s: 7256288 +3145/20000 train_loss: 2.0732 train_time: 5.7m tok/s: 7255576 +3146/20000 train_loss: 2.4535 train_time: 5.7m tok/s: 7254902 +3147/20000 train_loss: 2.5530 train_time: 5.7m tok/s: 7254237 +3148/20000 train_loss: 2.4819 train_time: 5.7m tok/s: 7253540 +3149/20000 train_loss: 2.5958 train_time: 5.7m tok/s: 7252892 +3150/20000 train_loss: 2.3915 train_time: 5.7m tok/s: 7252201 +3151/20000 train_loss: 2.4099 train_time: 5.7m tok/s: 7251544 +3152/20000 train_loss: 2.4800 train_time: 5.7m tok/s: 7250898 +3153/20000 train_loss: 2.5204 train_time: 5.7m tok/s: 7250255 +3154/20000 train_loss: 2.3756 train_time: 5.7m tok/s: 7249613 +3155/20000 train_loss: 2.4738 train_time: 5.7m tok/s: 7248922 +3156/20000 train_loss: 2.5923 train_time: 5.7m tok/s: 7248244 +3157/20000 train_loss: 2.5958 train_time: 5.7m tok/s: 7247588 +3158/20000 train_loss: 2.6428 train_time: 5.7m tok/s: 7246909 +3159/20000 train_loss: 2.5233 train_time: 5.7m tok/s: 7246255 +3160/20000 train_loss: 2.3706 train_time: 5.7m tok/s: 7245580 +3161/20000 train_loss: 2.5399 train_time: 5.7m tok/s: 7244920 +3162/20000 train_loss: 2.6273 train_time: 5.7m tok/s: 7244293 +3163/20000 train_loss: 2.5473 train_time: 5.7m tok/s: 7243613 +3164/20000 train_loss: 2.3727 train_time: 5.7m tok/s: 7242927 +3165/20000 train_loss: 2.5085 train_time: 5.7m tok/s: 7242265 +3166/20000 train_loss: 2.4206 train_time: 5.7m tok/s: 7241615 +3167/20000 train_loss: 2.5291 train_time: 5.7m tok/s: 7240963 +3168/20000 train_loss: 2.5098 train_time: 5.7m tok/s: 7240304 +3169/20000 train_loss: 2.4887 train_time: 5.7m tok/s: 7239648 +3170/20000 train_loss: 2.6191 train_time: 5.7m tok/s: 7238913 +3171/20000 train_loss: 2.6568 train_time: 5.7m tok/s: 7238269 +3172/20000 train_loss: 2.6575 train_time: 5.7m tok/s: 7237653 +3173/20000 train_loss: 2.7168 train_time: 5.7m tok/s: 
7236963 +3174/20000 train_loss: 2.4722 train_time: 5.7m tok/s: 7236305 +3175/20000 train_loss: 2.6095 train_time: 5.8m tok/s: 7235658 +3176/20000 train_loss: 2.4503 train_time: 5.8m tok/s: 7235018 +3177/20000 train_loss: 2.4525 train_time: 5.8m tok/s: 7234370 +3178/20000 train_loss: 2.5950 train_time: 5.8m tok/s: 7233678 +3179/20000 train_loss: 2.4262 train_time: 5.8m tok/s: 7232938 +3180/20000 train_loss: 2.5258 train_time: 5.8m tok/s: 7232268 +3181/20000 train_loss: 2.3792 train_time: 5.8m tok/s: 7231623 +3182/20000 train_loss: 2.8194 train_time: 5.8m tok/s: 7230961 +3183/20000 train_loss: 2.6630 train_time: 5.8m tok/s: 7230315 +3184/20000 train_loss: 2.5047 train_time: 5.8m tok/s: 7229616 +3185/20000 train_loss: 2.5773 train_time: 5.8m tok/s: 7228992 +3186/20000 train_loss: 2.5296 train_time: 5.8m tok/s: 7228346 +3187/20000 train_loss: 2.5746 train_time: 5.8m tok/s: 7227700 +3188/20000 train_loss: 2.3955 train_time: 5.8m tok/s: 7227043 +3189/20000 train_loss: 2.5944 train_time: 5.8m tok/s: 7226407 +3190/20000 train_loss: 2.5778 train_time: 5.8m tok/s: 7225794 +3191/20000 train_loss: 2.4830 train_time: 5.8m tok/s: 7225134 +3192/20000 train_loss: 2.3810 train_time: 5.8m tok/s: 7224533 +3193/20000 train_loss: 2.4598 train_time: 5.8m tok/s: 7223922 +3194/20000 train_loss: 2.4485 train_time: 5.8m tok/s: 7223296 +3195/20000 train_loss: 2.5039 train_time: 5.8m tok/s: 7222687 +3196/20000 train_loss: 2.3909 train_time: 5.8m tok/s: 7222002 +3197/20000 train_loss: 2.4691 train_time: 5.8m tok/s: 7221391 +3198/20000 train_loss: 2.5014 train_time: 5.8m tok/s: 7220735 +3199/20000 train_loss: 2.5148 train_time: 5.8m tok/s: 7220104 +3200/20000 train_loss: 2.6319 train_time: 5.8m tok/s: 7219442 +3201/20000 train_loss: 2.5703 train_time: 5.8m tok/s: 7218836 +3202/20000 train_loss: 2.2692 train_time: 5.8m tok/s: 7218130 +3203/20000 train_loss: 2.5434 train_time: 5.8m tok/s: 7217453 +3204/20000 train_loss: 2.4446 train_time: 5.8m tok/s: 7216832 +3205/20000 train_loss: 2.5036 
train_time: 5.8m tok/s: 7216167 +3206/20000 train_loss: 2.4463 train_time: 5.8m tok/s: 7215531 +3207/20000 train_loss: 2.5554 train_time: 5.8m tok/s: 7214902 +3208/20000 train_loss: 2.5809 train_time: 5.8m tok/s: 7214291 +3209/20000 train_loss: 2.4589 train_time: 5.8m tok/s: 7213665 +3210/20000 train_loss: 2.7165 train_time: 5.8m tok/s: 7212988 +3211/20000 train_loss: 2.5157 train_time: 5.8m tok/s: 7212361 +3212/20000 train_loss: 2.4069 train_time: 5.8m tok/s: 7211742 +3213/20000 train_loss: 2.5613 train_time: 5.8m tok/s: 7211082 +3214/20000 train_loss: 2.4234 train_time: 5.8m tok/s: 7210426 +3215/20000 train_loss: 2.4344 train_time: 5.8m tok/s: 7209795 +3216/20000 train_loss: 2.3228 train_time: 5.8m tok/s: 7209152 +3217/20000 train_loss: 2.4583 train_time: 5.8m tok/s: 7208538 +3218/20000 train_loss: 2.5019 train_time: 5.9m tok/s: 7207923 +3219/20000 train_loss: 2.5634 train_time: 5.9m tok/s: 7207333 +3220/20000 train_loss: 2.6326 train_time: 5.9m tok/s: 7206695 +3221/20000 train_loss: 2.4298 train_time: 5.9m tok/s: 7206064 +3222/20000 train_loss: 2.5313 train_time: 5.9m tok/s: 7205471 +3223/20000 train_loss: 2.6340 train_time: 5.9m tok/s: 7204791 +3224/20000 train_loss: 2.9697 train_time: 5.9m tok/s: 7204143 +3225/20000 train_loss: 2.5837 train_time: 5.9m tok/s: 7203490 +3226/20000 train_loss: 2.4101 train_time: 5.9m tok/s: 7202869 +3227/20000 train_loss: 2.4840 train_time: 5.9m tok/s: 7202233 +3228/20000 train_loss: 2.8148 train_time: 5.9m tok/s: 7201615 +3229/20000 train_loss: 2.4105 train_time: 5.9m tok/s: 7200988 +3230/20000 train_loss: 2.4468 train_time: 5.9m tok/s: 7200382 +3231/20000 train_loss: 2.5551 train_time: 5.9m tok/s: 7199769 +3232/20000 train_loss: 2.5343 train_time: 5.9m tok/s: 7199133 +3233/20000 train_loss: 2.5270 train_time: 5.9m tok/s: 7198504 +3234/20000 train_loss: 2.5980 train_time: 5.9m tok/s: 7197883 +3235/20000 train_loss: 2.5330 train_time: 5.9m tok/s: 7197242 +3236/20000 train_loss: 2.2840 train_time: 5.9m tok/s: 7196609 +3237/20000 
train_loss: 2.4687 train_time: 5.9m tok/s: 7196004 +3238/20000 train_loss: 2.3642 train_time: 5.9m tok/s: 7195378 +3239/20000 train_loss: 2.3962 train_time: 5.9m tok/s: 7194721 +3240/20000 train_loss: 2.5002 train_time: 5.9m tok/s: 7194112 +3241/20000 train_loss: 2.4987 train_time: 5.9m tok/s: 7193495 +3242/20000 train_loss: 2.5321 train_time: 5.9m tok/s: 7192882 +3243/20000 train_loss: 2.4466 train_time: 5.9m tok/s: 7192261 +3244/20000 train_loss: 2.5594 train_time: 5.9m tok/s: 7191657 +3245/20000 train_loss: 2.6333 train_time: 5.9m tok/s: 7191056 +3246/20000 train_loss: 2.5169 train_time: 5.9m tok/s: 7190405 +3247/20000 train_loss: 2.4124 train_time: 5.9m tok/s: 7189781 +3248/20000 train_loss: 2.4686 train_time: 5.9m tok/s: 7189164 +3249/20000 train_loss: 2.5947 train_time: 5.9m tok/s: 7188578 +3250/20000 train_loss: 2.5571 train_time: 5.9m tok/s: 7187921 +3251/20000 train_loss: 2.5918 train_time: 5.9m tok/s: 7187289 +3252/20000 train_loss: 2.4486 train_time: 5.9m tok/s: 7186677 +3253/20000 train_loss: 2.4423 train_time: 5.9m tok/s: 7186060 +3254/20000 train_loss: 2.9104 train_time: 5.9m tok/s: 7185371 +3255/20000 train_loss: 2.4604 train_time: 5.9m tok/s: 7184747 +3256/20000 train_loss: 2.5041 train_time: 5.9m tok/s: 7184180 +3257/20000 train_loss: 2.5762 train_time: 5.9m tok/s: 7183523 +3258/20000 train_loss: 2.5418 train_time: 5.9m tok/s: 7182950 +3259/20000 train_loss: 2.4949 train_time: 5.9m tok/s: 7182340 +3260/20000 train_loss: 2.4864 train_time: 5.9m tok/s: 7181760 +3261/20000 train_loss: 2.5263 train_time: 6.0m tok/s: 7181161 +3262/20000 train_loss: 2.4059 train_time: 6.0m tok/s: 7180535 +3263/20000 train_loss: 2.4678 train_time: 6.0m tok/s: 7179909 +3264/20000 train_loss: 2.5175 train_time: 6.0m tok/s: 7179288 +3265/20000 train_loss: 2.5341 train_time: 6.0m tok/s: 7178709 +3266/20000 train_loss: 2.5432 train_time: 6.0m tok/s: 7178097 +3267/20000 train_loss: 2.4932 train_time: 6.0m tok/s: 7177492 +3268/20000 train_loss: 2.5175 train_time: 6.0m tok/s: 
7176873 +3269/20000 train_loss: 2.6209 train_time: 6.0m tok/s: 7176255 +3270/20000 train_loss: 2.4708 train_time: 6.0m tok/s: 7175625 +3271/20000 train_loss: 2.5532 train_time: 6.0m tok/s: 7174991 +3272/20000 train_loss: 2.5181 train_time: 6.0m tok/s: 7174402 +3273/20000 train_loss: 2.6717 train_time: 6.0m tok/s: 7173814 +3274/20000 train_loss: 2.4754 train_time: 6.0m tok/s: 7173210 +3275/20000 train_loss: 2.5261 train_time: 6.0m tok/s: 7172622 +3276/20000 train_loss: 2.4992 train_time: 6.0m tok/s: 7172005 +3277/20000 train_loss: 2.5155 train_time: 6.0m tok/s: 7171327 +3278/20000 train_loss: 2.4040 train_time: 6.0m tok/s: 7170747 +3279/20000 train_loss: 2.4751 train_time: 6.0m tok/s: 7170154 +3280/20000 train_loss: 2.3836 train_time: 6.0m tok/s: 7169547 +3281/20000 train_loss: 2.4226 train_time: 6.0m tok/s: 7168892 +3282/20000 train_loss: 2.4922 train_time: 6.0m tok/s: 7168266 +3283/20000 train_loss: 2.4363 train_time: 6.0m tok/s: 7167662 +3284/20000 train_loss: 2.6385 train_time: 6.0m tok/s: 7167065 +3285/20000 train_loss: 2.4895 train_time: 6.0m tok/s: 7166472 +3286/20000 train_loss: 2.4477 train_time: 6.0m tok/s: 7165878 +3287/20000 train_loss: 2.5134 train_time: 6.0m tok/s: 7165319 +3288/20000 train_loss: 2.5073 train_time: 6.0m tok/s: 7164730 +3289/20000 train_loss: 2.4870 train_time: 6.0m tok/s: 7164147 +3290/20000 train_loss: 2.4588 train_time: 6.0m tok/s: 7163556 +3291/20000 train_loss: 2.4501 train_time: 6.0m tok/s: 7162954 +3292/20000 train_loss: 2.4649 train_time: 6.0m tok/s: 7162378 +3293/20000 train_loss: 2.4634 train_time: 6.0m tok/s: 7161764 +3294/20000 train_loss: 2.4345 train_time: 6.0m tok/s: 7161176 +3295/20000 train_loss: 2.5251 train_time: 6.0m tok/s: 7160578 +3296/20000 train_loss: 2.3930 train_time: 6.0m tok/s: 7159966 +3297/20000 train_loss: 2.4077 train_time: 6.0m tok/s: 7159374 +3298/20000 train_loss: 2.4839 train_time: 6.0m tok/s: 7158756 +3299/20000 train_loss: 2.3012 train_time: 6.0m tok/s: 7158101 +3300/20000 train_loss: 2.4608 
train_time: 6.0m tok/s: 7157502 +3301/20000 train_loss: 2.4325 train_time: 6.0m tok/s: 7156919 +3302/20000 train_loss: 2.2204 train_time: 6.0m tok/s: 7156251 +3303/20000 train_loss: 2.6574 train_time: 6.1m tok/s: 7155656 +3304/20000 train_loss: 2.5004 train_time: 6.1m tok/s: 7155095 +3305/20000 train_loss: 2.5696 train_time: 6.1m tok/s: 7154540 +3306/20000 train_loss: 2.6163 train_time: 6.1m tok/s: 7153948 +3307/20000 train_loss: 2.5265 train_time: 6.1m tok/s: 7153320 +3308/20000 train_loss: 2.2498 train_time: 6.1m tok/s: 7152746 +3309/20000 train_loss: 2.4774 train_time: 6.1m tok/s: 7152174 +3310/20000 train_loss: 2.5415 train_time: 6.1m tok/s: 7151628 +3311/20000 train_loss: 2.6638 train_time: 6.1m tok/s: 7151001 +3312/20000 train_loss: 2.5331 train_time: 6.1m tok/s: 7150425 +3313/20000 train_loss: 2.3760 train_time: 6.1m tok/s: 7149842 +3314/20000 train_loss: 2.4970 train_time: 6.1m tok/s: 7149267 +3315/20000 train_loss: 2.5203 train_time: 6.1m tok/s: 7148690 +3316/20000 train_loss: 2.5088 train_time: 6.1m tok/s: 7148132 +3317/20000 train_loss: 2.4289 train_time: 6.1m tok/s: 7147524 +3318/20000 train_loss: 2.4002 train_time: 6.1m tok/s: 7146941 +3319/20000 train_loss: 2.4007 train_time: 6.1m tok/s: 7146336 +3320/20000 train_loss: 2.4288 train_time: 6.1m tok/s: 7145748 +3321/20000 train_loss: 2.4467 train_time: 6.1m tok/s: 7145184 +3322/20000 train_loss: 2.4017 train_time: 6.1m tok/s: 7144578 +3323/20000 train_loss: 2.4123 train_time: 6.1m tok/s: 7143994 +3324/20000 train_loss: 2.4539 train_time: 6.1m tok/s: 7143420 +3325/20000 train_loss: 2.5590 train_time: 6.1m tok/s: 7142880 +3326/20000 train_loss: 2.5077 train_time: 6.1m tok/s: 7142291 +3327/20000 train_loss: 2.4247 train_time: 6.1m tok/s: 7141713 +3328/20000 train_loss: 2.5676 train_time: 6.1m tok/s: 7141122 +3329/20000 train_loss: 2.4625 train_time: 6.1m tok/s: 7140545 +3330/20000 train_loss: 2.8037 train_time: 6.1m tok/s: 7139960 +3331/20000 train_loss: 2.6184 train_time: 6.1m tok/s: 7139342 +3332/20000 
train_loss: 2.5302 train_time: 6.1m tok/s: 7138789 +3333/20000 train_loss: 2.3845 train_time: 6.1m tok/s: 7138231 +3334/20000 train_loss: 2.4681 train_time: 6.1m tok/s: 7137639 +3335/20000 train_loss: 2.5720 train_time: 6.1m tok/s: 7137048 +3336/20000 train_loss: 2.5187 train_time: 6.1m tok/s: 7136484 +3337/20000 train_loss: 2.2974 train_time: 6.1m tok/s: 7135915 +3338/20000 train_loss: 2.4682 train_time: 6.1m tok/s: 7135341 +3339/20000 train_loss: 2.4121 train_time: 6.1m tok/s: 7134761 +3340/20000 train_loss: 2.4342 train_time: 6.1m tok/s: 7134198 +3341/20000 train_loss: 2.4528 train_time: 6.1m tok/s: 7133610 +3342/20000 train_loss: 2.3825 train_time: 6.1m tok/s: 7133016 +3343/20000 train_loss: 2.3834 train_time: 6.1m tok/s: 7132442 +3344/20000 train_loss: 2.5434 train_time: 6.1m tok/s: 7131869 +3345/20000 train_loss: 2.4738 train_time: 6.1m tok/s: 7131291 +3346/20000 train_loss: 2.4884 train_time: 6.2m tok/s: 7130706 +3347/20000 train_loss: 2.5262 train_time: 6.2m tok/s: 7130163 +3348/20000 train_loss: 2.5833 train_time: 6.2m tok/s: 7129570 +3349/20000 train_loss: 2.4993 train_time: 6.2m tok/s: 7128998 +3350/20000 train_loss: 2.4570 train_time: 6.2m tok/s: 7128432 +3351/20000 train_loss: 2.4860 train_time: 6.2m tok/s: 7127856 +3352/20000 train_loss: 2.5012 train_time: 6.2m tok/s: 7127297 +3353/20000 train_loss: 2.4338 train_time: 6.2m tok/s: 7126715 +3354/20000 train_loss: 2.5252 train_time: 6.2m tok/s: 7126152 +3355/20000 train_loss: 2.5696 train_time: 6.2m tok/s: 7125557 +3356/20000 train_loss: 2.3541 train_time: 6.2m tok/s: 7124939 +3357/20000 train_loss: 2.3602 train_time: 6.2m tok/s: 7124359 +3358/20000 train_loss: 2.5043 train_time: 6.2m tok/s: 7123761 +3359/20000 train_loss: 2.4421 train_time: 6.2m tok/s: 7123192 +3360/20000 train_loss: 2.4509 train_time: 6.2m tok/s: 7122672 +3361/20000 train_loss: 2.5175 train_time: 6.2m tok/s: 7122133 +3362/20000 train_loss: 2.4887 train_time: 6.2m tok/s: 7121558 +3363/20000 train_loss: 2.4231 train_time: 6.2m tok/s: 
7120989 +3364/20000 train_loss: 2.5105 train_time: 6.2m tok/s: 7120436 +3365/20000 train_loss: 2.5173 train_time: 6.2m tok/s: 7119868 +3366/20000 train_loss: 2.3852 train_time: 6.2m tok/s: 7119276 +3367/20000 train_loss: 2.4252 train_time: 6.2m tok/s: 7118699 +3368/20000 train_loss: 2.3779 train_time: 6.2m tok/s: 7118165 +3369/20000 train_loss: 2.6108 train_time: 6.2m tok/s: 7117565 +3370/20000 train_loss: 2.5871 train_time: 6.2m tok/s: 7116997 +3371/20000 train_loss: 2.5060 train_time: 6.2m tok/s: 7116427 +3372/20000 train_loss: 2.5787 train_time: 6.2m tok/s: 7115907 +3373/20000 train_loss: 2.4683 train_time: 6.2m tok/s: 7115343 +3374/20000 train_loss: 2.4671 train_time: 6.2m tok/s: 7114782 +3375/20000 train_loss: 2.4781 train_time: 6.2m tok/s: 7114217 +3376/20000 train_loss: 2.4273 train_time: 6.2m tok/s: 7113664 +3377/20000 train_loss: 2.5538 train_time: 6.2m tok/s: 7113113 +3378/20000 train_loss: 2.3416 train_time: 6.2m tok/s: 7112534 +3379/20000 train_loss: 2.4152 train_time: 6.2m tok/s: 7111994 +3380/20000 train_loss: 2.3567 train_time: 6.2m tok/s: 7111403 +3381/20000 train_loss: 2.3326 train_time: 6.2m tok/s: 7110803 +3382/20000 train_loss: 2.5760 train_time: 6.2m tok/s: 7110266 +3383/20000 train_loss: 2.5016 train_time: 6.2m tok/s: 7109736 +3384/20000 train_loss: 2.4818 train_time: 6.2m tok/s: 7109177 +3385/20000 train_loss: 2.4769 train_time: 6.2m tok/s: 7108637 +3386/20000 train_loss: 2.4949 train_time: 6.2m tok/s: 7108091 +3387/20000 train_loss: 2.5232 train_time: 6.2m tok/s: 7107526 +3388/20000 train_loss: 2.2891 train_time: 6.2m tok/s: 7106889 +3389/20000 train_loss: 2.4278 train_time: 6.3m tok/s: 7106355 +3390/20000 train_loss: 2.5000 train_time: 6.3m tok/s: 7105815 +3391/20000 train_loss: 2.5007 train_time: 6.3m tok/s: 7105293 +3392/20000 train_loss: 2.4633 train_time: 6.3m tok/s: 7104733 +3393/20000 train_loss: 2.4511 train_time: 6.3m tok/s: 7104191 +3394/20000 train_loss: 2.4640 train_time: 6.3m tok/s: 7103677 +3395/20000 train_loss: 2.4617 
train_time: 6.3m tok/s: 7103118 +3396/20000 train_loss: 2.6050 train_time: 6.3m tok/s: 7102549 +3397/20000 train_loss: 2.5261 train_time: 6.3m tok/s: 7101955 +3398/20000 train_loss: 2.3639 train_time: 6.3m tok/s: 7101429 +3399/20000 train_loss: 2.4255 train_time: 6.3m tok/s: 7100850 +3400/20000 train_loss: 2.5280 train_time: 6.3m tok/s: 7100282 +3401/20000 train_loss: 2.4075 train_time: 6.3m tok/s: 7099722 +3402/20000 train_loss: 2.4485 train_time: 6.3m tok/s: 7099192 +3403/20000 train_loss: 2.4900 train_time: 6.3m tok/s: 7098649 +3404/20000 train_loss: 2.4499 train_time: 6.3m tok/s: 7098128 +3405/20000 train_loss: 2.6260 train_time: 6.3m tok/s: 7097588 +3406/20000 train_loss: 2.4729 train_time: 6.3m tok/s: 7097048 +3407/20000 train_loss: 2.5036 train_time: 6.3m tok/s: 7096483 +3408/20000 train_loss: 2.5684 train_time: 6.3m tok/s: 7095919 +3409/20000 train_loss: 2.4139 train_time: 6.3m tok/s: 7095372 +3410/20000 train_loss: 2.4226 train_time: 6.3m tok/s: 7094823 +3411/20000 train_loss: 2.3462 train_time: 6.3m tok/s: 7094273 +3412/20000 train_loss: 2.3423 train_time: 6.3m tok/s: 7093758 +3413/20000 train_loss: 2.3376 train_time: 6.3m tok/s: 7093192 +3414/20000 train_loss: 2.4450 train_time: 6.3m tok/s: 7092646 +3415/20000 train_loss: 2.5944 train_time: 6.3m tok/s: 7092117 +3416/20000 train_loss: 2.4977 train_time: 6.3m tok/s: 7091583 +3417/20000 train_loss: 2.5712 train_time: 6.3m tok/s: 7091053 +3418/20000 train_loss: 2.5062 train_time: 6.3m tok/s: 7090500 +3419/20000 train_loss: 2.5312 train_time: 6.3m tok/s: 7089953 +3420/20000 train_loss: 2.5377 train_time: 6.3m tok/s: 7089383 +3421/20000 train_loss: 2.3779 train_time: 6.3m tok/s: 7088833 +3422/20000 train_loss: 2.6874 train_time: 6.3m tok/s: 7088272 +3423/20000 train_loss: 2.4000 train_time: 6.3m tok/s: 7087730 +3424/20000 train_loss: 2.4541 train_time: 6.3m tok/s: 7087231 +3425/20000 train_loss: 2.4303 train_time: 6.3m tok/s: 7086681 +3426/20000 train_loss: 2.5105 train_time: 6.3m tok/s: 7086140 +3427/20000 
train_loss: 2.4585 train_time: 6.3m tok/s: 7085600 +3428/20000 train_loss: 2.4551 train_time: 6.3m tok/s: 7085053 +3429/20000 train_loss: 2.6024 train_time: 6.3m tok/s: 7084506 +3430/20000 train_loss: 2.4204 train_time: 6.3m tok/s: 7083958 +3431/20000 train_loss: 2.4695 train_time: 6.3m tok/s: 7083436 +3432/20000 train_loss: 2.5961 train_time: 6.4m tok/s: 7082893 +3433/20000 train_loss: 2.4583 train_time: 6.4m tok/s: 7082366 +3434/20000 train_loss: 2.5114 train_time: 6.4m tok/s: 7081828 +3435/20000 train_loss: 2.4376 train_time: 6.4m tok/s: 7081269 +3436/20000 train_loss: 2.4138 train_time: 6.4m tok/s: 7080720 +3437/20000 train_loss: 2.5503 train_time: 6.4m tok/s: 7080204 +3438/20000 train_loss: 2.4864 train_time: 6.4m tok/s: 7079680 +3439/20000 train_loss: 2.3277 train_time: 6.4m tok/s: 7079123 +3440/20000 train_loss: 2.4817 train_time: 6.4m tok/s: 7078586 +3441/20000 train_loss: 2.3814 train_time: 6.4m tok/s: 7078003 +3442/20000 train_loss: 2.4021 train_time: 6.4m tok/s: 7077510 +3443/20000 train_loss: 2.6648 train_time: 6.4m tok/s: 7076903 +3444/20000 train_loss: 2.3234 train_time: 6.4m tok/s: 7076377 +3445/20000 train_loss: 2.4638 train_time: 6.4m tok/s: 7075898 +3446/20000 train_loss: 2.5260 train_time: 6.4m tok/s: 7075372 +3447/20000 train_loss: 2.4863 train_time: 6.4m tok/s: 7074839 +3448/20000 train_loss: 2.6026 train_time: 6.4m tok/s: 7074315 +3449/20000 train_loss: 2.4733 train_time: 6.4m tok/s: 7073804 +3450/20000 train_loss: 2.4280 train_time: 6.4m tok/s: 7073254 +3451/20000 train_loss: 2.4918 train_time: 6.4m tok/s: 7072755 +3452/20000 train_loss: 2.5184 train_time: 6.4m tok/s: 7072247 +3453/20000 train_loss: 2.4059 train_time: 6.4m tok/s: 7071744 +3454/20000 train_loss: 2.5257 train_time: 6.4m tok/s: 7071200 +3455/20000 train_loss: 2.4845 train_time: 6.4m tok/s: 7070679 +3456/20000 train_loss: 2.4849 train_time: 6.4m tok/s: 7070140 +3457/20000 train_loss: 2.4412 train_time: 6.4m tok/s: 7069615 +3458/20000 train_loss: 2.3997 train_time: 6.4m tok/s: 
7069079 +3459/20000 train_loss: 2.3683 train_time: 6.4m tok/s: 7068521 +3460/20000 train_loss: 2.5196 train_time: 6.4m tok/s: 7068017 +3461/20000 train_loss: 2.6246 train_time: 6.4m tok/s: 7067488 +3462/20000 train_loss: 2.5508 train_time: 6.4m tok/s: 7066948 +3463/20000 train_loss: 2.5735 train_time: 6.4m tok/s: 7066439 +3464/20000 train_loss: 2.4842 train_time: 6.4m tok/s: 7065924 +3465/20000 train_loss: 2.5349 train_time: 6.4m tok/s: 7065376 +3466/20000 train_loss: 2.5924 train_time: 6.4m tok/s: 7064803 +3467/20000 train_loss: 2.4467 train_time: 6.4m tok/s: 7064280 +3468/20000 train_loss: 2.6319 train_time: 6.4m tok/s: 7063711 +3469/20000 train_loss: 2.3939 train_time: 6.4m tok/s: 7063167 +3470/20000 train_loss: 2.3993 train_time: 6.4m tok/s: 7062664 +3471/20000 train_loss: 2.3929 train_time: 6.4m tok/s: 7062146 +3472/20000 train_loss: 2.4341 train_time: 6.4m tok/s: 7061632 +3473/20000 train_loss: 2.3993 train_time: 6.4m tok/s: 7061104 +3474/20000 train_loss: 2.4744 train_time: 6.4m tok/s: 7060610 +3475/20000 train_loss: 2.5226 train_time: 6.5m tok/s: 7060095 +3476/20000 train_loss: 2.4549 train_time: 6.5m tok/s: 7059572 +3477/20000 train_loss: 2.5324 train_time: 6.5m tok/s: 7059075 +3478/20000 train_loss: 2.5535 train_time: 6.5m tok/s: 7058553 +3479/20000 train_loss: 2.4599 train_time: 6.5m tok/s: 7058042 +3480/20000 train_loss: 2.5231 train_time: 6.5m tok/s: 7057546 +3481/20000 train_loss: 2.4842 train_time: 6.5m tok/s: 7057003 +3482/20000 train_loss: 2.4288 train_time: 6.5m tok/s: 7056477 +3483/20000 train_loss: 2.4127 train_time: 6.5m tok/s: 7055944 +3484/20000 train_loss: 2.4662 train_time: 6.5m tok/s: 7055400 +3485/20000 train_loss: 2.3986 train_time: 6.5m tok/s: 7054893 +3486/20000 train_loss: 2.3821 train_time: 6.5m tok/s: 7054362 +3487/20000 train_loss: 2.4755 train_time: 6.5m tok/s: 7053856 +3488/20000 train_loss: 2.3663 train_time: 6.5m tok/s: 7053354 +3489/20000 train_loss: 2.3873 train_time: 6.5m tok/s: 7052829 +3490/20000 train_loss: 2.4820 
train_time: 6.5m tok/s: 7052301 +3491/20000 train_loss: 2.5840 train_time: 6.5m tok/s: 7051817 +3492/20000 train_loss: 2.4249 train_time: 6.5m tok/s: 7051299 +3493/20000 train_loss: 2.5534 train_time: 6.5m tok/s: 7050793 +3494/20000 train_loss: 2.5277 train_time: 6.5m tok/s: 7050286 +3495/20000 train_loss: 2.4907 train_time: 6.5m tok/s: 7049772 +3496/20000 train_loss: 2.5968 train_time: 6.5m tok/s: 7049225 +3497/20000 train_loss: 2.6824 train_time: 6.5m tok/s: 7048694 +3498/20000 train_loss: 2.3511 train_time: 6.5m tok/s: 7048178 +3499/20000 train_loss: 2.4600 train_time: 6.5m tok/s: 7047679 +3500/20000 train_loss: 2.3666 train_time: 6.5m tok/s: 7047162 +3501/20000 train_loss: 2.4142 train_time: 6.5m tok/s: 7046645 +3502/20000 train_loss: 3.2613 train_time: 6.5m tok/s: 7046082 +3503/20000 train_loss: 2.4620 train_time: 6.5m tok/s: 7045598 +3504/20000 train_loss: 2.5556 train_time: 6.5m tok/s: 7045090 +3505/20000 train_loss: 2.4995 train_time: 6.5m tok/s: 7044567 +3506/20000 train_loss: 2.5622 train_time: 6.5m tok/s: 7044073 +3507/20000 train_loss: 2.5269 train_time: 6.5m tok/s: 7043592 +3508/20000 train_loss: 2.6192 train_time: 6.5m tok/s: 7043055 +3509/20000 train_loss: 2.4363 train_time: 6.5m tok/s: 7042520 +3510/20000 train_loss: 2.4292 train_time: 6.5m tok/s: 7042033 +3511/20000 train_loss: 2.4147 train_time: 6.5m tok/s: 7041508 +3512/20000 train_loss: 2.4649 train_time: 6.5m tok/s: 7041017 +3513/20000 train_loss: 2.4137 train_time: 6.5m tok/s: 7040550 +3514/20000 train_loss: 2.5413 train_time: 6.5m tok/s: 7040035 +3515/20000 train_loss: 2.3643 train_time: 6.5m tok/s: 7039532 +3516/20000 train_loss: 2.2750 train_time: 6.5m tok/s: 7038995 +3517/20000 train_loss: 2.4485 train_time: 6.5m tok/s: 7038488 +3518/20000 train_loss: 2.4144 train_time: 6.6m tok/s: 7037972 +3519/20000 train_loss: 2.8127 train_time: 6.6m tok/s: 7037445 +3520/20000 train_loss: 2.6393 train_time: 6.6m tok/s: 7036936 +3521/20000 train_loss: 2.4987 train_time: 6.6m tok/s: 7036426 +3522/20000 
train_loss: 2.5067 train_time: 6.6m tok/s: 7035930 +3523/20000 train_loss: 2.3843 train_time: 6.6m tok/s: 7035422 +3524/20000 train_loss: 2.7613 train_time: 6.6m tok/s: 7034900 +3525/20000 train_loss: 2.5236 train_time: 6.6m tok/s: 7034427 +3526/20000 train_loss: 2.4878 train_time: 6.6m tok/s: 7033908 +3527/20000 train_loss: 2.4172 train_time: 6.6m tok/s: 7033417 +3528/20000 train_loss: 2.4569 train_time: 6.6m tok/s: 7032946 +3529/20000 train_loss: 2.4493 train_time: 6.6m tok/s: 7032444 +3530/20000 train_loss: 2.6714 train_time: 6.6m tok/s: 7031961 +3531/20000 train_loss: 2.2977 train_time: 6.6m tok/s: 7031454 +3532/20000 train_loss: 2.2762 train_time: 6.6m tok/s: 7030945 +3533/20000 train_loss: 2.4345 train_time: 6.6m tok/s: 7030435 +3534/20000 train_loss: 2.4729 train_time: 6.6m tok/s: 7029950 +3535/20000 train_loss: 2.4041 train_time: 6.6m tok/s: 7029415 +3536/20000 train_loss: 2.4706 train_time: 6.6m tok/s: 7028927 +3537/20000 train_loss: 2.6063 train_time: 6.6m tok/s: 7028451 +3538/20000 train_loss: 2.4481 train_time: 6.6m tok/s: 7027949 +3539/20000 train_loss: 2.4781 train_time: 6.6m tok/s: 7027432 +3540/20000 train_loss: 2.4951 train_time: 6.6m tok/s: 7026930 +3541/20000 train_loss: 2.2979 train_time: 6.6m tok/s: 7026422 +3542/20000 train_loss: 2.4078 train_time: 6.6m tok/s: 7025955 +3543/20000 train_loss: 2.5151 train_time: 6.6m tok/s: 7025471 +3544/20000 train_loss: 2.4238 train_time: 6.6m tok/s: 7024992 +3545/20000 train_loss: 2.4573 train_time: 6.6m tok/s: 7024496 +3546/20000 train_loss: 2.3651 train_time: 6.6m tok/s: 7023995 +3547/20000 train_loss: 2.3354 train_time: 6.6m tok/s: 7023501 +3548/20000 train_loss: 2.5770 train_time: 6.6m tok/s: 7023004 +3549/20000 train_loss: 2.5417 train_time: 6.6m tok/s: 7022525 +3550/20000 train_loss: 2.5460 train_time: 6.6m tok/s: 7022025 +3551/20000 train_loss: 2.5698 train_time: 6.6m tok/s: 7021543 +3552/20000 train_loss: 2.4710 train_time: 6.6m tok/s: 7021035 +3553/20000 train_loss: 2.5823 train_time: 6.6m tok/s: 
7020523 +3554/20000 train_loss: 2.4992 train_time: 6.6m tok/s: 7020010 +3555/20000 train_loss: 2.5143 train_time: 6.6m tok/s: 7019520 +3556/20000 train_loss: 2.5207 train_time: 6.6m tok/s: 7019009 +3557/20000 train_loss: 2.4410 train_time: 6.6m tok/s: 7018551 +3558/20000 train_loss: 2.6233 train_time: 6.6m tok/s: 7018049 +3559/20000 train_loss: 2.4923 train_time: 6.6m tok/s: 7017561 +3560/20000 train_loss: 2.4459 train_time: 6.6m tok/s: 7017072 +3561/20000 train_loss: 3.1465 train_time: 6.7m tok/s: 7016554 +3562/20000 train_loss: 2.3838 train_time: 6.7m tok/s: 7016077 +3563/20000 train_loss: 2.4799 train_time: 6.7m tok/s: 7015594 +3564/20000 train_loss: 2.4844 train_time: 6.7m tok/s: 7015127 +3565/20000 train_loss: 2.4681 train_time: 6.7m tok/s: 7014645 +3566/20000 train_loss: 2.4666 train_time: 6.7m tok/s: 7014140 +3567/20000 train_loss: 2.5243 train_time: 6.7m tok/s: 7013638 +3568/20000 train_loss: 2.5337 train_time: 6.7m tok/s: 7013148 +3569/20000 train_loss: 2.3281 train_time: 6.7m tok/s: 7012648 +3570/20000 train_loss: 2.3019 train_time: 6.7m tok/s: 7012158 +3571/20000 train_loss: 2.3832 train_time: 6.7m tok/s: 7011673 +3572/20000 train_loss: 2.3914 train_time: 6.7m tok/s: 7011203 +3573/20000 train_loss: 2.2672 train_time: 6.7m tok/s: 7010683 +3574/20000 train_loss: 2.4242 train_time: 6.7m tok/s: 7010165 +3575/20000 train_loss: 2.5045 train_time: 6.7m tok/s: 7009686 +3576/20000 train_loss: 2.5092 train_time: 6.7m tok/s: 7009221 +3577/20000 train_loss: 2.4848 train_time: 6.7m tok/s: 7008742 +3578/20000 train_loss: 2.5610 train_time: 6.7m tok/s: 7008271 +3579/20000 train_loss: 2.5194 train_time: 6.7m tok/s: 7007794 +3580/20000 train_loss: 2.5210 train_time: 6.7m tok/s: 7007305 +3581/20000 train_loss: 2.5115 train_time: 6.7m tok/s: 7006830 +3582/20000 train_loss: 2.3791 train_time: 6.7m tok/s: 7006341 +3583/20000 train_loss: 2.4147 train_time: 6.7m tok/s: 7005848 +3584/20000 train_loss: 2.3580 train_time: 6.7m tok/s: 7005349 +3585/20000 train_loss: 2.3395 
train_time: 6.7m tok/s: 7004833 +3586/20000 train_loss: 2.1756 train_time: 6.7m tok/s: 7004307 +3587/20000 train_loss: 2.3308 train_time: 6.7m tok/s: 7003853 +3588/20000 train_loss: 2.3801 train_time: 6.7m tok/s: 7003378 +3589/20000 train_loss: 2.6193 train_time: 6.7m tok/s: 7002867 +3590/20000 train_loss: 2.5106 train_time: 6.7m tok/s: 7002397 +3591/20000 train_loss: 2.5268 train_time: 6.7m tok/s: 7001935 +3592/20000 train_loss: 2.5208 train_time: 6.7m tok/s: 7001472 +3593/20000 train_loss: 2.5709 train_time: 6.7m tok/s: 7001001 +3594/20000 train_loss: 2.4924 train_time: 6.7m tok/s: 7000528 +3595/20000 train_loss: 2.5374 train_time: 6.7m tok/s: 7000056 +3596/20000 train_loss: 2.5190 train_time: 6.7m tok/s: 6999591 +3597/20000 train_loss: 2.5500 train_time: 6.7m tok/s: 6999120 +3598/20000 train_loss: 2.2688 train_time: 6.7m tok/s: 6998612 +3599/20000 train_loss: 2.4736 train_time: 6.7m tok/s: 6998142 +3600/20000 train_loss: 2.5624 train_time: 6.7m tok/s: 6997686 +3601/20000 train_loss: 2.4300 train_time: 6.7m tok/s: 6997222 +3602/20000 train_loss: 2.3179 train_time: 6.7m tok/s: 6996747 +3603/20000 train_loss: 2.5487 train_time: 6.8m tok/s: 6996275 +3604/20000 train_loss: 2.5783 train_time: 6.8m tok/s: 6995807 +3605/20000 train_loss: 2.4440 train_time: 6.8m tok/s: 6995328 +3606/20000 train_loss: 2.4473 train_time: 6.8m tok/s: 6994845 +3607/20000 train_loss: 2.5774 train_time: 6.8m tok/s: 6994356 +3608/20000 train_loss: 2.4193 train_time: 6.8m tok/s: 6993886 +3609/20000 train_loss: 2.4335 train_time: 6.8m tok/s: 6993407 +3610/20000 train_loss: 2.5459 train_time: 6.8m tok/s: 6992937 +3611/20000 train_loss: 2.5320 train_time: 6.8m tok/s: 6992475 +3612/20000 train_loss: 2.4099 train_time: 6.8m tok/s: 6992003 +3613/20000 train_loss: 2.4477 train_time: 6.8m tok/s: 6991525 +3614/20000 train_loss: 2.5454 train_time: 6.8m tok/s: 6991072 +3615/20000 train_loss: 2.3676 train_time: 6.8m tok/s: 6990591 +3616/20000 train_loss: 2.4756 train_time: 6.8m tok/s: 6990110 +3617/20000 
train_loss: 2.4351 train_time: 6.8m tok/s: 6989633 +3618/20000 train_loss: 2.3693 train_time: 6.8m tok/s: 6989147 +3619/20000 train_loss: 2.6458 train_time: 6.8m tok/s: 6988677 +3620/20000 train_loss: 2.3821 train_time: 6.8m tok/s: 6988209 +3621/20000 train_loss: 2.5038 train_time: 6.8m tok/s: 6987749 +3622/20000 train_loss: 2.5285 train_time: 6.8m tok/s: 6987249 +3623/20000 train_loss: 2.5359 train_time: 6.8m tok/s: 6986732 +3624/20000 train_loss: 2.5947 train_time: 6.8m tok/s: 6986249 +3625/20000 train_loss: 2.4439 train_time: 6.8m tok/s: 6985819 +3626/20000 train_loss: 2.4044 train_time: 6.8m tok/s: 6985369 +3627/20000 train_loss: 2.3816 train_time: 6.8m tok/s: 6984924 +3628/20000 train_loss: 2.3982 train_time: 6.8m tok/s: 6984463 +3629/20000 train_loss: 2.5300 train_time: 6.8m tok/s: 6984009 +3630/20000 train_loss: 2.5393 train_time: 6.8m tok/s: 6983528 +3631/20000 train_loss: 2.5354 train_time: 6.8m tok/s: 6983058 +3632/20000 train_loss: 2.5869 train_time: 6.8m tok/s: 6982591 +3633/20000 train_loss: 2.3972 train_time: 6.8m tok/s: 6982153 +3634/20000 train_loss: 2.5175 train_time: 6.8m tok/s: 6981715 +3635/20000 train_loss: 2.5024 train_time: 6.8m tok/s: 6981238 +3636/20000 train_loss: 2.4855 train_time: 6.8m tok/s: 6980756 +3637/20000 train_loss: 2.4420 train_time: 6.8m tok/s: 6980274 +3638/20000 train_loss: 2.4300 train_time: 6.8m tok/s: 6979805 +3639/20000 train_loss: 2.4515 train_time: 6.8m tok/s: 6979325 +3640/20000 train_loss: 2.4013 train_time: 6.8m tok/s: 6978851 +3641/20000 train_loss: 2.3872 train_time: 6.8m tok/s: 6978414 +3642/20000 train_loss: 2.3036 train_time: 6.8m tok/s: 6977931 +3643/20000 train_loss: 2.3904 train_time: 6.8m tok/s: 6977477 +3644/20000 train_loss: 2.1024 train_time: 6.8m tok/s: 6976984 +3645/20000 train_loss: 2.5231 train_time: 6.8m tok/s: 6976524 +3646/20000 train_loss: 2.5359 train_time: 6.9m tok/s: 6976080 +3647/20000 train_loss: 2.5373 train_time: 6.9m tok/s: 6975633 +3648/20000 train_loss: 2.4791 train_time: 6.9m tok/s: 
6975181 +3649/20000 train_loss: 2.4510 train_time: 6.9m tok/s: 6974717 +3650/20000 train_loss: 2.6570 train_time: 6.9m tok/s: 6974224 +3651/20000 train_loss: 2.5567 train_time: 6.9m tok/s: 6973756 +3652/20000 train_loss: 2.4394 train_time: 6.9m tok/s: 6973302 +3653/20000 train_loss: 2.4510 train_time: 6.9m tok/s: 6972851 +3654/20000 train_loss: 2.3826 train_time: 6.9m tok/s: 6972420 +3655/20000 train_loss: 2.4060 train_time: 6.9m tok/s: 6971954 +3656/20000 train_loss: 2.3851 train_time: 6.9m tok/s: 6971469 +3657/20000 train_loss: 2.4269 train_time: 6.9m tok/s: 6971006 +3658/20000 train_loss: 2.5552 train_time: 6.9m tok/s: 6970537 +3659/20000 train_loss: 2.4341 train_time: 6.9m tok/s: 6970100 +3660/20000 train_loss: 2.3424 train_time: 6.9m tok/s: 6969617 +3661/20000 train_loss: 2.4460 train_time: 6.9m tok/s: 6969145 +3662/20000 train_loss: 2.4626 train_time: 6.9m tok/s: 6968704 +3663/20000 train_loss: 2.0474 train_time: 6.9m tok/s: 6968229 +3664/20000 train_loss: 2.6473 train_time: 6.9m tok/s: 6967793 +3665/20000 train_loss: 2.4913 train_time: 6.9m tok/s: 6967319 +3666/20000 train_loss: 2.4467 train_time: 6.9m tok/s: 6966846 +3667/20000 train_loss: 2.3273 train_time: 6.9m tok/s: 6966393 +3668/20000 train_loss: 2.2318 train_time: 6.9m tok/s: 6965920 +3669/20000 train_loss: 2.3908 train_time: 6.9m tok/s: 6965425 +3670/20000 train_loss: 2.4290 train_time: 6.9m tok/s: 6964987 +3671/20000 train_loss: 2.4360 train_time: 6.9m tok/s: 6964545 +3672/20000 train_loss: 2.5413 train_time: 6.9m tok/s: 6964103 +3673/20000 train_loss: 2.4566 train_time: 6.9m tok/s: 6963665 +3674/20000 train_loss: 2.5405 train_time: 6.9m tok/s: 6963226 +3675/20000 train_loss: 2.5830 train_time: 6.9m tok/s: 6962773 +3676/20000 train_loss: 2.4695 train_time: 6.9m tok/s: 6962323 +3677/20000 train_loss: 2.5379 train_time: 6.9m tok/s: 6961901 +3678/20000 train_loss: 2.4787 train_time: 6.9m tok/s: 6961441 +3679/20000 train_loss: 2.5713 train_time: 6.9m tok/s: 6960953 +3680/20000 train_loss: 2.4726 
train_time: 6.9m tok/s: 6960528 +3681/20000 train_loss: 2.5354 train_time: 6.9m tok/s: 6960080 +3682/20000 train_loss: 2.3550 train_time: 6.9m tok/s: 6959616 +3683/20000 train_loss: 2.5751 train_time: 6.9m tok/s: 6959165 +3684/20000 train_loss: 2.4065 train_time: 6.9m tok/s: 6958713 +3685/20000 train_loss: 2.6496 train_time: 6.9m tok/s: 6958255 +3686/20000 train_loss: 2.4960 train_time: 6.9m tok/s: 6957790 +3687/20000 train_loss: 2.5694 train_time: 6.9m tok/s: 6957369 +3688/20000 train_loss: 2.4781 train_time: 6.9m tok/s: 6956917 +3689/20000 train_loss: 2.5447 train_time: 7.0m tok/s: 6956473 +3690/20000 train_loss: 2.6045 train_time: 7.0m tok/s: 6956037 +3691/20000 train_loss: 2.5553 train_time: 7.0m tok/s: 6955582 +3692/20000 train_loss: 2.4950 train_time: 7.0m tok/s: 6955138 +3693/20000 train_loss: 2.5060 train_time: 7.0m tok/s: 6954676 +3694/20000 train_loss: 2.5063 train_time: 7.0m tok/s: 6954250 +3695/20000 train_loss: 2.4252 train_time: 7.0m tok/s: 6953784 +3696/20000 train_loss: 2.4112 train_time: 7.0m tok/s: 6953317 +3697/20000 train_loss: 2.4252 train_time: 7.0m tok/s: 6952860 +3698/20000 train_loss: 2.4755 train_time: 7.0m tok/s: 6952431 +3699/20000 train_loss: 2.4955 train_time: 7.0m tok/s: 6952006 +3700/20000 train_loss: 2.6308 train_time: 7.0m tok/s: 6951530 +3701/20000 train_loss: 2.5308 train_time: 7.0m tok/s: 6951081 +3702/20000 train_loss: 2.6049 train_time: 7.0m tok/s: 6950633 +3703/20000 train_loss: 2.5395 train_time: 7.0m tok/s: 6950170 +3704/20000 train_loss: 2.8504 train_time: 7.0m tok/s: 6949699 +3705/20000 train_loss: 2.4415 train_time: 7.0m tok/s: 6949281 +3706/20000 train_loss: 2.4586 train_time: 7.0m tok/s: 6948838 +3707/20000 train_loss: 2.4563 train_time: 7.0m tok/s: 6948409 +3708/20000 train_loss: 2.4078 train_time: 7.0m tok/s: 6947980 +3709/20000 train_loss: 2.5379 train_time: 7.0m tok/s: 6947557 +3710/20000 train_loss: 2.4045 train_time: 7.0m tok/s: 6947106 +3711/20000 train_loss: 2.5436 train_time: 7.0m tok/s: 6946655 +3712/20000 
train_loss: 2.4579 train_time: 7.0m tok/s: 6946208 +3713/20000 train_loss: 2.4666 train_time: 7.0m tok/s: 6945771 +3714/20000 train_loss: 2.4311 train_time: 7.0m tok/s: 6945336 +3715/20000 train_loss: 2.4250 train_time: 7.0m tok/s: 6944905 +3716/20000 train_loss: 2.4175 train_time: 7.0m tok/s: 6944443 +3717/20000 train_loss: 2.3527 train_time: 7.0m tok/s: 6944004 +3718/20000 train_loss: 2.5505 train_time: 7.0m tok/s: 6943562 +3719/20000 train_loss: 2.4370 train_time: 7.0m tok/s: 6943130 +3720/20000 train_loss: 2.5071 train_time: 7.0m tok/s: 6942681 +3721/20000 train_loss: 2.5569 train_time: 7.0m tok/s: 6942229 +3722/20000 train_loss: 2.4425 train_time: 7.0m tok/s: 6941778 +3723/20000 train_loss: 2.6144 train_time: 7.0m tok/s: 6941331 +3724/20000 train_loss: 2.4241 train_time: 7.0m tok/s: 6940899 +3725/20000 train_loss: 2.4685 train_time: 7.0m tok/s: 6940452 +3726/20000 train_loss: 2.4326 train_time: 7.0m tok/s: 6940018 +3727/20000 train_loss: 2.4226 train_time: 7.0m tok/s: 6939582 +3728/20000 train_loss: 2.4421 train_time: 7.0m tok/s: 6939152 +3729/20000 train_loss: 2.5146 train_time: 7.0m tok/s: 6938722 +3730/20000 train_loss: 2.4277 train_time: 7.0m tok/s: 6938285 +3731/20000 train_loss: 2.5127 train_time: 7.0m tok/s: 6937857 +3732/20000 train_loss: 2.4351 train_time: 7.1m tok/s: 6937409 +3733/20000 train_loss: 2.5374 train_time: 7.1m tok/s: 6936987 +3734/20000 train_loss: 2.4500 train_time: 7.1m tok/s: 6936568 +3735/20000 train_loss: 2.5236 train_time: 7.1m tok/s: 6936104 +3736/20000 train_loss: 2.4301 train_time: 7.1m tok/s: 6935632 +3737/20000 train_loss: 2.4456 train_time: 7.1m tok/s: 6935217 +3738/20000 train_loss: 2.3940 train_time: 7.1m tok/s: 6934770 +3739/20000 train_loss: 2.4605 train_time: 7.1m tok/s: 6934354 +3740/20000 train_loss: 2.5335 train_time: 7.1m tok/s: 6933930 +3741/20000 train_loss: 2.3322 train_time: 7.1m tok/s: 6933452 +3742/20000 train_loss: 2.4603 train_time: 7.1m tok/s: 6933036 +3743/20000 train_loss: 2.4098 train_time: 7.1m tok/s: 
6932581 +3744/20000 train_loss: 2.4352 train_time: 7.1m tok/s: 6932173 +3745/20000 train_loss: 2.5164 train_time: 7.1m tok/s: 6931757 +3746/20000 train_loss: 2.3734 train_time: 7.1m tok/s: 6931312 +3747/20000 train_loss: 2.4375 train_time: 7.1m tok/s: 6930879 +3748/20000 train_loss: 2.5497 train_time: 7.1m tok/s: 6930462 +3749/20000 train_loss: 2.5413 train_time: 7.1m tok/s: 6930024 +3750/20000 train_loss: 2.4797 train_time: 7.1m tok/s: 6929574 +3751/20000 train_loss: 2.5771 train_time: 7.1m tok/s: 6929156 +3752/20000 train_loss: 2.4399 train_time: 7.1m tok/s: 6928723 +3753/20000 train_loss: 2.3564 train_time: 7.1m tok/s: 6928293 +3754/20000 train_loss: 2.4468 train_time: 7.1m tok/s: 6927868 +3755/20000 train_loss: 2.3877 train_time: 7.1m tok/s: 6927418 +3756/20000 train_loss: 2.4834 train_time: 7.1m tok/s: 6926999 +3757/20000 train_loss: 2.3679 train_time: 7.1m tok/s: 6926566 +3758/20000 train_loss: 2.7266 train_time: 7.1m tok/s: 6926136 +3759/20000 train_loss: 2.6070 train_time: 7.1m tok/s: 6925719 +3760/20000 train_loss: 2.5438 train_time: 7.1m tok/s: 6925301 +3761/20000 train_loss: 2.6252 train_time: 7.1m tok/s: 6924841 +3762/20000 train_loss: 2.4711 train_time: 7.1m tok/s: 6924410 +3763/20000 train_loss: 2.4951 train_time: 7.1m tok/s: 6923984 +3764/20000 train_loss: 2.4541 train_time: 7.1m tok/s: 6923564 +3765/20000 train_loss: 2.4298 train_time: 7.1m tok/s: 6923135 +3766/20000 train_loss: 2.4160 train_time: 7.1m tok/s: 6922726 +3767/20000 train_loss: 2.4754 train_time: 7.1m tok/s: 6922287 +3768/20000 train_loss: 2.4730 train_time: 7.1m tok/s: 6921858 +3769/20000 train_loss: 2.4116 train_time: 7.1m tok/s: 6921437 +3770/20000 train_loss: 2.5575 train_time: 7.1m tok/s: 6921015 +3771/20000 train_loss: 2.4484 train_time: 7.1m tok/s: 6920582 +3772/20000 train_loss: 2.5163 train_time: 7.1m tok/s: 6920153 +3773/20000 train_loss: 2.4890 train_time: 7.1m tok/s: 6919738 +3774/20000 train_loss: 2.3915 train_time: 7.1m tok/s: 6919325 +3775/20000 train_loss: 2.6255 
train_time: 7.2m tok/s: 6918905 +3776/20000 train_loss: 2.4993 train_time: 7.2m tok/s: 6918469 +3777/20000 train_loss: 2.4009 train_time: 7.2m tok/s: 6918025 +3778/20000 train_loss: 2.4639 train_time: 7.2m tok/s: 6917620 +3779/20000 train_loss: 2.4570 train_time: 7.2m tok/s: 6917203 +3780/20000 train_loss: 2.3806 train_time: 7.2m tok/s: 6916774 +3781/20000 train_loss: 2.4478 train_time: 7.2m tok/s: 6916321 +3782/20000 train_loss: 2.3845 train_time: 7.2m tok/s: 6915883 +3783/20000 train_loss: 2.4574 train_time: 7.2m tok/s: 6915474 +3784/20000 train_loss: 2.4628 train_time: 7.2m tok/s: 6915085 +3785/20000 train_loss: 2.4815 train_time: 7.2m tok/s: 6914685 +3786/20000 train_loss: 2.5002 train_time: 7.2m tok/s: 6914264 +3787/20000 train_loss: 2.5079 train_time: 7.2m tok/s: 6913824 +3788/20000 train_loss: 2.4197 train_time: 7.2m tok/s: 6913400 +3789/20000 train_loss: 2.3864 train_time: 7.2m tok/s: 6913009 +3790/20000 train_loss: 2.4499 train_time: 7.2m tok/s: 6912589 +3791/20000 train_loss: 2.3469 train_time: 7.2m tok/s: 6912162 +3792/20000 train_loss: 2.4631 train_time: 7.2m tok/s: 6911726 +3793/20000 train_loss: 2.3849 train_time: 7.2m tok/s: 6911278 +3794/20000 train_loss: 2.4797 train_time: 7.2m tok/s: 6910860 +3795/20000 train_loss: 2.4674 train_time: 7.2m tok/s: 6910433 +3796/20000 train_loss: 2.5369 train_time: 7.2m tok/s: 6910009 +3797/20000 train_loss: 2.5795 train_time: 7.2m tok/s: 6909592 +3798/20000 train_loss: 2.6861 train_time: 7.2m tok/s: 6909180 +3799/20000 train_loss: 2.4848 train_time: 7.2m tok/s: 6908756 +3800/20000 train_loss: 2.4874 train_time: 7.2m tok/s: 6908315 +3801/20000 train_loss: 2.2677 train_time: 7.2m tok/s: 6907882 +3802/20000 train_loss: 2.4870 train_time: 7.2m tok/s: 6907472 +3803/20000 train_loss: 2.3987 train_time: 7.2m tok/s: 6907081 +3804/20000 train_loss: 2.4704 train_time: 7.2m tok/s: 6906681 +3805/20000 train_loss: 2.3905 train_time: 7.2m tok/s: 6906275 +3806/20000 train_loss: 2.4750 train_time: 7.2m tok/s: 6905859 +3807/20000 
train_loss: 2.4750 train_time: 7.2m tok/s: 6905438 +3808/20000 train_loss: 2.6567 train_time: 7.2m tok/s: 6905026 +3809/20000 train_loss: 2.5722 train_time: 7.2m tok/s: 6904617 +3810/20000 train_loss: 2.4330 train_time: 7.2m tok/s: 6904183 +3811/20000 train_loss: 2.4417 train_time: 7.2m tok/s: 6903760 +3812/20000 train_loss: 2.4410 train_time: 7.2m tok/s: 6903372 +3813/20000 train_loss: 2.4562 train_time: 7.2m tok/s: 6902953 +3814/20000 train_loss: 4.3643 train_time: 7.2m tok/s: 6902480 +3815/20000 train_loss: 2.4399 train_time: 7.2m tok/s: 6902055 +3816/20000 train_loss: 2.5606 train_time: 7.2m tok/s: 6901615 +3817/20000 train_loss: 2.4954 train_time: 7.2m tok/s: 6901230 +3818/20000 train_loss: 2.5139 train_time: 7.3m tok/s: 6900825 +3819/20000 train_loss: 2.3856 train_time: 7.3m tok/s: 6900430 +3820/20000 train_loss: 2.5529 train_time: 7.3m tok/s: 6900005 +3821/20000 train_loss: 2.4249 train_time: 7.3m tok/s: 6899606 +3822/20000 train_loss: 2.4596 train_time: 7.3m tok/s: 6899209 +3823/20000 train_loss: 2.4849 train_time: 7.3m tok/s: 6898815 +3824/20000 train_loss: 2.5082 train_time: 7.3m tok/s: 6898430 +3825/20000 train_loss: 2.4818 train_time: 7.3m tok/s: 6898013 +3826/20000 train_loss: 2.5556 train_time: 7.3m tok/s: 6897592 +3827/20000 train_loss: 2.5690 train_time: 7.3m tok/s: 6897179 +3828/20000 train_loss: 2.5521 train_time: 7.3m tok/s: 6896759 +3829/20000 train_loss: 2.5103 train_time: 7.3m tok/s: 6896358 +3830/20000 train_loss: 2.5445 train_time: 7.3m tok/s: 6895959 +3831/20000 train_loss: 2.5065 train_time: 7.3m tok/s: 6895538 +3832/20000 train_loss: 2.4585 train_time: 7.3m tok/s: 6895128 +3833/20000 train_loss: 2.4892 train_time: 7.3m tok/s: 6894718 +3834/20000 train_loss: 2.4200 train_time: 7.3m tok/s: 6894300 +3835/20000 train_loss: 2.4584 train_time: 7.3m tok/s: 6893896 +3836/20000 train_loss: 2.4239 train_time: 7.3m tok/s: 6893483 +3837/20000 train_loss: 2.4438 train_time: 7.3m tok/s: 6893096 +3838/20000 train_loss: 2.4516 train_time: 7.3m tok/s: 
6892665 +3839/20000 train_loss: 2.4359 train_time: 7.3m tok/s: 6892258 +3840/20000 train_loss: 2.3938 train_time: 7.3m tok/s: 6891856 +3841/20000 train_loss: 2.5600 train_time: 7.3m tok/s: 6891459 +3842/20000 train_loss: 2.4213 train_time: 7.3m tok/s: 6891053 +3843/20000 train_loss: 2.4505 train_time: 7.3m tok/s: 6890640 +3844/20000 train_loss: 2.3915 train_time: 7.3m tok/s: 6890269 +3845/20000 train_loss: 2.5076 train_time: 7.3m tok/s: 6889860 +3846/20000 train_loss: 2.5979 train_time: 7.3m tok/s: 6889435 +3847/20000 train_loss: 2.4147 train_time: 7.3m tok/s: 6889039 +3848/20000 train_loss: 2.4411 train_time: 7.3m tok/s: 6888631 +3849/20000 train_loss: 2.5728 train_time: 7.3m tok/s: 6888213 +3850/20000 train_loss: 2.5451 train_time: 7.3m tok/s: 6887781 +3851/20000 train_loss: 2.4591 train_time: 7.3m tok/s: 6887368 +3852/20000 train_loss: 2.4417 train_time: 7.3m tok/s: 6886975 +3853/20000 train_loss: 2.3931 train_time: 7.3m tok/s: 6886571 +3854/20000 train_loss: 2.2923 train_time: 7.3m tok/s: 6886184 +3855/20000 train_loss: 2.4881 train_time: 7.3m tok/s: 6885811 +3856/20000 train_loss: 2.4109 train_time: 7.3m tok/s: 6885391 +3857/20000 train_loss: 2.2105 train_time: 7.3m tok/s: 6884937 +3858/20000 train_loss: 2.4890 train_time: 7.3m tok/s: 6884497 +3859/20000 train_loss: 2.3970 train_time: 7.3m tok/s: 6884101 +3860/20000 train_loss: 2.3325 train_time: 7.3m tok/s: 6883725 +3861/20000 train_loss: 2.4748 train_time: 7.4m tok/s: 6883340 +3862/20000 train_loss: 2.5965 train_time: 7.4m tok/s: 6882947 +3863/20000 train_loss: 2.5172 train_time: 7.4m tok/s: 6882571 +3864/20000 train_loss: 2.4664 train_time: 7.4m tok/s: 6882160 +3865/20000 train_loss: 2.4892 train_time: 7.4m tok/s: 6881753 +3866/20000 train_loss: 2.4108 train_time: 7.4m tok/s: 6881352 +3867/20000 train_loss: 2.8966 train_time: 7.4m tok/s: 6880929 +3868/20000 train_loss: 2.3213 train_time: 7.4m tok/s: 6880500 +3869/20000 train_loss: 2.4434 train_time: 7.4m tok/s: 6880117 +3870/20000 train_loss: 2.5595 
train_time: 7.4m tok/s: 6879739 +3871/20000 train_loss: 2.4040 train_time: 7.4m tok/s: 6879339 +3872/20000 train_loss: 2.4521 train_time: 7.4m tok/s: 6878956 +3873/20000 train_loss: 2.3848 train_time: 7.4m tok/s: 6878552 +3874/20000 train_loss: 2.4459 train_time: 7.4m tok/s: 6878180 +3875/20000 train_loss: 2.4445 train_time: 7.4m tok/s: 6877772 +3876/20000 train_loss: 2.3330 train_time: 7.4m tok/s: 6877369 +3877/20000 train_loss: 2.6177 train_time: 7.4m tok/s: 6876959 +3878/20000 train_loss: 2.9864 train_time: 7.4m tok/s: 6876522 +3879/20000 train_loss: 2.4215 train_time: 7.4m tok/s: 6876109 +3880/20000 train_loss: 2.4189 train_time: 7.4m tok/s: 6875732 +3881/20000 train_loss: 2.4080 train_time: 7.4m tok/s: 6875333 +3882/20000 train_loss: 2.3107 train_time: 7.4m tok/s: 6874939 +3883/20000 train_loss: 2.4713 train_time: 7.4m tok/s: 6874544 +3884/20000 train_loss: 2.4661 train_time: 7.4m tok/s: 6874174 +3885/20000 train_loss: 2.4107 train_time: 7.4m tok/s: 6873769 +3886/20000 train_loss: 2.3840 train_time: 7.4m tok/s: 6873381 +3887/20000 train_loss: 2.4196 train_time: 7.4m tok/s: 6872980 +3888/20000 train_loss: 2.4241 train_time: 7.4m tok/s: 6872598 +3889/20000 train_loss: 2.4070 train_time: 7.4m tok/s: 6872206 +3890/20000 train_loss: 2.3569 train_time: 7.4m tok/s: 6871823 +3891/20000 train_loss: 2.5891 train_time: 7.4m tok/s: 6871426 +3892/20000 train_loss: 2.3919 train_time: 7.4m tok/s: 6871046 +3893/20000 train_loss: 2.5019 train_time: 7.4m tok/s: 6870660 +3894/20000 train_loss: 2.2318 train_time: 7.4m tok/s: 6870251 +3895/20000 train_loss: 2.5604 train_time: 7.4m tok/s: 6869862 +3896/20000 train_loss: 2.4684 train_time: 7.4m tok/s: 6869450 +3897/20000 train_loss: 2.4666 train_time: 7.4m tok/s: 6869046 +3898/20000 train_loss: 2.3843 train_time: 7.4m tok/s: 6868671 +3899/20000 train_loss: 2.6210 train_time: 7.4m tok/s: 6868292 +3900/20000 train_loss: 2.5398 train_time: 7.4m tok/s: 6867869 +3901/20000 train_loss: 2.4488 train_time: 7.4m tok/s: 6867496 +3902/20000 
train_loss: 2.4836 train_time: 7.4m tok/s: 6867075 +3903/20000 train_loss: 2.4843 train_time: 7.5m tok/s: 6866670 +3904/20000 train_loss: 2.3097 train_time: 7.5m tok/s: 6866278 +3905/20000 train_loss: 2.4339 train_time: 7.5m tok/s: 6865883 +3906/20000 train_loss: 2.5458 train_time: 7.5m tok/s: 6865482 +3907/20000 train_loss: 2.4673 train_time: 7.5m tok/s: 6865113 +3908/20000 train_loss: 2.4658 train_time: 7.5m tok/s: 6864735 +3909/20000 train_loss: 2.4747 train_time: 7.5m tok/s: 6864350 +3910/20000 train_loss: 2.4505 train_time: 7.5m tok/s: 6863940 +3911/20000 train_loss: 2.5220 train_time: 7.5m tok/s: 6863553 +3912/20000 train_loss: 2.5772 train_time: 7.5m tok/s: 6863171 +3913/20000 train_loss: 2.4878 train_time: 7.5m tok/s: 6862791 +3914/20000 train_loss: 2.4568 train_time: 7.5m tok/s: 6862414 +3915/20000 train_loss: 2.3558 train_time: 7.5m tok/s: 6862046 +3916/20000 train_loss: 2.4161 train_time: 7.5m tok/s: 6861649 +3917/20000 train_loss: 2.5839 train_time: 7.5m tok/s: 6861241 +3918/20000 train_loss: 2.4461 train_time: 7.5m tok/s: 6860863 +3919/20000 train_loss: 2.4019 train_time: 7.5m tok/s: 6860484 +3920/20000 train_loss: 2.3553 train_time: 7.5m tok/s: 6860086 +3921/20000 train_loss: 2.4960 train_time: 7.5m tok/s: 6859707 +3922/20000 train_loss: 2.5812 train_time: 7.5m tok/s: 6859312 +3923/20000 train_loss: 2.4943 train_time: 7.5m tok/s: 6858954 +3924/20000 train_loss: 2.5576 train_time: 7.5m tok/s: 6858563 +3925/20000 train_loss: 2.4399 train_time: 7.5m tok/s: 6858173 +3926/20000 train_loss: 2.3679 train_time: 7.5m tok/s: 6857759 +3927/20000 train_loss: 2.3425 train_time: 7.5m tok/s: 6857376 +3928/20000 train_loss: 2.4462 train_time: 7.5m tok/s: 6856983 +3929/20000 train_loss: 2.4344 train_time: 7.5m tok/s: 6856599 +3930/20000 train_loss: 2.5224 train_time: 7.5m tok/s: 6856227 +3931/20000 train_loss: 2.4873 train_time: 7.5m tok/s: 6855827 +3932/20000 train_loss: 2.4741 train_time: 7.5m tok/s: 6855441 +3933/20000 train_loss: 2.5457 train_time: 7.5m tok/s: 
6855079 +3934/20000 train_loss: 2.4876 train_time: 7.5m tok/s: 6854686 +3935/20000 train_loss: 2.4068 train_time: 7.5m tok/s: 6854285 +3936/20000 train_loss: 2.3849 train_time: 7.5m tok/s: 6853886 +3937/20000 train_loss: 2.4073 train_time: 7.5m tok/s: 6853517 +3938/20000 train_loss: 2.3514 train_time: 7.5m tok/s: 6853152 +3939/20000 train_loss: 2.4975 train_time: 7.5m tok/s: 6852799 +3940/20000 train_loss: 2.4246 train_time: 7.5m tok/s: 6852408 +3941/20000 train_loss: 2.5815 train_time: 7.5m tok/s: 6852010 +3942/20000 train_loss: 2.3550 train_time: 7.5m tok/s: 6851620 +3943/20000 train_loss: 2.4203 train_time: 7.5m tok/s: 6851233 +3944/20000 train_loss: 2.4997 train_time: 7.5m tok/s: 6850869 +3945/20000 train_loss: 2.4129 train_time: 7.5m tok/s: 6850479 +3946/20000 train_loss: 2.4487 train_time: 7.6m tok/s: 6850073 +3947/20000 train_loss: 2.4708 train_time: 7.6m tok/s: 6849727 +3948/20000 train_loss: 2.3537 train_time: 7.6m tok/s: 6849338 +3949/20000 train_loss: 2.3739 train_time: 7.6m tok/s: 6848980 +3950/20000 train_loss: 2.3438 train_time: 7.6m tok/s: 6848599 +3951/20000 train_loss: 2.4704 train_time: 7.6m tok/s: 6848199 +3952/20000 train_loss: 2.4448 train_time: 7.6m tok/s: 6847816 +3953/20000 train_loss: 2.4676 train_time: 7.6m tok/s: 6847442 +3954/20000 train_loss: 2.4827 train_time: 7.6m tok/s: 6847081 +3955/20000 train_loss: 2.4738 train_time: 7.6m tok/s: 6846716 +3956/20000 train_loss: 2.4587 train_time: 7.6m tok/s: 6846345 +3957/20000 train_loss: 2.3429 train_time: 7.6m tok/s: 6845950 +3958/20000 train_loss: 2.5289 train_time: 7.6m tok/s: 6845563 +3959/20000 train_loss: 2.2728 train_time: 7.6m tok/s: 6845157 +3960/20000 train_loss: 2.4012 train_time: 7.6m tok/s: 6844782 +3961/20000 train_loss: 2.3845 train_time: 7.6m tok/s: 6844400 +3962/20000 train_loss: 2.3858 train_time: 7.6m tok/s: 6844035 +3963/20000 train_loss: 2.4865 train_time: 7.6m tok/s: 6843646 +3964/20000 train_loss: 2.2929 train_time: 7.6m tok/s: 6843266 +3965/20000 train_loss: 2.5916 
train_time: 7.6m tok/s: 6842895 +3966/20000 train_loss: 2.4607 train_time: 7.6m tok/s: 6842514 +3967/20000 train_loss: 2.4265 train_time: 7.6m tok/s: 6842139 +3968/20000 train_loss: 2.4696 train_time: 7.6m tok/s: 6841770 +3969/20000 train_loss: 2.4181 train_time: 7.6m tok/s: 6841416 +3970/20000 train_loss: 2.5118 train_time: 7.6m tok/s: 6841043 +3971/20000 train_loss: 2.4174 train_time: 7.6m tok/s: 6840655 +3972/20000 train_loss: 2.4308 train_time: 7.6m tok/s: 6840273 +3973/20000 train_loss: 2.3967 train_time: 7.6m tok/s: 6839902 +3974/20000 train_loss: 2.3551 train_time: 7.6m tok/s: 6839534 +3975/20000 train_loss: 2.5185 train_time: 7.6m tok/s: 6839146 +3976/20000 train_loss: 2.4908 train_time: 7.6m tok/s: 6838773 +3977/20000 train_loss: 2.7632 train_time: 7.6m tok/s: 6838404 +3978/20000 train_loss: 2.4931 train_time: 7.6m tok/s: 6838030 +3979/20000 train_loss: 3.1353 train_time: 7.6m tok/s: 6837606 +3980/20000 train_loss: 2.4527 train_time: 7.6m tok/s: 6837221 +3981/20000 train_loss: 2.3806 train_time: 7.6m tok/s: 6836871 +3982/20000 train_loss: 2.4762 train_time: 7.6m tok/s: 6836502 +3983/20000 train_loss: 2.3985 train_time: 7.6m tok/s: 6836120 +3984/20000 train_loss: 2.5233 train_time: 7.6m tok/s: 6835746 +3985/20000 train_loss: 2.4775 train_time: 7.6m tok/s: 6835382 +3986/20000 train_loss: 2.4070 train_time: 7.6m tok/s: 6835031 +3987/20000 train_loss: 2.5668 train_time: 7.6m tok/s: 6834665 +3988/20000 train_loss: 2.4912 train_time: 7.6m tok/s: 6834310 +3989/20000 train_loss: 2.4397 train_time: 7.7m tok/s: 6833953 +3990/20000 train_loss: 2.4308 train_time: 7.7m tok/s: 6833573 +3991/20000 train_loss: 2.4465 train_time: 7.7m tok/s: 6833213 +3992/20000 train_loss: 2.4581 train_time: 7.7m tok/s: 6832840 +3993/20000 train_loss: 2.3835 train_time: 7.7m tok/s: 6832487 +3994/20000 train_loss: 2.1509 train_time: 7.7m tok/s: 6832095 +3995/20000 train_loss: 2.4570 train_time: 7.7m tok/s: 6831707 +3996/20000 train_loss: 2.3691 train_time: 7.7m tok/s: 6831340 +3997/20000 
train_loss: 2.4743 train_time: 7.7m tok/s: 6830976 +3998/20000 train_loss: 2.4019 train_time: 7.7m tok/s: 6830592 +3999/20000 train_loss: 2.4127 train_time: 7.7m tok/s: 6830216 +4000/20000 train_loss: 2.5109 train_time: 7.7m tok/s: 6829869 +4001/20000 train_loss: 2.4235 train_time: 7.7m tok/s: 6829501 +4002/20000 train_loss: 2.4290 train_time: 7.7m tok/s: 6829146 +4003/20000 train_loss: 2.3757 train_time: 7.7m tok/s: 6828777 +4004/20000 train_loss: 2.4843 train_time: 7.7m tok/s: 6828430 +4005/20000 train_loss: 2.4378 train_time: 7.7m tok/s: 6828069 +4006/20000 train_loss: 2.4658 train_time: 7.7m tok/s: 6827686 +4007/20000 train_loss: 2.4707 train_time: 7.7m tok/s: 6827284 +4008/20000 train_loss: 2.3598 train_time: 7.7m tok/s: 6826900 +4009/20000 train_loss: 2.3881 train_time: 7.7m tok/s: 6826559 +4010/20000 train_loss: 2.4929 train_time: 7.7m tok/s: 6826205 +4011/20000 train_loss: 2.4572 train_time: 7.7m tok/s: 6825842 +4012/20000 train_loss: 2.4730 train_time: 7.7m tok/s: 6825477 +4013/20000 train_loss: 2.3537 train_time: 7.7m tok/s: 6825084 +4014/20000 train_loss: 2.4029 train_time: 7.7m tok/s: 6824740 +4015/20000 train_loss: 2.4278 train_time: 7.7m tok/s: 6824349 +4016/20000 train_loss: 2.4270 train_time: 7.7m tok/s: 6823986 +4017/20000 train_loss: 2.5356 train_time: 7.7m tok/s: 6823622 +4018/20000 train_loss: 2.3880 train_time: 7.7m tok/s: 6823269 +4019/20000 train_loss: 2.3028 train_time: 7.7m tok/s: 6822895 +4020/20000 train_loss: 2.3337 train_time: 7.7m tok/s: 6822535 +4021/20000 train_loss: 2.3816 train_time: 7.7m tok/s: 6822140 +4022/20000 train_loss: 2.4630 train_time: 7.7m tok/s: 6821787 +4023/20000 train_loss: 2.4455 train_time: 7.7m tok/s: 6821435 +4024/20000 train_loss: 2.5103 train_time: 7.7m tok/s: 6821083 +4025/20000 train_loss: 2.4874 train_time: 7.7m tok/s: 6820749 +4026/20000 train_loss: 2.4179 train_time: 7.7m tok/s: 6820369 +4027/20000 train_loss: 2.3457 train_time: 7.7m tok/s: 6819987 +4028/20000 train_loss: 2.2439 train_time: 7.7m tok/s: 
6819618 +4029/20000 train_loss: 2.3894 train_time: 7.7m tok/s: 6819268 +4030/20000 train_loss: 2.3979 train_time: 7.7m tok/s: 6818925 +4031/20000 train_loss: 2.4774 train_time: 7.7m tok/s: 6818543 +4032/20000 train_loss: 2.3726 train_time: 7.8m tok/s: 6818187 +4033/20000 train_loss: 2.4349 train_time: 7.8m tok/s: 6817828 +4034/20000 train_loss: 2.6007 train_time: 7.8m tok/s: 6817485 +4035/20000 train_loss: 2.5282 train_time: 7.8m tok/s: 6817126 +4036/20000 train_loss: 2.4933 train_time: 7.8m tok/s: 6816766 +4037/20000 train_loss: 2.4826 train_time: 7.8m tok/s: 6816405 +4038/20000 train_loss: 2.4953 train_time: 7.8m tok/s: 6816036 +4039/20000 train_loss: 2.4712 train_time: 7.8m tok/s: 6815684 +4040/20000 train_loss: 2.4241 train_time: 7.8m tok/s: 6815307 +4041/20000 train_loss: 2.3883 train_time: 7.8m tok/s: 6814941 +4042/20000 train_loss: 2.3673 train_time: 7.8m tok/s: 6814581 +4043/20000 train_loss: 2.3778 train_time: 7.8m tok/s: 6814220 +4044/20000 train_loss: 2.4259 train_time: 7.8m tok/s: 6813854 +4045/20000 train_loss: 2.4568 train_time: 7.8m tok/s: 6813507 +4046/20000 train_loss: 2.5261 train_time: 7.8m tok/s: 6813147 +4047/20000 train_loss: 2.4990 train_time: 7.8m tok/s: 6812800 +4048/20000 train_loss: 2.4298 train_time: 7.8m tok/s: 6812436 +4049/20000 train_loss: 2.5541 train_time: 7.8m tok/s: 6812068 +4050/20000 train_loss: 2.4457 train_time: 7.8m tok/s: 6811695 +4051/20000 train_loss: 2.4080 train_time: 7.8m tok/s: 6811358 +4052/20000 train_loss: 2.4907 train_time: 7.8m tok/s: 6811005 +4053/20000 train_loss: 2.4391 train_time: 7.8m tok/s: 6810630 +4054/20000 train_loss: 2.4194 train_time: 7.8m tok/s: 6810277 +4055/20000 train_loss: 2.4473 train_time: 7.8m tok/s: 6809923 +4056/20000 train_loss: 2.3248 train_time: 7.8m tok/s: 6809532 +4057/20000 train_loss: 2.4150 train_time: 7.8m tok/s: 6809185 +4058/20000 train_loss: 2.5571 train_time: 7.8m tok/s: 6808816 +4059/20000 train_loss: 2.2852 train_time: 7.8m tok/s: 6808479 +4060/20000 train_loss: 2.3964 
train_time: 7.8m tok/s: 6808133 +4061/20000 train_loss: 2.4993 train_time: 7.8m tok/s: 6807788 +4062/20000 train_loss: 2.4867 train_time: 7.8m tok/s: 6807439 +4063/20000 train_loss: 2.4527 train_time: 7.8m tok/s: 6807058 +4064/20000 train_loss: 2.4276 train_time: 7.8m tok/s: 6806716 +4065/20000 train_loss: 2.5007 train_time: 7.8m tok/s: 6806378 +4066/20000 train_loss: 2.4107 train_time: 7.8m tok/s: 6806024 +4067/20000 train_loss: 2.4784 train_time: 7.8m tok/s: 6805673 +4068/20000 train_loss: 2.4842 train_time: 7.8m tok/s: 6805320 +4069/20000 train_loss: 2.3702 train_time: 7.8m tok/s: 6804977 +4070/20000 train_loss: 2.3748 train_time: 7.8m tok/s: 6804610 +4071/20000 train_loss: 2.7210 train_time: 7.8m tok/s: 6804234 +4072/20000 train_loss: 2.4521 train_time: 7.8m tok/s: 6803901 +4073/20000 train_loss: 2.4464 train_time: 7.8m tok/s: 6803524 +4074/20000 train_loss: 2.4076 train_time: 7.8m tok/s: 6803175 +4075/20000 train_loss: 2.5606 train_time: 7.9m tok/s: 6802826 +4076/20000 train_loss: 2.5849 train_time: 7.9m tok/s: 6802463 +4077/20000 train_loss: 2.4957 train_time: 7.9m tok/s: 6802131 +4078/20000 train_loss: 2.3756 train_time: 7.9m tok/s: 6801775 +4079/20000 train_loss: 3.0149 train_time: 7.9m tok/s: 6801372 +4080/20000 train_loss: 2.3631 train_time: 7.9m tok/s: 6801003 +4081/20000 train_loss: 2.3900 train_time: 7.9m tok/s: 6800652 +4082/20000 train_loss: 2.4094 train_time: 7.9m tok/s: 6800324 +4083/20000 train_loss: 2.3173 train_time: 7.9m tok/s: 6799984 +4084/20000 train_loss: 2.2540 train_time: 7.9m tok/s: 6799649 +4085/20000 train_loss: 2.3575 train_time: 7.9m tok/s: 6799285 +4086/20000 train_loss: 2.4926 train_time: 7.9m tok/s: 6798956 +4087/20000 train_loss: 2.5344 train_time: 7.9m tok/s: 6798616 +4088/20000 train_loss: 2.5354 train_time: 7.9m tok/s: 6798289 +4089/20000 train_loss: 2.4436 train_time: 7.9m tok/s: 6797939 +4090/20000 train_loss: 2.3718 train_time: 7.9m tok/s: 6797591 +4091/20000 train_loss: 2.5231 train_time: 7.9m tok/s: 6797237 +4092/20000 
train_loss: 2.5259 train_time: 7.9m tok/s: 6796863 +4093/20000 train_loss: 2.4950 train_time: 7.9m tok/s: 6796515 +4094/20000 train_loss: 2.2989 train_time: 7.9m tok/s: 6796144 +4095/20000 train_loss: 2.4546 train_time: 7.9m tok/s: 6795817 +4096/20000 train_loss: 2.2626 train_time: 7.9m tok/s: 6795432 +4097/20000 train_loss: 2.3126 train_time: 7.9m tok/s: 6795087 +4098/20000 train_loss: 2.4212 train_time: 7.9m tok/s: 6794761 +4099/20000 train_loss: 2.1958 train_time: 7.9m tok/s: 6794396 +4100/20000 train_loss: 2.4239 train_time: 7.9m tok/s: 6794044 +4101/20000 train_loss: 2.5327 train_time: 7.9m tok/s: 6793728 +4102/20000 train_loss: 2.2890 train_time: 7.9m tok/s: 6793385 +4103/20000 train_loss: 2.4391 train_time: 7.9m tok/s: 6793052 +4104/20000 train_loss: 2.3818 train_time: 7.9m tok/s: 6792697 +4105/20000 train_loss: 2.4331 train_time: 7.9m tok/s: 6792337 +4106/20000 train_loss: 2.4675 train_time: 7.9m tok/s: 6791997 +4107/20000 train_loss: 2.4084 train_time: 7.9m tok/s: 6791664 +4108/20000 train_loss: 2.4185 train_time: 7.9m tok/s: 6791338 +4109/20000 train_loss: 2.3947 train_time: 7.9m tok/s: 6791008 +4110/20000 train_loss: 2.3449 train_time: 7.9m tok/s: 6790642 +4111/20000 train_loss: 2.3994 train_time: 7.9m tok/s: 6790298 +4112/20000 train_loss: 2.4441 train_time: 7.9m tok/s: 6789953 +4113/20000 train_loss: 2.4252 train_time: 7.9m tok/s: 6789603 +4114/20000 train_loss: 2.3944 train_time: 7.9m tok/s: 6789257 +4115/20000 train_loss: 2.5275 train_time: 7.9m tok/s: 6788940 +4116/20000 train_loss: 2.3680 train_time: 7.9m tok/s: 6788601 +4117/20000 train_loss: 2.3803 train_time: 7.9m tok/s: 6788261 +4118/20000 train_loss: 2.4317 train_time: 8.0m tok/s: 6787926 +4119/20000 train_loss: 2.4651 train_time: 8.0m tok/s: 6787592 +4120/20000 train_loss: 2.4368 train_time: 8.0m tok/s: 6787211 +4121/20000 train_loss: 2.4031 train_time: 8.0m tok/s: 6786819 +4122/20000 train_loss: 2.2889 train_time: 8.0m tok/s: 6786489 +4123/20000 train_loss: 2.3871 train_time: 8.0m tok/s: 
6786151 +4124/20000 train_loss: 2.2592 train_time: 8.0m tok/s: 6785799 +4125/20000 train_loss: 2.4553 train_time: 8.0m tok/s: 6785463 +4126/20000 train_loss: 2.3752 train_time: 8.0m tok/s: 6785118 +4127/20000 train_loss: 2.3759 train_time: 8.0m tok/s: 6784781 +4128/20000 train_loss: 2.4330 train_time: 8.0m tok/s: 6784468 +4129/20000 train_loss: 2.4652 train_time: 8.0m tok/s: 6784121 +4130/20000 train_loss: 2.4007 train_time: 8.0m tok/s: 6783788 +4131/20000 train_loss: 2.4516 train_time: 8.0m tok/s: 6783446 +4132/20000 train_loss: 2.4241 train_time: 8.0m tok/s: 6783094 +4133/20000 train_loss: 2.4413 train_time: 8.0m tok/s: 6782741 +4134/20000 train_loss: 2.4733 train_time: 8.0m tok/s: 6782379 +4135/20000 train_loss: 2.4713 train_time: 8.0m tok/s: 6782054 +4136/20000 train_loss: 2.2978 train_time: 8.0m tok/s: 6781718 +4137/20000 train_loss: 2.4996 train_time: 8.0m tok/s: 6781407 +4138/20000 train_loss: 2.3066 train_time: 8.0m tok/s: 6781056 +4139/20000 train_loss: 2.4580 train_time: 8.0m tok/s: 6780712 +4140/20000 train_loss: 2.5223 train_time: 8.0m tok/s: 6780379 +4141/20000 train_loss: 2.3355 train_time: 8.0m tok/s: 6780028 +4142/20000 train_loss: 2.5032 train_time: 8.0m tok/s: 6779710 +4143/20000 train_loss: 2.4084 train_time: 8.0m tok/s: 6779393 +4144/20000 train_loss: 2.5095 train_time: 8.0m tok/s: 6779078 +4145/20000 train_loss: 2.2764 train_time: 8.0m tok/s: 6778711 +4146/20000 train_loss: 2.4539 train_time: 8.0m tok/s: 6778360 +4147/20000 train_loss: 2.5479 train_time: 8.0m tok/s: 6778008 +4148/20000 train_loss: 2.4394 train_time: 8.0m tok/s: 6777690 +4149/20000 train_loss: 2.2991 train_time: 8.0m tok/s: 6777353 +4150/20000 train_loss: 2.3534 train_time: 8.0m tok/s: 6777020 +4151/20000 train_loss: 2.4357 train_time: 8.0m tok/s: 6776705 +4152/20000 train_loss: 2.4172 train_time: 8.0m tok/s: 6776374 +4153/20000 train_loss: 2.5071 train_time: 8.0m tok/s: 6776036 +4154/20000 train_loss: 2.3608 train_time: 8.0m tok/s: 6775715 +4155/20000 train_loss: 2.4973 
train_time: 8.0m tok/s: 6775407 +4156/20000 train_loss: 2.4304 train_time: 8.0m tok/s: 6775109 +4157/20000 train_loss: 2.4824 train_time: 8.0m tok/s: 6774789 +4158/20000 train_loss: 2.4443 train_time: 8.0m tok/s: 6774482 +4159/20000 train_loss: 2.3514 train_time: 8.0m tok/s: 6774133 +4160/20000 train_loss: 2.3525 train_time: 8.0m tok/s: 6773825 +4161/20000 train_loss: 2.4091 train_time: 8.1m tok/s: 6773469 +4162/20000 train_loss: 2.3212 train_time: 8.1m tok/s: 6773121 +4163/20000 train_loss: 2.5249 train_time: 8.1m tok/s: 6772792 +4164/20000 train_loss: 2.4435 train_time: 8.1m tok/s: 6772485 +4165/20000 train_loss: 2.4464 train_time: 8.1m tok/s: 6772188 +4166/20000 train_loss: 2.4473 train_time: 8.1m tok/s: 6771855 +4167/20000 train_loss: 2.5277 train_time: 8.1m tok/s: 6771528 +4168/20000 train_loss: 2.4350 train_time: 8.1m tok/s: 6771202 +4169/20000 train_loss: 2.4214 train_time: 8.1m tok/s: 6770880 +4170/20000 train_loss: 2.3401 train_time: 8.1m tok/s: 6770524 +4171/20000 train_loss: 2.4406 train_time: 8.1m tok/s: 6770187 +4172/20000 train_loss: 2.3988 train_time: 8.1m tok/s: 6769873 +4173/20000 train_loss: 2.5159 train_time: 8.1m tok/s: 6769546 +4174/20000 train_loss: 2.3083 train_time: 8.1m tok/s: 6769235 +4175/20000 train_loss: 2.4046 train_time: 8.1m tok/s: 6768895 +4176/20000 train_loss: 2.5119 train_time: 8.1m tok/s: 6768570 +4177/20000 train_loss: 2.3823 train_time: 8.1m tok/s: 6768258 +4178/20000 train_loss: 2.4478 train_time: 8.1m tok/s: 6767942 +4179/20000 train_loss: 2.4008 train_time: 8.1m tok/s: 6767595 +4180/20000 train_loss: 2.3933 train_time: 8.1m tok/s: 6767262 +4181/20000 train_loss: 2.5454 train_time: 8.1m tok/s: 6766932 +4182/20000 train_loss: 2.4708 train_time: 8.1m tok/s: 6766600 +4183/20000 train_loss: 2.4617 train_time: 8.1m tok/s: 6766271 +4184/20000 train_loss: 2.4772 train_time: 8.1m tok/s: 6765932 +4185/20000 train_loss: 2.5712 train_time: 8.1m tok/s: 6765591 +4186/20000 train_loss: 2.4695 train_time: 8.1m tok/s: 6765260 +4187/20000 
train_loss: 2.2912 train_time: 8.1m tok/s: 6764932 +4188/20000 train_loss: 2.4588 train_time: 8.1m tok/s: 6764592 +4189/20000 train_loss: 2.4491 train_time: 8.1m tok/s: 6764282 +4190/20000 train_loss: 2.5327 train_time: 8.1m tok/s: 6763945 +4191/20000 train_loss: 2.4205 train_time: 8.1m tok/s: 6763618 +4192/20000 train_loss: 2.3717 train_time: 8.1m tok/s: 6763278 +4193/20000 train_loss: 2.4053 train_time: 8.1m tok/s: 6762968 +4194/20000 train_loss: 2.4726 train_time: 8.1m tok/s: 6762624 +4195/20000 train_loss: 2.4852 train_time: 8.1m tok/s: 6762288 +4196/20000 train_loss: 2.3096 train_time: 8.1m tok/s: 6761977 +4197/20000 train_loss: 2.3271 train_time: 8.1m tok/s: 6761650 +4198/20000 train_loss: 2.4300 train_time: 8.1m tok/s: 6761332 +4199/20000 train_loss: 2.2444 train_time: 8.1m tok/s: 6760994 +4200/20000 train_loss: 2.4366 train_time: 8.1m tok/s: 6760646 +4201/20000 train_loss: 2.3697 train_time: 8.1m tok/s: 6760310 +4202/20000 train_loss: 2.4529 train_time: 8.1m tok/s: 6759990 +4203/20000 train_loss: 2.5889 train_time: 8.1m tok/s: 6759669 +4204/20000 train_loss: 2.4933 train_time: 8.2m tok/s: 6759343 +4205/20000 train_loss: 2.4680 train_time: 8.2m tok/s: 6759009 +4206/20000 train_loss: 2.3970 train_time: 8.2m tok/s: 6758683 +4207/20000 train_loss: 2.4083 train_time: 8.2m tok/s: 6758367 +4208/20000 train_loss: 2.3884 train_time: 8.2m tok/s: 6758021 +4209/20000 train_loss: 2.3817 train_time: 8.2m tok/s: 6757706 +4210/20000 train_loss: 2.3879 train_time: 8.2m tok/s: 6757383 +4211/20000 train_loss: 2.5586 train_time: 8.2m tok/s: 6757057 +4212/20000 train_loss: 2.3679 train_time: 8.2m tok/s: 6756717 +4213/20000 train_loss: 2.3980 train_time: 8.2m tok/s: 6756399 +4214/20000 train_loss: 2.4242 train_time: 8.2m tok/s: 6756093 +4215/20000 train_loss: 2.4521 train_time: 8.2m tok/s: 6755771 +4216/20000 train_loss: 2.5499 train_time: 8.2m tok/s: 6755436 +4217/20000 train_loss: 2.2980 train_time: 8.2m tok/s: 6755121 +4218/20000 train_loss: 2.4565 train_time: 8.2m tok/s: 
6754783 +4219/20000 train_loss: 2.3816 train_time: 8.2m tok/s: 6754465 +4220/20000 train_loss: 2.0787 train_time: 8.2m tok/s: 6754131 +4221/20000 train_loss: 2.4712 train_time: 8.2m tok/s: 6753810 +4222/20000 train_loss: 2.4677 train_time: 8.2m tok/s: 6753493 +4223/20000 train_loss: 2.5702 train_time: 8.2m tok/s: 6753169 +4224/20000 train_loss: 2.4494 train_time: 8.2m tok/s: 6752855 +4225/20000 train_loss: 2.4554 train_time: 8.2m tok/s: 6752533 +4226/20000 train_loss: 2.4276 train_time: 8.2m tok/s: 6752214 +4227/20000 train_loss: 2.4675 train_time: 8.2m tok/s: 6751874 +4228/20000 train_loss: 2.4481 train_time: 8.2m tok/s: 6751544 +4229/20000 train_loss: 2.4379 train_time: 8.2m tok/s: 6751206 +4230/20000 train_loss: 2.3751 train_time: 8.2m tok/s: 6750881 +4231/20000 train_loss: 2.3649 train_time: 8.2m tok/s: 6750586 +4232/20000 train_loss: 2.4053 train_time: 8.2m tok/s: 6750250 +4233/20000 train_loss: 2.5404 train_time: 8.2m tok/s: 6749916 +4234/20000 train_loss: 2.3469 train_time: 8.2m tok/s: 6749611 +4235/20000 train_loss: 2.5126 train_time: 8.2m tok/s: 6749258 +4236/20000 train_loss: 2.5598 train_time: 8.2m tok/s: 6748940 +4237/20000 train_loss: 2.4393 train_time: 8.2m tok/s: 6748637 +4238/20000 train_loss: 2.5933 train_time: 8.2m tok/s: 6748324 +4239/20000 train_loss: 2.4947 train_time: 8.2m tok/s: 6748020 +4240/20000 train_loss: 2.5707 train_time: 8.2m tok/s: 6747704 +4241/20000 train_loss: 2.3574 train_time: 8.2m tok/s: 6747395 +4242/20000 train_loss: 2.4589 train_time: 8.2m tok/s: 6747057 +4243/20000 train_loss: 2.4055 train_time: 8.2m tok/s: 6746752 +4244/20000 train_loss: 2.4566 train_time: 8.2m tok/s: 6746404 +4245/20000 train_loss: 2.4502 train_time: 8.2m tok/s: 6746093 +4246/20000 train_loss: 2.4900 train_time: 8.3m tok/s: 6745761 +4247/20000 train_loss: 2.3756 train_time: 8.3m tok/s: 6745427 +4248/20000 train_loss: 2.4083 train_time: 8.3m tok/s: 6745116 +4249/20000 train_loss: 2.4298 train_time: 8.3m tok/s: 6744794 +4250/20000 train_loss: 2.3856 
train_time: 8.3m tok/s: 6744490 +4251/20000 train_loss: 2.4068 train_time: 8.3m tok/s: 6744193 +4252/20000 train_loss: 2.4820 train_time: 8.3m tok/s: 6743884 +4253/20000 train_loss: 2.5433 train_time: 8.3m tok/s: 6743545 +4254/20000 train_loss: 2.6653 train_time: 8.3m tok/s: 6743216 +4255/20000 train_loss: 2.4511 train_time: 8.3m tok/s: 6742908 +4256/20000 train_loss: 2.5048 train_time: 8.3m tok/s: 6742589 +4257/20000 train_loss: 2.5284 train_time: 8.3m tok/s: 6742281 +4258/20000 train_loss: 2.3311 train_time: 8.3m tok/s: 6741949 +4259/20000 train_loss: 2.4198 train_time: 8.3m tok/s: 6741632 +4260/20000 train_loss: 2.3915 train_time: 8.3m tok/s: 6741316 +4261/20000 train_loss: 2.2948 train_time: 8.3m tok/s: 6741000 +4262/20000 train_loss: 2.3748 train_time: 8.3m tok/s: 6740701 +4263/20000 train_loss: 2.3370 train_time: 8.3m tok/s: 6740383 +4264/20000 train_loss: 2.3772 train_time: 8.3m tok/s: 6740090 +4265/20000 train_loss: 2.4376 train_time: 8.3m tok/s: 6739777 +4266/20000 train_loss: 2.5148 train_time: 8.3m tok/s: 6739453 +4267/20000 train_loss: 2.4057 train_time: 8.3m tok/s: 6739142 +4268/20000 train_loss: 2.4648 train_time: 8.3m tok/s: 6738804 +4269/20000 train_loss: 2.4043 train_time: 8.3m tok/s: 6738490 +4270/20000 train_loss: 2.3187 train_time: 8.3m tok/s: 6738180 +4271/20000 train_loss: 2.4323 train_time: 8.3m tok/s: 6737855 +4272/20000 train_loss: 2.3868 train_time: 8.3m tok/s: 6737528 +4273/20000 train_loss: 2.4063 train_time: 8.3m tok/s: 6737221 +4274/20000 train_loss: 2.4568 train_time: 8.3m tok/s: 6736842 +4275/20000 train_loss: 2.3039 train_time: 8.3m tok/s: 6736528 +4276/20000 train_loss: 2.3880 train_time: 8.3m tok/s: 6736232 +4277/20000 train_loss: 2.3630 train_time: 8.3m tok/s: 6735915 +4278/20000 train_loss: 2.3824 train_time: 8.3m tok/s: 6735624 +4279/20000 train_loss: 2.3721 train_time: 8.3m tok/s: 6735325 +4280/20000 train_loss: 2.5270 train_time: 8.3m tok/s: 6735029 +4281/20000 train_loss: 2.6060 train_time: 8.3m tok/s: 6734726 +4282/20000 
train_loss: 2.4338 train_time: 8.3m tok/s: 6734403 +4283/20000 train_loss: 2.6000 train_time: 8.3m tok/s: 6734094 +4284/20000 train_loss: 2.5154 train_time: 8.3m tok/s: 6733779 +4285/20000 train_loss: 2.3909 train_time: 8.3m tok/s: 6733478 +4286/20000 train_loss: 2.4847 train_time: 8.3m tok/s: 6733148 +4287/20000 train_loss: 2.3741 train_time: 8.3m tok/s: 6732837 +4288/20000 train_loss: 2.3949 train_time: 8.3m tok/s: 6732554 +4289/20000 train_loss: 2.5701 train_time: 8.4m tok/s: 6732167 +4290/20000 train_loss: 2.3677 train_time: 8.4m tok/s: 6731883 +4291/20000 train_loss: 2.3203 train_time: 8.4m tok/s: 6731599 +4292/20000 train_loss: 2.4575 train_time: 8.4m tok/s: 6731274 +4293/20000 train_loss: 2.4277 train_time: 8.4m tok/s: 6730978 +4294/20000 train_loss: 2.3798 train_time: 8.4m tok/s: 6730677 +4295/20000 train_loss: 2.4201 train_time: 8.4m tok/s: 6730358 +4296/20000 train_loss: 2.2157 train_time: 8.4m tok/s: 6730042 +4297/20000 train_loss: 2.4754 train_time: 8.4m tok/s: 6729754 +4298/20000 train_loss: 2.4063 train_time: 8.4m tok/s: 6729439 +4299/20000 train_loss: 2.3419 train_time: 8.4m tok/s: 6729124 +4300/20000 train_loss: 2.5151 train_time: 8.4m tok/s: 6728810 +4301/20000 train_loss: 2.1872 train_time: 8.4m tok/s: 6728471 +4302/20000 train_loss: 2.3389 train_time: 8.4m tok/s: 6728139 +4303/20000 train_loss: 2.3301 train_time: 8.4m tok/s: 6727833 +4304/20000 train_loss: 2.3478 train_time: 8.4m tok/s: 6727552 +4305/20000 train_loss: 2.3325 train_time: 8.4m tok/s: 6727245 +4306/20000 train_loss: 2.3940 train_time: 8.4m tok/s: 6726920 +4307/20000 train_loss: 2.3410 train_time: 8.4m tok/s: 6726635 +4308/20000 train_loss: 2.3731 train_time: 8.4m tok/s: 6726319 +4309/20000 train_loss: 2.5797 train_time: 8.4m tok/s: 6726013 +4310/20000 train_loss: 2.4568 train_time: 8.4m tok/s: 6725706 +4311/20000 train_loss: 2.5394 train_time: 8.4m tok/s: 6725408 +4312/20000 train_loss: 2.4171 train_time: 8.4m tok/s: 6725132 +4313/20000 train_loss: 2.2539 train_time: 8.4m tok/s: 
6724827 +4314/20000 train_loss: 2.4301 train_time: 8.4m tok/s: 6724521 +4315/20000 train_loss: 2.4073 train_time: 8.4m tok/s: 6724217 +4316/20000 train_loss: 2.4199 train_time: 8.4m tok/s: 6723899 +4317/20000 train_loss: 2.4908 train_time: 8.4m tok/s: 6723570 +4318/20000 train_loss: 2.4865 train_time: 8.4m tok/s: 6723241 +4319/20000 train_loss: 2.1779 train_time: 8.4m tok/s: 6722933 +4320/20000 train_loss: 2.2692 train_time: 8.4m tok/s: 6722647 +4321/20000 train_loss: 2.3728 train_time: 8.4m tok/s: 6722340 +4322/20000 train_loss: 2.4214 train_time: 8.4m tok/s: 6722042 +4323/20000 train_loss: 2.4201 train_time: 8.4m tok/s: 6721741 +4324/20000 train_loss: 2.4411 train_time: 8.4m tok/s: 6721433 +4325/20000 train_loss: 2.4638 train_time: 8.4m tok/s: 6721149 +4326/20000 train_loss: 2.3654 train_time: 8.4m tok/s: 6720827 +4327/20000 train_loss: 2.4112 train_time: 8.4m tok/s: 6720528 +4328/20000 train_loss: 2.3850 train_time: 8.4m tok/s: 6720230 +4329/20000 train_loss: 2.4377 train_time: 8.4m tok/s: 6719927 +4330/20000 train_loss: 2.3047 train_time: 8.4m tok/s: 6719601 +4331/20000 train_loss: 2.3908 train_time: 8.4m tok/s: 6719293 +4332/20000 train_loss: 2.9184 train_time: 8.5m tok/s: 6718952 +4333/20000 train_loss: 2.3836 train_time: 8.5m tok/s: 6718658 +4334/20000 train_loss: 2.4439 train_time: 8.5m tok/s: 6718368 +4335/20000 train_loss: 2.5390 train_time: 8.5m tok/s: 6718055 +4336/20000 train_loss: 2.4153 train_time: 8.5m tok/s: 6717763 +4337/20000 train_loss: 2.5100 train_time: 8.5m tok/s: 6717468 +4338/20000 train_loss: 2.5340 train_time: 8.5m tok/s: 6717158 +4339/20000 train_loss: 2.4406 train_time: 8.5m tok/s: 6716843 +4340/20000 train_loss: 2.4258 train_time: 8.5m tok/s: 6716552 +4341/20000 train_loss: 2.4118 train_time: 8.5m tok/s: 6716250 +4342/20000 train_loss: 2.5188 train_time: 8.5m tok/s: 6715961 +4343/20000 train_loss: 2.4892 train_time: 8.5m tok/s: 6715657 +4344/20000 train_loss: 2.4908 train_time: 8.5m tok/s: 6715354 +4345/20000 train_loss: 2.3258 
train_time: 8.5m tok/s: 6715036 +4346/20000 train_loss: 2.3957 train_time: 8.5m tok/s: 6714727 +4347/20000 train_loss: 2.4077 train_time: 8.5m tok/s: 6714450 +4348/20000 train_loss: 2.3980 train_time: 8.5m tok/s: 6714129 +4349/20000 train_loss: 1.8630 train_time: 8.5m tok/s: 6713779 +4350/20000 train_loss: 2.2316 train_time: 8.5m tok/s: 6713482 +4351/20000 train_loss: 2.4199 train_time: 8.5m tok/s: 6713184 +4352/20000 train_loss: 2.3693 train_time: 8.5m tok/s: 6712871 +4353/20000 train_loss: 2.4635 train_time: 8.5m tok/s: 6712586 +4354/20000 train_loss: 2.4780 train_time: 8.5m tok/s: 6712297 +4355/20000 train_loss: 2.4055 train_time: 8.5m tok/s: 6712022 +4356/20000 train_loss: 2.4685 train_time: 8.5m tok/s: 6711721 +4357/20000 train_loss: 2.5099 train_time: 8.5m tok/s: 6711432 +4358/20000 train_loss: 2.4078 train_time: 8.5m tok/s: 6711120 +4359/20000 train_loss: 2.4194 train_time: 8.5m tok/s: 6710820 +4360/20000 train_loss: 2.3209 train_time: 8.5m tok/s: 6710527 +4361/20000 train_loss: 2.2833 train_time: 8.5m tok/s: 6710238 +4362/20000 train_loss: 2.4589 train_time: 8.5m tok/s: 6709949 +4363/20000 train_loss: 2.2172 train_time: 8.5m tok/s: 6709625 +4364/20000 train_loss: 2.3525 train_time: 8.5m tok/s: 6709329 +4365/20000 train_loss: 2.5513 train_time: 8.5m tok/s: 6709030 +4366/20000 train_loss: 2.7729 train_time: 8.5m tok/s: 6708720 +4367/20000 train_loss: 2.5279 train_time: 8.5m tok/s: 6708425 +4368/20000 train_loss: 2.5058 train_time: 8.5m tok/s: 6708146 +4369/20000 train_loss: 2.3113 train_time: 8.5m tok/s: 6707835 +4370/20000 train_loss: 2.4965 train_time: 8.5m tok/s: 6707552 +4371/20000 train_loss: 2.3937 train_time: 8.5m tok/s: 6707260 +4372/20000 train_loss: 2.4795 train_time: 8.5m tok/s: 6706949 +4373/20000 train_loss: 2.2506 train_time: 8.5m tok/s: 6706653 +4374/20000 train_loss: 2.3155 train_time: 8.5m tok/s: 6706341 +4375/20000 train_loss: 2.4081 train_time: 8.6m tok/s: 6706074 +4376/20000 train_loss: 2.3821 train_time: 8.6m tok/s: 6705758 +4377/20000 
train_loss: 2.5571 train_time: 8.6m tok/s: 6705462 +4378/20000 train_loss: 2.2715 train_time: 8.6m tok/s: 6705156 +4379/20000 train_loss: 2.3293 train_time: 8.6m tok/s: 6704869 +4380/20000 train_loss: 2.4374 train_time: 8.6m tok/s: 6704583 +4381/20000 train_loss: 2.5038 train_time: 8.6m tok/s: 6704317 +4382/20000 train_loss: 2.4885 train_time: 8.6m tok/s: 6704025 +4383/20000 train_loss: 2.4362 train_time: 8.6m tok/s: 6703727 +4384/20000 train_loss: 2.4967 train_time: 8.6m tok/s: 6703411 +4385/20000 train_loss: 2.4260 train_time: 8.6m tok/s: 6703121 +4386/20000 train_loss: 2.4018 train_time: 8.6m tok/s: 6702827 +4387/20000 train_loss: 2.4719 train_time: 8.6m tok/s: 6702512 +4388/20000 train_loss: 2.3864 train_time: 8.6m tok/s: 6702224 +4389/20000 train_loss: 2.2663 train_time: 8.6m tok/s: 6701922 +4390/20000 train_loss: 2.4075 train_time: 8.6m tok/s: 6701629 +4391/20000 train_loss: 2.2877 train_time: 8.6m tok/s: 6701335 +4392/20000 train_loss: 2.3958 train_time: 8.6m tok/s: 6701049 +4393/20000 train_loss: 2.3930 train_time: 8.6m tok/s: 6700760 +4394/20000 train_loss: 2.3707 train_time: 8.6m tok/s: 6700453 +4395/20000 train_loss: 2.3411 train_time: 8.6m tok/s: 6700148 +4396/20000 train_loss: 2.5812 train_time: 8.6m tok/s: 6699846 +4397/20000 train_loss: 2.4183 train_time: 8.6m tok/s: 6699560 +4398/20000 train_loss: 2.3458 train_time: 8.6m tok/s: 6699272 +4399/20000 train_loss: 2.3643 train_time: 8.6m tok/s: 6698991 +4400/20000 train_loss: 2.5068 train_time: 8.6m tok/s: 6698690 +4401/20000 train_loss: 2.4340 train_time: 8.6m tok/s: 6698403 +4402/20000 train_loss: 2.3762 train_time: 8.6m tok/s: 6698110 +4403/20000 train_loss: 2.3696 train_time: 8.6m tok/s: 6697815 +4404/20000 train_loss: 2.3183 train_time: 8.6m tok/s: 6697521 +4405/20000 train_loss: 2.5382 train_time: 8.6m tok/s: 6697217 +4406/20000 train_loss: 2.4099 train_time: 8.6m tok/s: 6696929 +4407/20000 train_loss: 2.4551 train_time: 8.6m tok/s: 6696631 +4408/20000 train_loss: 2.4768 train_time: 8.6m tok/s: 
6696356 +4409/20000 train_loss: 2.4232 train_time: 8.6m tok/s: 6696064 +4410/20000 train_loss: 2.4456 train_time: 8.6m tok/s: 6695775 +4411/20000 train_loss: 2.4647 train_time: 8.6m tok/s: 6695503 +4412/20000 train_loss: 2.5264 train_time: 8.6m tok/s: 6695213 +4413/20000 train_loss: 2.3452 train_time: 8.6m tok/s: 6694915 +4414/20000 train_loss: 2.3544 train_time: 8.6m tok/s: 6694618 +4415/20000 train_loss: 2.5621 train_time: 8.6m tok/s: 6694294 +4416/20000 train_loss: 2.3834 train_time: 8.6m tok/s: 6694011 +4417/20000 train_loss: 2.2985 train_time: 8.6m tok/s: 6693736 +4418/20000 train_loss: 2.3731 train_time: 8.7m tok/s: 6693416 +4419/20000 train_loss: 2.3707 train_time: 8.7m tok/s: 6693125 +4420/20000 train_loss: 2.2848 train_time: 8.7m tok/s: 6692856 +4421/20000 train_loss: 2.2557 train_time: 8.7m tok/s: 6692550 +4422/20000 train_loss: 2.3869 train_time: 8.7m tok/s: 6692263 +4423/20000 train_loss: 2.5359 train_time: 8.7m tok/s: 6691957 +4424/20000 train_loss: 2.4697 train_time: 8.7m tok/s: 6691698 +4425/20000 train_loss: 2.5028 train_time: 8.7m tok/s: 6691398 +4426/20000 train_loss: 2.3080 train_time: 8.7m tok/s: 6691124 +4427/20000 train_loss: 2.3345 train_time: 8.7m tok/s: 6690837 +4428/20000 train_loss: 2.4062 train_time: 8.7m tok/s: 6690548 +4429/20000 train_loss: 2.4093 train_time: 8.7m tok/s: 6690249 +4430/20000 train_loss: 2.5657 train_time: 8.7m tok/s: 6689936 +4431/20000 train_loss: 2.4655 train_time: 8.7m tok/s: 6689616 +4432/20000 train_loss: 2.3598 train_time: 8.7m tok/s: 6689327 +4433/20000 train_loss: 2.2747 train_time: 8.7m tok/s: 6689046 +4434/20000 train_loss: 2.5746 train_time: 8.7m tok/s: 6688735 +4435/20000 train_loss: 2.4354 train_time: 8.7m tok/s: 6688465 +4436/20000 train_loss: 2.4063 train_time: 8.7m tok/s: 6688172 +4437/20000 train_loss: 2.2345 train_time: 8.7m tok/s: 6687883 +4438/20000 train_loss: 2.5254 train_time: 8.7m tok/s: 6687608 +4439/20000 train_loss: 2.4846 train_time: 8.7m tok/s: 6687301 +4440/20000 train_loss: 2.4771 
train_time: 8.7m tok/s: 6687042 +4441/20000 train_loss: 2.3540 train_time: 8.7m tok/s: 6686768 +4442/20000 train_loss: 2.4533 train_time: 8.7m tok/s: 6686480 +4443/20000 train_loss: 2.5233 train_time: 8.7m tok/s: 6686222 +4444/20000 train_loss: 2.3749 train_time: 8.7m tok/s: 6685905 +4445/20000 train_loss: 2.3885 train_time: 8.7m tok/s: 6685629 +4446/20000 train_loss: 2.3158 train_time: 8.7m tok/s: 6685360 +4447/20000 train_loss: 2.3641 train_time: 8.7m tok/s: 6685070 +4448/20000 train_loss: 2.3101 train_time: 8.7m tok/s: 6684775 +4449/20000 train_loss: 2.5113 train_time: 8.7m tok/s: 6684489 +4450/20000 train_loss: 2.3496 train_time: 8.7m tok/s: 6684219 +4451/20000 train_loss: 2.4483 train_time: 8.7m tok/s: 6683941 +4452/20000 train_loss: 2.3737 train_time: 8.7m tok/s: 6683665 +4453/20000 train_loss: 2.4793 train_time: 8.7m tok/s: 6683373 +4454/20000 train_loss: 2.3342 train_time: 8.7m tok/s: 6683079 +4455/20000 train_loss: 2.4071 train_time: 8.7m tok/s: 6682800 +4456/20000 train_loss: 2.4193 train_time: 8.7m tok/s: 6682521 +4457/20000 train_loss: 2.6128 train_time: 8.7m tok/s: 6682250 +4458/20000 train_loss: 2.3908 train_time: 8.7m tok/s: 6681968 +4459/20000 train_loss: 2.2680 train_time: 8.7m tok/s: 6681666 +4460/20000 train_loss: 2.3231 train_time: 8.7m tok/s: 6681361 +4461/20000 train_loss: 2.3513 train_time: 8.8m tok/s: 6681089 +4462/20000 train_loss: 2.4796 train_time: 8.8m tok/s: 6680807 +4463/20000 train_loss: 2.2415 train_time: 8.8m tok/s: 6680526 +4464/20000 train_loss: 2.4919 train_time: 8.8m tok/s: 6680256 +4465/20000 train_loss: 2.4584 train_time: 8.8m tok/s: 6679959 +4466/20000 train_loss: 2.5148 train_time: 8.8m tok/s: 6679689 +4467/20000 train_loss: 2.4569 train_time: 8.8m tok/s: 6679421 +4468/20000 train_loss: 2.2138 train_time: 8.8m tok/s: 6679097 +4469/20000 train_loss: 2.3598 train_time: 8.8m tok/s: 6678778 +4470/20000 train_loss: 2.3528 train_time: 8.8m tok/s: 6678501 +4471/20000 train_loss: 2.4346 train_time: 8.8m tok/s: 6678233 +4472/20000 
train_loss: 2.4024 train_time: 8.8m tok/s: 6677954 +4473/20000 train_loss: 2.5180 train_time: 8.8m tok/s: 6677684 +4474/20000 train_loss: 2.3633 train_time: 8.8m tok/s: 6677376 +4475/20000 train_loss: 2.4749 train_time: 8.8m tok/s: 6677097 +4476/20000 train_loss: 2.4247 train_time: 8.8m tok/s: 6676831 +4477/20000 train_loss: 2.3418 train_time: 8.8m tok/s: 6676555 +4478/20000 train_loss: 2.3649 train_time: 8.8m tok/s: 6676289 +4479/20000 train_loss: 2.4315 train_time: 8.8m tok/s: 6676004 +4480/20000 train_loss: 2.3858 train_time: 8.8m tok/s: 6675731 +4481/20000 train_loss: 2.4196 train_time: 8.8m tok/s: 6675474 +4482/20000 train_loss: 2.3784 train_time: 8.8m tok/s: 6675186 +4483/20000 train_loss: 2.4186 train_time: 8.8m tok/s: 6674913 +4484/20000 train_loss: 2.5020 train_time: 8.8m tok/s: 6674611 +4485/20000 train_loss: 2.3376 train_time: 8.8m tok/s: 6674339 +4486/20000 train_loss: 2.3928 train_time: 8.8m tok/s: 6674060 +4487/20000 train_loss: 2.3820 train_time: 8.8m tok/s: 6673783 +4488/20000 train_loss: 2.3775 train_time: 8.8m tok/s: 6673480 +4489/20000 train_loss: 2.2819 train_time: 8.8m tok/s: 6673198 +4490/20000 train_loss: 2.4630 train_time: 8.8m tok/s: 6672918 +4491/20000 train_loss: 2.3961 train_time: 8.8m tok/s: 6672639 +4492/20000 train_loss: 2.4524 train_time: 8.8m tok/s: 6672376 +4493/20000 train_loss: 2.4646 train_time: 8.8m tok/s: 6672098 +4494/20000 train_loss: 2.4927 train_time: 8.8m tok/s: 6671818 +4495/20000 train_loss: 2.4502 train_time: 8.8m tok/s: 6671553 +4496/20000 train_loss: 2.3313 train_time: 8.8m tok/s: 6671268 +4497/20000 train_loss: 2.4382 train_time: 8.8m tok/s: 6670978 +4498/20000 train_loss: 2.3732 train_time: 8.8m tok/s: 6670694 +4499/20000 train_loss: 2.3653 train_time: 8.8m tok/s: 6670421 +4500/20000 train_loss: 2.3950 train_time: 8.8m tok/s: 6670160 +4501/20000 train_loss: 2.0644 train_time: 8.8m tok/s: 6669833 +4502/20000 train_loss: 2.3517 train_time: 8.8m tok/s: 6669556 +4503/20000 train_loss: 2.2344 train_time: 8.8m tok/s: 
6669282 +4504/20000 train_loss: 2.3176 train_time: 8.9m tok/s: 6669005 +4505/20000 train_loss: 2.3202 train_time: 8.9m tok/s: 6668722 +4506/20000 train_loss: 2.2690 train_time: 8.9m tok/s: 6668444 +4507/20000 train_loss: 2.3560 train_time: 8.9m tok/s: 6668186 +4508/20000 train_loss: 2.4659 train_time: 8.9m tok/s: 6667922 +4509/20000 train_loss: 2.4248 train_time: 8.9m tok/s: 6667638 +4510/20000 train_loss: 2.4551 train_time: 8.9m tok/s: 6667370 +4511/20000 train_loss: 2.2570 train_time: 8.9m tok/s: 6667103 +4512/20000 train_loss: 2.3594 train_time: 8.9m tok/s: 6666833 +4513/20000 train_loss: 2.3782 train_time: 8.9m tok/s: 6666540 +4514/20000 train_loss: 2.3909 train_time: 8.9m tok/s: 6666258 +4515/20000 train_loss: 2.3164 train_time: 8.9m tok/s: 6665984 +4516/20000 train_loss: 2.5687 train_time: 8.9m tok/s: 6665703 +4517/20000 train_loss: 2.6622 train_time: 8.9m tok/s: 6665430 +4518/20000 train_loss: 2.4496 train_time: 8.9m tok/s: 6665172 +4519/20000 train_loss: 2.3671 train_time: 8.9m tok/s: 6664890 +4520/20000 train_loss: 2.3096 train_time: 8.9m tok/s: 6664612 +4521/20000 train_loss: 2.4264 train_time: 8.9m tok/s: 6664342 +4522/20000 train_loss: 2.3487 train_time: 8.9m tok/s: 6664080 +4523/20000 train_loss: 2.4198 train_time: 8.9m tok/s: 6663809 +4524/20000 train_loss: 2.3123 train_time: 8.9m tok/s: 6663535 +4525/20000 train_loss: 2.4865 train_time: 8.9m tok/s: 6663269 +4526/20000 train_loss: 2.4302 train_time: 8.9m tok/s: 6662994 +4527/20000 train_loss: 2.4189 train_time: 8.9m tok/s: 6662727 +4528/20000 train_loss: 2.3302 train_time: 8.9m tok/s: 6662438 +4529/20000 train_loss: 2.3827 train_time: 8.9m tok/s: 6662156 +4530/20000 train_loss: 2.2939 train_time: 8.9m tok/s: 6661898 +4531/20000 train_loss: 2.5299 train_time: 8.9m tok/s: 6661618 +4532/20000 train_loss: 2.4027 train_time: 8.9m tok/s: 6661345 +4533/20000 train_loss: 2.2834 train_time: 8.9m tok/s: 6661084 +4534/20000 train_loss: 2.4274 train_time: 8.9m tok/s: 6660822 +4535/20000 train_loss: 2.4919 
train_time: 8.9m tok/s: 6660547 +4536/20000 train_loss: 2.2871 train_time: 8.9m tok/s: 6660264 +4537/20000 train_loss: 2.3841 train_time: 8.9m tok/s: 6659991 +4538/20000 train_loss: 2.1628 train_time: 8.9m tok/s: 6659707 +4539/20000 train_loss: 2.4336 train_time: 8.9m tok/s: 6659429 +4540/20000 train_loss: 2.3688 train_time: 8.9m tok/s: 6659156 +4541/20000 train_loss: 2.2972 train_time: 8.9m tok/s: 6658880 +4542/20000 train_loss: 2.3931 train_time: 8.9m tok/s: 6658622 +4543/20000 train_loss: 2.2963 train_time: 8.9m tok/s: 6658342 +4544/20000 train_loss: 2.6369 train_time: 8.9m tok/s: 6658053 +4545/20000 train_loss: 2.3818 train_time: 8.9m tok/s: 6657795 +4546/20000 train_loss: 2.3842 train_time: 9.0m tok/s: 6657535 +4547/20000 train_loss: 2.3422 train_time: 9.0m tok/s: 6657263 +4548/20000 train_loss: 2.2307 train_time: 9.0m tok/s: 6656991 +4549/20000 train_loss: 2.4269 train_time: 9.0m tok/s: 6656723 +4550/20000 train_loss: 2.4537 train_time: 9.0m tok/s: 6656454 +4551/20000 train_loss: 2.3520 train_time: 9.0m tok/s: 6656198 +4552/20000 train_loss: 2.3447 train_time: 9.0m tok/s: 6655925 +4553/20000 train_loss: 2.2882 train_time: 9.0m tok/s: 6655663 +4554/20000 train_loss: 2.4526 train_time: 9.0m tok/s: 6655368 +4555/20000 train_loss: 2.4530 train_time: 9.0m tok/s: 6655062 +4556/20000 train_loss: 2.4152 train_time: 9.0m tok/s: 6654802 +4557/20000 train_loss: 2.4609 train_time: 9.0m tok/s: 6654558 +4558/20000 train_loss: 2.5769 train_time: 9.0m tok/s: 6654293 +4559/20000 train_loss: 2.4101 train_time: 9.0m tok/s: 6654022 +4560/20000 train_loss: 2.4288 train_time: 9.0m tok/s: 6653761 +4561/20000 train_loss: 2.4439 train_time: 9.0m tok/s: 6653503 +4562/20000 train_loss: 2.3877 train_time: 9.0m tok/s: 6653233 +4563/20000 train_loss: 2.5011 train_time: 9.0m tok/s: 6652959 +4564/20000 train_loss: 2.3212 train_time: 9.0m tok/s: 6652675 +4565/20000 train_loss: 2.3975 train_time: 9.0m tok/s: 6652418 +4566/20000 train_loss: 2.3637 train_time: 9.0m tok/s: 6652149 +4567/20000 
train_loss: 2.2315 train_time: 9.0m tok/s: 6651888 +4568/20000 train_loss: 2.2608 train_time: 9.0m tok/s: 6651597 +4569/20000 train_loss: 2.4764 train_time: 9.0m tok/s: 6651327 +4570/20000 train_loss: 2.3482 train_time: 9.0m tok/s: 6651079 +4571/20000 train_loss: 2.4160 train_time: 9.0m tok/s: 6650818 +4572/20000 train_loss: 2.3996 train_time: 9.0m tok/s: 6650566 +4573/20000 train_loss: 2.3932 train_time: 9.0m tok/s: 6650303 +4574/20000 train_loss: 2.4027 train_time: 9.0m tok/s: 6650019 +4575/20000 train_loss: 2.2857 train_time: 9.0m tok/s: 6649716 +4576/20000 train_loss: 2.4186 train_time: 9.0m tok/s: 6649444 +4577/20000 train_loss: 2.4146 train_time: 9.0m tok/s: 6649169 +4578/20000 train_loss: 2.3690 train_time: 9.0m tok/s: 6648910 +4579/20000 train_loss: 2.4699 train_time: 9.0m tok/s: 6648630 +4580/20000 train_loss: 2.2528 train_time: 9.0m tok/s: 6648364 +4581/20000 train_loss: 1.9318 train_time: 9.0m tok/s: 6648067 +4582/20000 train_loss: 2.3707 train_time: 9.0m tok/s: 6647794 +4583/20000 train_loss: 2.4074 train_time: 9.0m tok/s: 6647576 +4584/20000 train_loss: 2.4615 train_time: 9.0m tok/s: 6647322 +4585/20000 train_loss: 2.3161 train_time: 9.0m tok/s: 6647050 +4586/20000 train_loss: 2.3606 train_time: 9.0m tok/s: 6646807 +4587/20000 train_loss: 2.5063 train_time: 9.0m tok/s: 6646553 +4588/20000 train_loss: 2.3980 train_time: 9.0m tok/s: 6646287 +4589/20000 train_loss: 2.4595 train_time: 9.1m tok/s: 6646037 +4590/20000 train_loss: 2.3099 train_time: 9.1m tok/s: 6645773 +4591/20000 train_loss: 2.3671 train_time: 9.1m tok/s: 6645524 +4592/20000 train_loss: 2.2558 train_time: 9.1m tok/s: 6645274 +4593/20000 train_loss: 2.4547 train_time: 9.1m tok/s: 6645019 +4594/20000 train_loss: 2.3219 train_time: 9.1m tok/s: 6644754 +4595/20000 train_loss: 2.1884 train_time: 9.1m tok/s: 6644426 +4596/20000 train_loss: 2.4317 train_time: 9.1m tok/s: 6644154 +4597/20000 train_loss: 2.4143 train_time: 9.1m tok/s: 6643872 +4598/20000 train_loss: 2.4631 train_time: 9.1m tok/s: 
6643598 +4599/20000 train_loss: 2.3784 train_time: 9.1m tok/s: 6643350 +4600/20000 train_loss: 2.5586 train_time: 9.1m tok/s: 6643090 +4601/20000 train_loss: 2.4844 train_time: 9.1m tok/s: 6642822 +4602/20000 train_loss: 2.5096 train_time: 9.1m tok/s: 6642560 +4603/20000 train_loss: 2.3824 train_time: 9.1m tok/s: 6642293 +4604/20000 train_loss: 2.3457 train_time: 9.1m tok/s: 6642026 +4605/20000 train_loss: 2.3507 train_time: 9.1m tok/s: 6641752 +4606/20000 train_loss: 2.3465 train_time: 9.1m tok/s: 6641486 +4607/20000 train_loss: 2.3530 train_time: 9.1m tok/s: 6641243 +4608/20000 train_loss: 2.3721 train_time: 9.1m tok/s: 6640982 +4609/20000 train_loss: 2.3216 train_time: 9.1m tok/s: 6640717 +4610/20000 train_loss: 2.4233 train_time: 9.1m tok/s: 6640454 +4611/20000 train_loss: 2.5809 train_time: 9.1m tok/s: 6640171 +4612/20000 train_loss: 2.4514 train_time: 9.1m tok/s: 6639926 +4613/20000 train_loss: 2.5024 train_time: 9.1m tok/s: 6639658 +4614/20000 train_loss: 2.4580 train_time: 9.1m tok/s: 6639402 +4615/20000 train_loss: 2.3091 train_time: 9.1m tok/s: 6639144 +4616/20000 train_loss: 2.2770 train_time: 9.1m tok/s: 6638886 +4617/20000 train_loss: 2.3153 train_time: 9.1m tok/s: 6638628 +4618/20000 train_loss: 2.3728 train_time: 9.1m tok/s: 6638386 +4619/20000 train_loss: 2.2854 train_time: 9.1m tok/s: 6638136 +4620/20000 train_loss: 2.3506 train_time: 9.1m tok/s: 6637859 +4621/20000 train_loss: 2.3922 train_time: 9.1m tok/s: 6637587 +4622/20000 train_loss: 2.4159 train_time: 9.1m tok/s: 6637314 +4623/20000 train_loss: 2.3734 train_time: 9.1m tok/s: 6637061 +4624/20000 train_loss: 2.5494 train_time: 9.1m tok/s: 6636795 +4625/20000 train_loss: 2.5872 train_time: 9.1m tok/s: 6636531 +4626/20000 train_loss: 2.5001 train_time: 9.1m tok/s: 6636264 +4627/20000 train_loss: 2.3993 train_time: 9.1m tok/s: 6636020 +4628/20000 train_loss: 2.4512 train_time: 9.1m tok/s: 6635752 +4629/20000 train_loss: 2.3709 train_time: 9.1m tok/s: 6635494 +4630/20000 train_loss: 2.1550 
train_time: 9.1m tok/s: 6635235 +4631/20000 train_loss: 2.3894 train_time: 9.1m tok/s: 6634993 +4632/20000 train_loss: 2.3094 train_time: 9.2m tok/s: 6634725 +4633/20000 train_loss: 2.2341 train_time: 9.2m tok/s: 6634475 +4634/20000 train_loss: 2.3453 train_time: 9.2m tok/s: 6634217 +4635/20000 train_loss: 2.4732 train_time: 9.2m tok/s: 6633954 +4636/20000 train_loss: 2.3893 train_time: 9.2m tok/s: 6633692 +4637/20000 train_loss: 2.4637 train_time: 9.2m tok/s: 6633444 +4638/20000 train_loss: 2.4544 train_time: 9.2m tok/s: 6633201 +4639/20000 train_loss: 2.3905 train_time: 9.2m tok/s: 6632947 +4640/20000 train_loss: 2.4066 train_time: 9.2m tok/s: 6632697 +4641/20000 train_loss: 2.3503 train_time: 9.2m tok/s: 6632445 +4642/20000 train_loss: 2.3108 train_time: 9.2m tok/s: 6632200 +4643/20000 train_loss: 2.4191 train_time: 9.2m tok/s: 6631955 +4644/20000 train_loss: 2.2541 train_time: 9.2m tok/s: 6631700 +4645/20000 train_loss: 2.3491 train_time: 9.2m tok/s: 6631444 +4646/20000 train_loss: 2.2912 train_time: 9.2m tok/s: 6631195 +4647/20000 train_loss: 2.2010 train_time: 9.2m tok/s: 6630945 +4648/20000 train_loss: 2.3714 train_time: 9.2m tok/s: 6630707 +4649/20000 train_loss: 2.3939 train_time: 9.2m tok/s: 6630464 +4650/20000 train_loss: 2.4119 train_time: 9.2m tok/s: 6630228 +4651/20000 train_loss: 2.4866 train_time: 9.2m tok/s: 6629999 +4652/20000 train_loss: 2.4629 train_time: 9.2m tok/s: 6629747 +4653/20000 train_loss: 2.4359 train_time: 9.2m tok/s: 6629497 +4654/20000 train_loss: 2.3003 train_time: 9.2m tok/s: 6629248 +4655/20000 train_loss: 2.2771 train_time: 9.2m tok/s: 6628995 +4656/20000 train_loss: 2.4664 train_time: 9.2m tok/s: 6628764 +4657/20000 train_loss: 2.3533 train_time: 9.2m tok/s: 6628495 +4658/20000 train_loss: 2.2944 train_time: 9.2m tok/s: 6628252 +4659/20000 train_loss: 2.3303 train_time: 9.2m tok/s: 6628016 +4660/20000 train_loss: 2.3425 train_time: 9.2m tok/s: 6627756 +4661/20000 train_loss: 2.0352 train_time: 9.2m tok/s: 6627447 +4662/20000 
train_loss: 2.3568 train_time: 9.2m tok/s: 6627194 +4663/20000 train_loss: 2.3865 train_time: 9.2m tok/s: 6626965 +4664/20000 train_loss: 2.2944 train_time: 9.2m tok/s: 6626709 +4665/20000 train_loss: 2.4455 train_time: 9.2m tok/s: 6626454 +4666/20000 train_loss: 2.3456 train_time: 9.2m tok/s: 6626193 +4667/20000 train_loss: 2.4316 train_time: 9.2m tok/s: 6625938 +4668/20000 train_loss: 2.4001 train_time: 9.2m tok/s: 6625699 +4669/20000 train_loss: 2.6147 train_time: 9.2m tok/s: 6625444 +4670/20000 train_loss: 2.2820 train_time: 9.2m tok/s: 6625197 +4671/20000 train_loss: 2.4562 train_time: 9.2m tok/s: 6624952 +4672/20000 train_loss: 2.3529 train_time: 9.2m tok/s: 6624686 +4673/20000 train_loss: 2.3304 train_time: 9.2m tok/s: 6624426 +4674/20000 train_loss: 2.4379 train_time: 9.2m tok/s: 6624166 +4675/20000 train_loss: 2.3725 train_time: 9.3m tok/s: 6623915 +4676/20000 train_loss: 2.3621 train_time: 9.3m tok/s: 6623670 +4677/20000 train_loss: 2.4199 train_time: 9.3m tok/s: 6623416 +4678/20000 train_loss: 2.3886 train_time: 9.3m tok/s: 6623166 +4679/20000 train_loss: 2.3009 train_time: 9.3m tok/s: 6622917 +4680/20000 train_loss: 2.3070 train_time: 9.3m tok/s: 6622670 +4681/20000 train_loss: 2.3792 train_time: 9.3m tok/s: 6622414 +4682/20000 train_loss: 2.2922 train_time: 9.3m tok/s: 6622153 +4683/20000 train_loss: 2.3632 train_time: 9.3m tok/s: 6621890 +4684/20000 train_loss: 2.3499 train_time: 9.3m tok/s: 6621638 +4685/20000 train_loss: 2.3248 train_time: 9.3m tok/s: 6621368 +4686/20000 train_loss: 2.2573 train_time: 9.3m tok/s: 6621106 +4687/20000 train_loss: 2.3874 train_time: 9.3m tok/s: 6620861 +4688/20000 train_loss: 2.4494 train_time: 9.3m tok/s: 6620604 +4689/20000 train_loss: 2.4416 train_time: 9.3m tok/s: 6620353 +4690/20000 train_loss: 2.4997 train_time: 9.3m tok/s: 6620119 +4691/20000 train_loss: 2.4036 train_time: 9.3m tok/s: 6619868 +4692/20000 train_loss: 2.4231 train_time: 9.3m tok/s: 6619620 +4693/20000 train_loss: 2.3706 train_time: 9.3m tok/s: 
6619361 +4694/20000 train_loss: 2.4534 train_time: 9.3m tok/s: 6619109 +4695/20000 train_loss: 2.4427 train_time: 9.3m tok/s: 6618859 +4696/20000 train_loss: 2.3429 train_time: 9.3m tok/s: 6618617 +4697/20000 train_loss: 2.3422 train_time: 9.3m tok/s: 6618387 +4698/20000 train_loss: 2.3717 train_time: 9.3m tok/s: 6618134 +4699/20000 train_loss: 2.3659 train_time: 9.3m tok/s: 6617887 +4700/20000 train_loss: 2.1605 train_time: 9.3m tok/s: 6617640 +4701/20000 train_loss: 2.4151 train_time: 9.3m tok/s: 6617393 +4702/20000 train_loss: 2.5048 train_time: 9.3m tok/s: 6617108 +4703/20000 train_loss: 2.4500 train_time: 9.3m tok/s: 6616873 +4704/20000 train_loss: 2.3683 train_time: 9.3m tok/s: 6616622 +4705/20000 train_loss: 2.3354 train_time: 9.3m tok/s: 6616393 +4706/20000 train_loss: 2.3323 train_time: 9.3m tok/s: 6616141 +4707/20000 train_loss: 2.4488 train_time: 9.3m tok/s: 6615885 +4708/20000 train_loss: 2.3587 train_time: 9.3m tok/s: 6615607 +4709/20000 train_loss: 2.3520 train_time: 9.3m tok/s: 6615352 +4710/20000 train_loss: 2.3980 train_time: 9.3m tok/s: 6615113 +4711/20000 train_loss: 2.3642 train_time: 9.3m tok/s: 6614860 +4712/20000 train_loss: 2.3221 train_time: 9.3m tok/s: 6614607 +4713/20000 train_loss: 2.4418 train_time: 9.3m tok/s: 6614371 +4714/20000 train_loss: 2.3435 train_time: 9.3m tok/s: 6614099 +4715/20000 train_loss: 2.4675 train_time: 9.3m tok/s: 6613846 +4716/20000 train_loss: 2.4935 train_time: 9.3m tok/s: 6613603 +4717/20000 train_loss: 2.3264 train_time: 9.3m tok/s: 6613356 +4718/20000 train_loss: 2.3116 train_time: 9.4m tok/s: 6613090 +4719/20000 train_loss: 2.3550 train_time: 9.4m tok/s: 6612854 +4720/20000 train_loss: 2.3015 train_time: 9.4m tok/s: 6612600 +4721/20000 train_loss: 2.2834 train_time: 9.4m tok/s: 6612349 +4722/20000 train_loss: 2.4777 train_time: 9.4m tok/s: 6612116 +4723/20000 train_loss: 2.3203 train_time: 9.4m tok/s: 6611874 +4724/20000 train_loss: 2.3666 train_time: 9.4m tok/s: 6611632 +4725/20000 train_loss: 2.2594 
train_time: 9.4m tok/s: 6611373 +4726/20000 train_loss: 2.4021 train_time: 9.4m tok/s: 6611131 +4727/20000 train_loss: 2.3799 train_time: 9.4m tok/s: 6610890 +4728/20000 train_loss: 2.3647 train_time: 9.4m tok/s: 6610640 +4729/20000 train_loss: 2.3689 train_time: 9.4m tok/s: 6610384 +4730/20000 train_loss: 2.2038 train_time: 9.4m tok/s: 6610136 +4731/20000 train_loss: 2.3653 train_time: 9.4m tok/s: 6609891 +4732/20000 train_loss: 2.3175 train_time: 9.4m tok/s: 6609652 +4733/20000 train_loss: 2.4170 train_time: 9.4m tok/s: 6609405 +4734/20000 train_loss: 2.3977 train_time: 9.4m tok/s: 6609164 +4735/20000 train_loss: 2.1970 train_time: 9.4m tok/s: 6608888 +4736/20000 train_loss: 2.3385 train_time: 9.4m tok/s: 6608642 +4737/20000 train_loss: 2.4767 train_time: 9.4m tok/s: 6608404 +4738/20000 train_loss: 2.3494 train_time: 9.4m tok/s: 6608165 +4739/20000 train_loss: 2.4865 train_time: 9.4m tok/s: 6607873 +4740/20000 train_loss: 2.4580 train_time: 9.4m tok/s: 6607617 +4741/20000 train_loss: 2.3785 train_time: 9.4m tok/s: 6607380 +4742/20000 train_loss: 2.3383 train_time: 9.4m tok/s: 6607140 +4743/20000 train_loss: 2.2986 train_time: 9.4m tok/s: 6606886 +4744/20000 train_loss: 2.3717 train_time: 9.4m tok/s: 6606648 +4745/20000 train_loss: 2.2579 train_time: 9.4m tok/s: 6606409 +4746/20000 train_loss: 2.2244 train_time: 9.4m tok/s: 6606170 +4747/20000 train_loss: 2.4595 train_time: 9.4m tok/s: 6605945 +4748/20000 train_loss: 2.4019 train_time: 9.4m tok/s: 6605674 +4749/20000 train_loss: 2.3708 train_time: 9.4m tok/s: 6605426 +4750/20000 train_loss: 2.4064 train_time: 9.4m tok/s: 6605193 +4751/20000 train_loss: 2.3283 train_time: 9.4m tok/s: 6604958 +4752/20000 train_loss: 2.3816 train_time: 9.4m tok/s: 6604736 +4753/20000 train_loss: 2.2606 train_time: 9.4m tok/s: 6604478 +4754/20000 train_loss: 2.2257 train_time: 9.4m tok/s: 6604233 +4755/20000 train_loss: 2.4110 train_time: 9.4m tok/s: 6604003 +4756/20000 train_loss: 2.3799 train_time: 9.4m tok/s: 6603762 +4757/20000 
train_loss: 2.3298 train_time: 9.4m tok/s: 6603499 +4758/20000 train_loss: 2.5531 train_time: 9.4m tok/s: 6603254 +4759/20000 train_loss: 2.3864 train_time: 9.4m tok/s: 6603018 +4760/20000 train_loss: 2.2650 train_time: 9.4m tok/s: 6602794 +4761/20000 train_loss: 2.4430 train_time: 9.5m tok/s: 6602547 +4762/20000 train_loss: 2.3488 train_time: 9.5m tok/s: 6602297 +4763/20000 train_loss: 2.4830 train_time: 9.5m tok/s: 6602053 +4764/20000 train_loss: 2.4408 train_time: 9.5m tok/s: 6601803 +4765/20000 train_loss: 2.4300 train_time: 9.5m tok/s: 6601563 +4766/20000 train_loss: 2.3526 train_time: 9.5m tok/s: 6601326 +4767/20000 train_loss: 2.3114 train_time: 9.5m tok/s: 6601082 +4768/20000 train_loss: 2.3788 train_time: 9.5m tok/s: 6600833 +4769/20000 train_loss: 2.3524 train_time: 9.5m tok/s: 6600590 +4770/20000 train_loss: 2.4241 train_time: 9.5m tok/s: 6600363 +4771/20000 train_loss: 2.3510 train_time: 9.5m tok/s: 6600135 +4772/20000 train_loss: 2.4267 train_time: 9.5m tok/s: 6599876 +4773/20000 train_loss: 2.4416 train_time: 9.5m tok/s: 6599626 +4774/20000 train_loss: 2.5463 train_time: 9.5m tok/s: 6599376 +4775/20000 train_loss: 2.4286 train_time: 9.5m tok/s: 6599134 +4776/20000 train_loss: 2.5284 train_time: 9.5m tok/s: 6598914 +4777/20000 train_loss: 2.3986 train_time: 9.5m tok/s: 6598677 +4778/20000 train_loss: 2.3653 train_time: 9.5m tok/s: 6598438 +4779/20000 train_loss: 2.1994 train_time: 9.5m tok/s: 6598192 +4780/20000 train_loss: 2.3140 train_time: 9.5m tok/s: 6597948 +4781/20000 train_loss: 2.3111 train_time: 9.5m tok/s: 6597711 +4782/20000 train_loss: 2.7084 train_time: 9.5m tok/s: 6597447 +4783/20000 train_loss: 2.4233 train_time: 9.5m tok/s: 6597193 +4784/20000 train_loss: 2.3772 train_time: 9.5m tok/s: 6596968 +4785/20000 train_loss: 2.4244 train_time: 9.5m tok/s: 6596731 +4786/20000 train_loss: 2.4241 train_time: 9.5m tok/s: 6596508 +4787/20000 train_loss: 2.3613 train_time: 9.5m tok/s: 6596258 +4788/20000 train_loss: 2.3749 train_time: 9.5m tok/s: 
6596024 +4789/20000 train_loss: 2.1519 train_time: 9.5m tok/s: 6595782 +4790/20000 train_loss: 2.4103 train_time: 9.5m tok/s: 6595541 +4791/20000 train_loss: 2.3498 train_time: 9.5m tok/s: 6595297 +4792/20000 train_loss: 2.2465 train_time: 9.5m tok/s: 6595041 +4793/20000 train_loss: 2.3518 train_time: 9.5m tok/s: 6594809 +4794/20000 train_loss: 2.2254 train_time: 9.5m tok/s: 6594584 +4795/20000 train_loss: 2.3972 train_time: 9.5m tok/s: 6594345 +4796/20000 train_loss: 2.2546 train_time: 9.5m tok/s: 6594116 +4797/20000 train_loss: 2.4077 train_time: 9.5m tok/s: 6593867 +4798/20000 train_loss: 2.5517 train_time: 9.5m tok/s: 6593616 +4799/20000 train_loss: 2.4754 train_time: 9.5m tok/s: 6593376 +4800/20000 train_loss: 2.4456 train_time: 9.5m tok/s: 6593163 +4801/20000 train_loss: 2.4197 train_time: 9.5m tok/s: 6592913 +4802/20000 train_loss: 2.3262 train_time: 9.5m tok/s: 6592671 +4803/20000 train_loss: 2.5160 train_time: 9.5m tok/s: 6592429 +4804/20000 train_loss: 1.9498 train_time: 9.6m tok/s: 6592133 +4805/20000 train_loss: 2.3894 train_time: 9.6m tok/s: 6591900 +4806/20000 train_loss: 2.3754 train_time: 9.6m tok/s: 6591677 +4807/20000 train_loss: 2.3658 train_time: 9.6m tok/s: 6591457 +4808/20000 train_loss: 2.2983 train_time: 9.6m tok/s: 6591240 +4809/20000 train_loss: 2.4076 train_time: 9.6m tok/s: 6591001 +4810/20000 train_loss: 2.4581 train_time: 9.6m tok/s: 6590773 +4811/20000 train_loss: 2.4040 train_time: 9.6m tok/s: 6590536 +4812/20000 train_loss: 2.3559 train_time: 9.6m tok/s: 6590311 +4813/20000 train_loss: 2.3541 train_time: 9.6m tok/s: 6590073 +4814/20000 train_loss: 2.3842 train_time: 9.6m tok/s: 6589857 +4815/20000 train_loss: 2.4448 train_time: 9.6m tok/s: 6589631 +4816/20000 train_loss: 2.3632 train_time: 9.6m tok/s: 6589397 +4817/20000 train_loss: 2.4957 train_time: 9.6m tok/s: 6589159 +4818/20000 train_loss: 2.2857 train_time: 9.6m tok/s: 6588925 +4819/20000 train_loss: 2.4361 train_time: 9.6m tok/s: 6588692 +4820/20000 train_loss: 2.3091 
train_time: 9.6m tok/s: 6588438 +4821/20000 train_loss: 2.3911 train_time: 9.6m tok/s: 6588182 +4822/20000 train_loss: 2.4185 train_time: 9.6m tok/s: 6587940 +4823/20000 train_loss: 2.3737 train_time: 9.6m tok/s: 6587689 +4824/20000 train_loss: 2.4813 train_time: 9.6m tok/s: 6587441 +4825/20000 train_loss: 2.5626 train_time: 9.6m tok/s: 6587208 +4826/20000 train_loss: 2.3041 train_time: 9.6m tok/s: 6586972 +4827/20000 train_loss: 2.2831 train_time: 9.6m tok/s: 6586737 +4828/20000 train_loss: 2.3220 train_time: 9.6m tok/s: 6586502 +4829/20000 train_loss: 2.3607 train_time: 9.6m tok/s: 6586269 +4830/20000 train_loss: 2.3450 train_time: 9.6m tok/s: 6586050 +4831/20000 train_loss: 2.2964 train_time: 9.6m tok/s: 6585806 +4832/20000 train_loss: 2.2531 train_time: 9.6m tok/s: 6585579 +4833/20000 train_loss: 2.2718 train_time: 9.6m tok/s: 6585338 +4834/20000 train_loss: 2.3919 train_time: 9.6m tok/s: 6585104 +4835/20000 train_loss: 2.3391 train_time: 9.6m tok/s: 6584891 +4836/20000 train_loss: 2.3539 train_time: 9.6m tok/s: 6584639 +4837/20000 train_loss: 2.5280 train_time: 9.6m tok/s: 6584398 +4838/20000 train_loss: 2.2809 train_time: 9.6m tok/s: 6584168 +4839/20000 train_loss: 2.4352 train_time: 9.6m tok/s: 6583947 +4840/20000 train_loss: 2.2695 train_time: 9.6m tok/s: 6583715 +4841/20000 train_loss: 2.4242 train_time: 9.6m tok/s: 6583477 +4842/20000 train_loss: 2.6819 train_time: 9.6m tok/s: 6583233 +4843/20000 train_loss: 2.3050 train_time: 9.6m tok/s: 6583012 +4844/20000 train_loss: 2.2924 train_time: 9.6m tok/s: 6582774 +4845/20000 train_loss: 2.3218 train_time: 9.6m tok/s: 6582543 +4846/20000 train_loss: 2.3417 train_time: 9.6m tok/s: 6582288 +4847/20000 train_loss: 2.5504 train_time: 9.7m tok/s: 6582042 +4848/20000 train_loss: 2.3142 train_time: 9.7m tok/s: 6581817 +4849/20000 train_loss: 2.4529 train_time: 9.7m tok/s: 6581597 +4850/20000 train_loss: 2.2955 train_time: 9.7m tok/s: 6581372 +4851/20000 train_loss: 2.3363 train_time: 9.7m tok/s: 6581139 +4852/20000 
train_loss: 2.3532 train_time: 9.7m tok/s: 6580906 +4853/20000 train_loss: 2.2921 train_time: 9.7m tok/s: 6580665 +4854/20000 train_loss: 2.4601 train_time: 9.7m tok/s: 6580431 +4855/20000 train_loss: 2.3503 train_time: 9.7m tok/s: 6580204 +4856/20000 train_loss: 2.2825 train_time: 9.7m tok/s: 6579973 +4857/20000 train_loss: 2.2879 train_time: 9.7m tok/s: 6579739 +4858/20000 train_loss: 2.2688 train_time: 9.7m tok/s: 6579496 +4859/20000 train_loss: 2.3118 train_time: 9.7m tok/s: 6579257 +4860/20000 train_loss: 2.4731 train_time: 9.7m tok/s: 6579042 +4861/20000 train_loss: 2.4071 train_time: 9.7m tok/s: 6578818 +4862/20000 train_loss: 2.3123 train_time: 9.7m tok/s: 6578578 +4863/20000 train_loss: 2.5817 train_time: 9.7m tok/s: 6578346 +4864/20000 train_loss: 2.3998 train_time: 9.7m tok/s: 6578120 +4865/20000 train_loss: 2.5060 train_time: 9.7m tok/s: 6577877 +4866/20000 train_loss: 2.2712 train_time: 9.7m tok/s: 6577634 +4867/20000 train_loss: 2.2445 train_time: 9.7m tok/s: 6577423 +4868/20000 train_loss: 2.3159 train_time: 9.7m tok/s: 6577203 +4869/20000 train_loss: 2.4389 train_time: 9.7m tok/s: 6576947 +4870/20000 train_loss: 2.3591 train_time: 9.7m tok/s: 6576719 +4871/20000 train_loss: 2.3657 train_time: 9.7m tok/s: 6576484 +4872/20000 train_loss: 2.2724 train_time: 9.7m tok/s: 6576206 +4873/20000 train_loss: 2.5457 train_time: 9.7m tok/s: 6575977 +4874/20000 train_loss: 2.4191 train_time: 9.7m tok/s: 6575769 +4875/20000 train_loss: 2.4048 train_time: 9.7m tok/s: 6575560 +4876/20000 train_loss: 2.4029 train_time: 9.7m tok/s: 6575338 +4877/20000 train_loss: 2.3022 train_time: 9.7m tok/s: 6575112 +4878/20000 train_loss: 2.4043 train_time: 9.7m tok/s: 6574870 +4879/20000 train_loss: 2.3621 train_time: 9.7m tok/s: 6574642 +4880/20000 train_loss: 2.4352 train_time: 9.7m tok/s: 6574418 +4881/20000 train_loss: 2.3415 train_time: 9.7m tok/s: 6574187 +4882/20000 train_loss: 2.2759 train_time: 9.7m tok/s: 6573983 +4883/20000 train_loss: 2.2529 train_time: 9.7m tok/s: 
6573761 +4884/20000 train_loss: 2.6038 train_time: 9.7m tok/s: 6573522 +4885/20000 train_loss: 2.4663 train_time: 9.7m tok/s: 6573298 +4886/20000 train_loss: 2.4370 train_time: 9.7m tok/s: 6573080 +4887/20000 train_loss: 2.4063 train_time: 9.7m tok/s: 6572857 +4888/20000 train_loss: 2.4064 train_time: 9.7m tok/s: 6572626 +4889/20000 train_loss: 2.3502 train_time: 9.8m tok/s: 6572402 +4890/20000 train_loss: 2.4054 train_time: 9.8m tok/s: 6572170 +4891/20000 train_loss: 2.1725 train_time: 9.8m tok/s: 6571937 +4892/20000 train_loss: 1.9582 train_time: 9.8m tok/s: 6571638 +4893/20000 train_loss: 2.4374 train_time: 9.8m tok/s: 6571414 +4894/20000 train_loss: 2.3811 train_time: 9.8m tok/s: 6571200 +4895/20000 train_loss: 2.3791 train_time: 9.8m tok/s: 6571002 +4895/20000 val_loss: 2.3588 val_bpb: 1.0778 +stopping_early: wallclock_cap train_time: 585887ms step: 4895/20000 +peak memory allocated: 41707 MiB reserved: 47048 MiB +ema:applying EMA weights +diagnostic pre-quantization post-ema val_loss:2.33484962 val_bpb:1.06686712 eval_time:7856ms +Serialized model: 135418111 bytes +Code size (uncompressed): 182796 bytes +Code size (compressed): 45910 bytes +GPTQ:collecting Hessians from calibration data... +GPTQ:collected 67 Hessians in 4.1s +Quantized weights: + gate_int8_row: blocks.attn.attn_gate_w + gptq (int6): blocks.attn.c_k.weight, blocks.attn.c_q.weight, blocks.attn.c_v.weight, blocks.attn.proj.weight, blocks.mlp.fc.weight, blocks.mlp.proj.weight + gptq (int6)+lqer_asym: blocks.mlp.fc.weight + gptq (int7)+awqgrpint8+lqer_asym: tok_emb.weight + passthrough (float16): blocks.attn.q_gain, blocks.attn_scale, blocks.mlp_scale, blocks.resid_mix, parallel_post_lambdas, parallel_resid_lambdas, skip_gates, skip_weights, smear_gate.weight, smear_lambda, softcap_neg, softcap_pos +Serialize: per-group lrzip compression... 
+Serialize: per-group compression done in 122.7s +Serialized model quantized+pergroup: 15945116 bytes +Total submission size quantized+pergroup: 15991026 bytes +Deserialize: per-group lrzip decompression... +Deserialize: decompression done in 21.0s +diagnostic quantized val_loss:2.35268328 val_bpb:1.07501589 eval_time:10604ms +Deserialize: per-group lrzip decompression... +Deserialize: decompression done in 20.9s +ttt_lora:warming up compile (random tokens, no val data) +ttt_lora:compile warmup done (105.6s) +v5:precomputing ngram hints OUTSIDE eval timer +ngram_tilt:hints total=47851520 gated=13023303 token_gate=628130 within_gate=9866847 word_gate=2891588 agree2plus=303177 +ngram_tilt:precompute_outside_timer_done elapsed=164.70s total_targets=47851520 + +beginning TTT eval timer +ngram_tilt:using_precomputed_hints total_targets=47851520 (precompute time excluded from eval) +ttt_phased: total_docs:50000 prefix_docs:2500 suffix_docs:47500 num_phases:3 boundaries:[833, 1666, 2500] +ttp: b781/782 bl:2.1447 bb:1.0494 rl:2.1447 rb:1.0494 dl:17258-30330 gd:0 +ttpp: phase:1/3 pd:1296 gd:833 t:224.2s +tttg: c1/131 lr:0.001000 t:0.3s +tttg: c2/131 lr:0.001000 t:0.4s +tttg: c3/131 lr:0.000999 t:0.5s +tttg: c4/131 lr:0.000999 t:0.6s +tttg: c5/131 lr:0.000998 t:0.6s +tttg: c6/131 lr:0.000996 t:0.7s +tttg: c7/131 lr:0.000995 t:0.8s +tttg: c8/131 lr:0.000993 t:0.9s +tttg: c9/131 lr:0.000991 t:1.0s +tttg: c10/131 lr:0.000988 t:1.0s +tttg: c11/131 lr:0.000985 t:1.1s +tttg: c12/131 lr:0.000982 t:1.2s +tttg: c13/131 lr:0.000979 t:1.3s +tttg: c14/131 lr:0.000976 t:1.3s +tttg: c15/131 lr:0.000972 t:1.4s +tttg: c16/131 lr:0.000968 t:1.5s +tttg: c17/131 lr:0.000963 t:1.6s +tttg: c18/131 lr:0.000958 t:1.6s +tttg: c19/131 lr:0.000953 t:1.7s +tttg: c20/131 lr:0.000948 t:1.8s +tttg: c21/131 lr:0.000943 t:1.9s +tttg: c22/131 lr:0.000937 t:1.9s +tttg: c23/131 lr:0.000931 t:2.0s +tttg: c24/131 lr:0.000925 t:2.1s +tttg: c25/131 lr:0.000918 t:2.2s +tttg: c26/131 lr:0.000911 t:2.2s +tttg: 
c27/131 lr:0.000905 t:2.3s +tttg: c28/131 lr:0.000897 t:2.4s +tttg: c29/131 lr:0.000890 t:2.5s +tttg: c30/131 lr:0.000882 t:2.5s +tttg: c31/131 lr:0.000874 t:2.6s +tttg: c32/131 lr:0.000866 t:2.7s +tttg: c33/131 lr:0.000858 t:2.8s +tttg: c34/131 lr:0.000849 t:2.8s +tttg: c35/131 lr:0.000841 t:2.9s +tttg: c36/131 lr:0.000832 t:3.0s +tttg: c37/131 lr:0.000822 t:3.1s +tttg: c38/131 lr:0.000813 t:3.2s +tttg: c39/131 lr:0.000804 t:3.2s +tttg: c40/131 lr:0.000794 t:3.3s +tttg: c41/131 lr:0.000784 t:3.4s +tttg: c42/131 lr:0.000774 t:3.4s +tttg: c43/131 lr:0.000764 t:3.5s +tttg: c44/131 lr:0.000753 t:3.6s +tttg: c45/131 lr:0.000743 t:3.7s +tttg: c46/131 lr:0.000732 t:3.8s +tttg: c47/131 lr:0.000722 t:3.8s +tttg: c48/131 lr:0.000711 t:3.9s +tttg: c49/131 lr:0.000700 t:4.0s +tttg: c50/131 lr:0.000689 t:4.1s +tttg: c51/131 lr:0.000677 t:4.1s +tttg: c52/131 lr:0.000666 t:4.2s +tttg: c53/131 lr:0.000655 t:4.3s +tttg: c54/131 lr:0.000643 t:4.4s +tttg: c55/131 lr:0.000631 t:4.4s +tttg: c56/131 lr:0.000620 t:4.5s +tttg: c57/131 lr:0.000608 t:4.6s +tttg: c58/131 lr:0.000596 t:4.7s +tttg: c59/131 lr:0.000584 t:4.7s +tttg: c60/131 lr:0.000572 t:4.8s +tttg: c61/131 lr:0.000560 t:4.9s +tttg: c62/131 lr:0.000548 t:5.0s +tttg: c63/131 lr:0.000536 t:5.1s +tttg: c64/131 lr:0.000524 t:5.1s +tttg: c65/131 lr:0.000512 t:5.2s +tttg: c66/131 lr:0.000500 t:5.3s +tttg: c67/131 lr:0.000488 t:5.4s +tttg: c68/131 lr:0.000476 t:5.4s +tttg: c69/131 lr:0.000464 t:5.5s +tttg: c70/131 lr:0.000452 t:5.6s +tttg: c71/131 lr:0.000440 t:5.7s +tttg: c72/131 lr:0.000428 t:5.7s +tttg: c73/131 lr:0.000416 t:5.8s +tttg: c74/131 lr:0.000404 t:5.9s +tttg: c75/131 lr:0.000392 t:6.0s +tttg: c76/131 lr:0.000380 t:6.1s +tttg: c77/131 lr:0.000369 t:6.1s +tttg: c78/131 lr:0.000357 t:6.2s +tttg: c79/131 lr:0.000345 t:6.3s +tttg: c80/131 lr:0.000334 t:6.3s +tttg: c81/131 lr:0.000323 t:6.4s +tttg: c82/131 lr:0.000311 t:6.5s +tttg: c83/131 lr:0.000300 t:6.6s +tttg: c84/131 lr:0.000289 t:6.6s +tttg: c85/131 lr:0.000278 t:6.7s 
+tttg: c86/131 lr:0.000268 t:6.8s +tttg: c87/131 lr:0.000257 t:6.9s +tttg: c88/131 lr:0.000247 t:6.9s +tttg: c89/131 lr:0.000236 t:7.0s +tttg: c90/131 lr:0.000226 t:7.1s +tttg: c91/131 lr:0.000216 t:7.2s +tttg: c92/131 lr:0.000206 t:7.3s +tttg: c93/131 lr:0.000196 t:7.3s +tttg: c94/131 lr:0.000187 t:7.4s +tttg: c95/131 lr:0.000178 t:7.5s +tttg: c96/131 lr:0.000168 t:7.6s +tttg: c97/131 lr:0.000159 t:7.6s +tttg: c98/131 lr:0.000151 t:7.7s +tttg: c99/131 lr:0.000142 t:7.8s +tttg: c100/131 lr:0.000134 t:7.9s +tttg: c101/131 lr:0.000126 t:7.9s +tttg: c102/131 lr:0.000118 t:8.0s +tttg: c103/131 lr:0.000110 t:8.1s +tttg: c104/131 lr:0.000103 t:8.2s +tttg: c105/131 lr:0.000095 t:8.2s +tttg: c106/131 lr:0.000089 t:8.3s +tttg: c107/131 lr:0.000082 t:8.4s +tttg: c108/131 lr:0.000075 t:8.5s +tttg: c109/131 lr:0.000069 t:8.5s +tttg: c110/131 lr:0.000063 t:8.6s +tttg: c111/131 lr:0.000057 t:8.7s +tttg: c112/131 lr:0.000052 t:8.8s +tttg: c113/131 lr:0.000047 t:8.8s +tttg: c114/131 lr:0.000042 t:8.9s +tttg: c115/131 lr:0.000037 t:9.0s +tttg: c116/131 lr:0.000032 t:9.1s +tttg: c117/131 lr:0.000028 t:9.1s +tttg: c118/131 lr:0.000024 t:9.2s +tttg: c119/131 lr:0.000021 t:9.3s +tttg: c120/131 lr:0.000018 t:9.4s +tttg: c121/131 lr:0.000015 t:9.5s +tttg: c122/131 lr:0.000012 t:9.5s +tttg: c123/131 lr:0.000009 t:9.6s +tttg: c124/131 lr:0.000007 t:9.7s +tttg: c125/131 lr:0.000005 t:9.8s +tttg: c126/131 lr:0.000004 t:9.8s +tttg: c127/131 lr:0.000002 t:9.9s +tttg: c128/131 lr:0.000001 t:10.0s +tttg: c129/131 lr:0.000001 t:10.1s +tttg: c130/131 lr:0.000000 t:10.1s +ttpr: phase:1/3 t:236.1s +ttp: b755/782 bl:2.3830 bb:1.0764 rl:2.1768 rb:1.0533 dl:3397-3466 gd:0 +ttp: b749/782 bl:2.3934 bb:1.0860 rl:2.2001 rb:1.0570 dl:3039-3089 gd:0 +ttpp: phase:2/3 pd:2128 gd:1666 t:393.1s +tttg: c1/219 lr:0.001000 t:0.1s +tttg: c2/219 lr:0.001000 t:0.2s +tttg: c3/219 lr:0.001000 t:0.3s +tttg: c4/219 lr:0.001000 t:0.3s +tttg: c5/219 lr:0.000999 t:0.4s +tttg: c6/219 lr:0.000999 t:0.5s +tttg: c7/219 
lr:0.000998 t:0.6s +tttg: c8/219 lr:0.000997 t:0.6s +tttg: c9/219 lr:0.000997 t:0.7s +tttg: c10/219 lr:0.000996 t:0.8s +tttg: c11/219 lr:0.000995 t:0.9s +tttg: c12/219 lr:0.000994 t:1.0s +tttg: c13/219 lr:0.000993 t:1.0s +tttg: c14/219 lr:0.000991 t:1.1s +tttg: c15/219 lr:0.000990 t:1.2s +tttg: c16/219 lr:0.000988 t:1.3s +tttg: c17/219 lr:0.000987 t:1.4s +tttg: c18/219 lr:0.000985 t:1.4s +tttg: c19/219 lr:0.000983 t:1.5s +tttg: c20/219 lr:0.000981 t:1.6s +tttg: c21/219 lr:0.000979 t:1.7s +tttg: c22/219 lr:0.000977 t:1.7s +tttg: c23/219 lr:0.000975 t:1.8s +tttg: c24/219 lr:0.000973 t:1.9s +tttg: c25/219 lr:0.000970 t:2.0s +tttg: c26/219 lr:0.000968 t:2.0s +tttg: c27/219 lr:0.000965 t:2.1s +tttg: c28/219 lr:0.000963 t:2.2s +tttg: c29/219 lr:0.000960 t:2.3s +tttg: c30/219 lr:0.000957 t:2.4s +tttg: c31/219 lr:0.000954 t:2.4s +tttg: c32/219 lr:0.000951 t:2.5s +tttg: c33/219 lr:0.000948 t:2.6s +tttg: c34/219 lr:0.000945 t:2.7s +tttg: c35/219 lr:0.000941 t:2.7s +tttg: c36/219 lr:0.000938 t:2.8s +tttg: c37/219 lr:0.000934 t:2.9s +tttg: c38/219 lr:0.000931 t:3.0s +tttg: c39/219 lr:0.000927 t:3.0s +tttg: c40/219 lr:0.000923 t:3.1s +tttg: c41/219 lr:0.000919 t:3.2s +tttg: c42/219 lr:0.000915 t:3.3s +tttg: c43/219 lr:0.000911 t:3.3s +tttg: c44/219 lr:0.000907 t:3.4s +tttg: c45/219 lr:0.000903 t:3.5s +tttg: c46/219 lr:0.000898 t:3.6s +tttg: c47/219 lr:0.000894 t:3.7s +tttg: c48/219 lr:0.000890 t:3.7s +tttg: c49/219 lr:0.000885 t:3.8s +tttg: c50/219 lr:0.000880 t:3.9s +tttg: c51/219 lr:0.000876 t:4.0s +tttg: c52/219 lr:0.000871 t:4.1s +tttg: c53/219 lr:0.000866 t:4.1s +tttg: c54/219 lr:0.000861 t:4.2s +tttg: c55/219 lr:0.000856 t:4.3s +tttg: c56/219 lr:0.000851 t:4.4s +tttg: c57/219 lr:0.000846 t:4.4s +tttg: c58/219 lr:0.000841 t:4.5s +tttg: c59/219 lr:0.000835 t:4.6s +tttg: c60/219 lr:0.000830 t:4.7s +tttg: c61/219 lr:0.000824 t:4.8s +tttg: c62/219 lr:0.000819 t:4.8s +tttg: c63/219 lr:0.000813 t:4.9s +tttg: c64/219 lr:0.000808 t:5.0s +tttg: c65/219 lr:0.000802 t:5.1s +tttg: 
c66/219 lr:0.000796 t:5.1s +tttg: c67/219 lr:0.000790 t:5.2s +tttg: c68/219 lr:0.000784 t:5.3s +tttg: c69/219 lr:0.000779 t:5.4s +tttg: c70/219 lr:0.000773 t:5.4s +tttg: c71/219 lr:0.000766 t:5.5s +tttg: c72/219 lr:0.000760 t:5.6s +tttg: c73/219 lr:0.000754 t:5.7s +tttg: c74/219 lr:0.000748 t:5.8s +tttg: c75/219 lr:0.000742 t:5.8s +tttg: c76/219 lr:0.000735 t:5.9s +tttg: c77/219 lr:0.000729 t:6.0s +tttg: c78/219 lr:0.000722 t:6.1s +tttg: c79/219 lr:0.000716 t:6.1s +tttg: c80/219 lr:0.000709 t:6.2s +tttg: c81/219 lr:0.000703 t:6.3s +tttg: c82/219 lr:0.000696 t:6.4s +tttg: c83/219 lr:0.000690 t:6.4s +tttg: c84/219 lr:0.000683 t:6.5s +tttg: c85/219 lr:0.000676 t:6.6s +tttg: c86/219 lr:0.000670 t:6.7s +tttg: c87/219 lr:0.000663 t:6.8s +tttg: c88/219 lr:0.000656 t:6.8s +tttg: c89/219 lr:0.000649 t:6.9s +tttg: c90/219 lr:0.000642 t:7.0s +tttg: c91/219 lr:0.000635 t:7.1s +tttg: c92/219 lr:0.000628 t:7.1s +tttg: c93/219 lr:0.000621 t:7.2s +tttg: c94/219 lr:0.000614 t:7.3s +tttg: c95/219 lr:0.000607 t:7.4s +tttg: c96/219 lr:0.000600 t:7.4s +tttg: c97/219 lr:0.000593 t:7.5s +tttg: c98/219 lr:0.000586 t:7.6s +tttg: c99/219 lr:0.000579 t:7.7s +tttg: c100/219 lr:0.000572 t:7.7s +tttg: c101/219 lr:0.000565 t:7.8s +tttg: c102/219 lr:0.000558 t:7.9s +tttg: c103/219 lr:0.000550 t:8.0s +tttg: c104/219 lr:0.000543 t:8.1s +tttg: c105/219 lr:0.000536 t:8.1s +tttg: c106/219 lr:0.000529 t:8.2s +tttg: c107/219 lr:0.000522 t:8.3s +tttg: c108/219 lr:0.000514 t:8.4s +tttg: c109/219 lr:0.000507 t:8.4s +tttg: c110/219 lr:0.000500 t:8.5s +tttg: c111/219 lr:0.000493 t:8.6s +tttg: c112/219 lr:0.000486 t:8.7s +tttg: c113/219 lr:0.000478 t:8.8s +tttg: c114/219 lr:0.000471 t:8.8s +tttg: c115/219 lr:0.000464 t:8.9s +tttg: c116/219 lr:0.000457 t:9.0s +tttg: c117/219 lr:0.000450 t:9.1s +tttg: c118/219 lr:0.000442 t:9.1s +tttg: c119/219 lr:0.000435 t:9.2s +tttg: c120/219 lr:0.000428 t:9.3s +tttg: c121/219 lr:0.000421 t:9.4s +tttg: c122/219 lr:0.000414 t:9.5s +tttg: c123/219 lr:0.000407 t:9.5s +tttg: 
c124/219 lr:0.000400 t:9.6s +tttg: c125/219 lr:0.000393 t:9.7s +tttg: c126/219 lr:0.000386 t:9.8s +tttg: c127/219 lr:0.000379 t:9.8s +tttg: c128/219 lr:0.000372 t:9.9s +tttg: c129/219 lr:0.000365 t:10.0s +tttg: c130/219 lr:0.000358 t:10.1s +tttg: c131/219 lr:0.000351 t:10.2s +tttg: c132/219 lr:0.000344 t:10.2s +tttg: c133/219 lr:0.000337 t:10.3s +tttg: c134/219 lr:0.000330 t:10.4s +tttg: c135/219 lr:0.000324 t:10.5s +tttg: c136/219 lr:0.000317 t:10.5s +tttg: c137/219 lr:0.000310 t:10.6s +tttg: c138/219 lr:0.000304 t:10.7s +tttg: c139/219 lr:0.000297 t:10.8s +tttg: c140/219 lr:0.000291 t:10.9s +tttg: c141/219 lr:0.000284 t:10.9s +tttg: c142/219 lr:0.000278 t:11.0s +tttg: c143/219 lr:0.000271 t:11.1s +tttg: c144/219 lr:0.000265 t:11.2s +tttg: c145/219 lr:0.000258 t:11.2s +tttg: c146/219 lr:0.000252 t:11.3s +tttg: c147/219 lr:0.000246 t:11.4s +tttg: c148/219 lr:0.000240 t:11.5s +tttg: c149/219 lr:0.000234 t:11.6s +tttg: c150/219 lr:0.000227 t:11.6s +tttg: c151/219 lr:0.000221 t:11.7s +tttg: c152/219 lr:0.000216 t:11.8s +tttg: c153/219 lr:0.000210 t:11.9s +tttg: c154/219 lr:0.000204 t:11.9s +tttg: c155/219 lr:0.000198 t:12.0s +tttg: c156/219 lr:0.000192 t:12.1s +tttg: c157/219 lr:0.000187 t:12.2s +tttg: c158/219 lr:0.000181 t:12.3s +tttg: c159/219 lr:0.000176 t:12.3s +tttg: c160/219 lr:0.000170 t:12.4s +tttg: c161/219 lr:0.000165 t:12.5s +tttg: c162/219 lr:0.000159 t:12.6s +tttg: c163/219 lr:0.000154 t:12.6s +tttg: c164/219 lr:0.000149 t:12.7s +tttg: c165/219 lr:0.000144 t:12.8s +tttg: c166/219 lr:0.000139 t:12.9s +tttg: c167/219 lr:0.000134 t:12.9s +tttg: c168/219 lr:0.000129 t:13.0s +tttg: c169/219 lr:0.000124 t:13.1s +tttg: c170/219 lr:0.000120 t:13.2s +tttg: c171/219 lr:0.000115 t:13.3s +tttg: c172/219 lr:0.000110 t:13.3s +tttg: c173/219 lr:0.000106 t:13.4s +tttg: c174/219 lr:0.000102 t:13.5s +tttg: c175/219 lr:0.000097 t:13.6s +tttg: c176/219 lr:0.000093 t:13.6s +tttg: c177/219 lr:0.000089 t:13.7s +tttg: c178/219 lr:0.000085 t:13.8s +tttg: c179/219 lr:0.000081 
t:13.9s +tttg: c180/219 lr:0.000077 t:13.9s +tttg: c181/219 lr:0.000073 t:14.0s +tttg: c182/219 lr:0.000069 t:14.1s +tttg: c183/219 lr:0.000066 t:14.2s +tttg: c184/219 lr:0.000062 t:14.3s +tttg: c185/219 lr:0.000059 t:14.3s +tttg: c186/219 lr:0.000055 t:14.4s +tttg: c187/219 lr:0.000052 t:14.5s +tttg: c188/219 lr:0.000049 t:14.6s +tttg: c189/219 lr:0.000046 t:14.6s +tttg: c190/219 lr:0.000043 t:14.7s +tttg: c191/219 lr:0.000040 t:14.8s +tttg: c192/219 lr:0.000037 t:14.9s +tttg: c193/219 lr:0.000035 t:15.0s +tttg: c194/219 lr:0.000032 t:15.0s +tttg: c195/219 lr:0.000030 t:15.1s +tttg: c196/219 lr:0.000027 t:15.2s +tttg: c197/219 lr:0.000025 t:15.3s +tttg: c198/219 lr:0.000023 t:15.3s +tttg: c199/219 lr:0.000021 t:15.4s +tttg: c200/219 lr:0.000019 t:15.5s +tttg: c201/219 lr:0.000017 t:15.6s +tttg: c202/219 lr:0.000015 t:15.7s +tttg: c203/219 lr:0.000013 t:15.7s +tttg: c204/219 lr:0.000012 t:15.8s +tttg: c205/219 lr:0.000010 t:15.9s +tttg: c206/219 lr:0.000009 t:16.0s +tttg: c207/219 lr:0.000007 t:16.0s +tttg: c208/219 lr:0.000006 t:16.1s +tttg: c209/219 lr:0.000005 t:16.2s +tttg: c210/219 lr:0.000004 t:16.3s +tttg: c211/219 lr:0.000003 t:16.3s +tttg: c212/219 lr:0.000003 t:16.4s +tttg: c213/219 lr:0.000002 t:16.5s +tttg: c214/219 lr:0.000001 t:16.6s +tttg: c215/219 lr:0.000001 t:16.7s +tttg: c216/219 lr:0.000000 t:16.7s +tttg: c217/219 lr:0.000000 t:16.8s +tttg: c218/219 lr:0.000000 t:16.9s +ttpr: phase:2/3 t:411.7s +ttp: b748/782 bl:2.3181 bb:1.0818 rl:2.2114 rb:1.0594 dl:2992-3039 gd:0 +ttpp: phase:3/3 pd:2960 gd:2500 t:427.8s +tttg: c1/289 lr:0.001000 t:0.1s +tttg: c2/289 lr:0.001000 t:0.2s +tttg: c3/289 lr:0.001000 t:0.2s +tttg: c4/289 lr:0.001000 t:0.3s +tttg: c5/289 lr:0.001000 t:0.4s +tttg: c6/289 lr:0.000999 t:0.5s +tttg: c7/289 lr:0.000999 t:0.5s +tttg: c8/289 lr:0.000999 t:0.6s +tttg: c9/289 lr:0.000998 t:0.7s +tttg: c10/289 lr:0.000998 t:0.8s +tttg: c11/289 lr:0.000997 t:0.8s +tttg: c12/289 lr:0.000996 t:0.9s +tttg: c13/289 lr:0.000996 t:1.0s +tttg: 
c14/289 lr:0.000995 t:1.1s +tttg: c15/289 lr:0.000994 t:1.1s +tttg: c16/289 lr:0.000993 t:1.2s +tttg: c17/289 lr:0.000992 t:1.3s +tttg: c18/289 lr:0.000991 t:1.4s +tttg: c19/289 lr:0.000990 t:1.5s +tttg: c20/289 lr:0.000989 t:1.6s +tttg: c21/289 lr:0.000988 t:1.6s +tttg: c22/289 lr:0.000987 t:1.7s +tttg: c23/289 lr:0.000986 t:1.8s +tttg: c24/289 lr:0.000984 t:1.9s +tttg: c25/289 lr:0.000983 t:1.9s +tttg: c26/289 lr:0.000982 t:2.0s +tttg: c27/289 lr:0.000980 t:2.1s +tttg: c28/289 lr:0.000978 t:2.2s +tttg: c29/289 lr:0.000977 t:2.2s +tttg: c30/289 lr:0.000975 t:2.3s +tttg: c31/289 lr:0.000973 t:2.4s +tttg: c32/289 lr:0.000972 t:2.5s +tttg: c33/289 lr:0.000970 t:2.5s +tttg: c34/289 lr:0.000968 t:2.6s +tttg: c35/289 lr:0.000966 t:2.7s +tttg: c36/289 lr:0.000964 t:2.8s +tttg: c37/289 lr:0.000962 t:2.9s +tttg: c38/289 lr:0.000960 t:2.9s +tttg: c39/289 lr:0.000958 t:3.0s +tttg: c40/289 lr:0.000955 t:3.1s +tttg: c41/289 lr:0.000953 t:3.2s +tttg: c42/289 lr:0.000951 t:3.3s +tttg: c43/289 lr:0.000948 t:3.3s +tttg: c44/289 lr:0.000946 t:3.4s +tttg: c45/289 lr:0.000944 t:3.5s +tttg: c46/289 lr:0.000941 t:3.6s +tttg: c47/289 lr:0.000938 t:3.6s +tttg: c48/289 lr:0.000936 t:3.7s +tttg: c49/289 lr:0.000933 t:3.8s +tttg: c50/289 lr:0.000930 t:3.9s +tttg: c51/289 lr:0.000927 t:4.0s +tttg: c52/289 lr:0.000925 t:4.0s +tttg: c53/289 lr:0.000922 t:4.1s +tttg: c54/289 lr:0.000919 t:4.2s +tttg: c55/289 lr:0.000916 t:4.3s +tttg: c56/289 lr:0.000913 t:4.4s +tttg: c57/289 lr:0.000910 t:4.4s +tttg: c58/289 lr:0.000906 t:4.5s +tttg: c59/289 lr:0.000903 t:4.6s +tttg: c60/289 lr:0.000900 t:4.7s +tttg: c61/289 lr:0.000897 t:4.7s +tttg: c62/289 lr:0.000893 t:4.8s +tttg: c63/289 lr:0.000890 t:4.9s +tttg: c64/289 lr:0.000887 t:5.0s +tttg: c65/289 lr:0.000883 t:5.1s +tttg: c66/289 lr:0.000879 t:5.1s +tttg: c67/289 lr:0.000876 t:5.2s +tttg: c68/289 lr:0.000872 t:5.3s +tttg: c69/289 lr:0.000869 t:5.4s +tttg: c70/289 lr:0.000865 t:5.4s +tttg: c71/289 lr:0.000861 t:5.5s +tttg: c72/289 lr:0.000857 t:5.6s 
+tttg: c73/289 lr:0.000854 t:5.7s +tttg: c74/289 lr:0.000850 t:5.7s +tttg: c75/289 lr:0.000846 t:5.8s +tttg: c76/289 lr:0.000842 t:5.9s +tttg: c77/289 lr:0.000838 t:6.0s +tttg: c78/289 lr:0.000834 t:6.0s +tttg: c79/289 lr:0.000830 t:6.1s +tttg: c80/289 lr:0.000826 t:6.2s +tttg: c81/289 lr:0.000821 t:6.3s +tttg: c82/289 lr:0.000817 t:6.3s +tttg: c83/289 lr:0.000813 t:6.4s +tttg: c84/289 lr:0.000809 t:6.5s +tttg: c85/289 lr:0.000804 t:6.6s +tttg: c86/289 lr:0.000800 t:6.6s +tttg: c87/289 lr:0.000796 t:6.7s +tttg: c88/289 lr:0.000791 t:6.8s +tttg: c89/289 lr:0.000787 t:6.9s +tttg: c90/289 lr:0.000782 t:7.0s +tttg: c91/289 lr:0.000778 t:7.0s +tttg: c92/289 lr:0.000773 t:7.1s +tttg: c93/289 lr:0.000769 t:7.2s +tttg: c94/289 lr:0.000764 t:7.3s +tttg: c95/289 lr:0.000759 t:7.4s +tttg: c96/289 lr:0.000755 t:7.5s +tttg: c97/289 lr:0.000750 t:7.5s +tttg: c98/289 lr:0.000745 t:7.6s +tttg: c99/289 lr:0.000740 t:7.7s +tttg: c100/289 lr:0.000736 t:7.8s +tttg: c101/289 lr:0.000731 t:7.9s +tttg: c102/289 lr:0.000726 t:8.0s +tttg: c103/289 lr:0.000721 t:8.0s +tttg: c104/289 lr:0.000716 t:8.1s +tttg: c105/289 lr:0.000711 t:8.2s +tttg: c106/289 lr:0.000706 t:8.3s +tttg: c107/289 lr:0.000701 t:8.3s +tttg: c108/289 lr:0.000696 t:8.4s +tttg: c109/289 lr:0.000691 t:8.5s +tttg: c110/289 lr:0.000686 t:8.6s +tttg: c111/289 lr:0.000681 t:8.6s +tttg: c112/289 lr:0.000676 t:8.7s +tttg: c113/289 lr:0.000671 t:8.8s +tttg: c114/289 lr:0.000666 t:8.9s +tttg: c115/289 lr:0.000661 t:9.0s +tttg: c116/289 lr:0.000656 t:9.0s +tttg: c117/289 lr:0.000650 t:9.1s +tttg: c118/289 lr:0.000645 t:9.2s +tttg: c119/289 lr:0.000640 t:9.3s +tttg: c120/289 lr:0.000635 t:9.3s +tttg: c121/289 lr:0.000629 t:9.4s +tttg: c122/289 lr:0.000624 t:9.5s +tttg: c123/289 lr:0.000619 t:9.6s +tttg: c124/289 lr:0.000614 t:9.6s +tttg: c125/289 lr:0.000608 t:9.7s +tttg: c126/289 lr:0.000603 t:9.8s +tttg: c127/289 lr:0.000598 t:9.9s +tttg: c128/289 lr:0.000592 t:10.0s +tttg: c129/289 lr:0.000587 t:10.0s +tttg: c130/289 lr:0.000581 
t:10.1s +tttg: c131/289 lr:0.000576 t:10.2s +tttg: c132/289 lr:0.000571 t:10.3s +tttg: c133/289 lr:0.000565 t:10.3s +tttg: c134/289 lr:0.000560 t:10.4s +tttg: c135/289 lr:0.000554 t:10.5s +tttg: c136/289 lr:0.000549 t:10.6s +tttg: c137/289 lr:0.000544 t:10.6s +tttg: c138/289 lr:0.000538 t:10.7s +tttg: c139/289 lr:0.000533 t:10.8s +tttg: c140/289 lr:0.000527 t:10.9s +tttg: c141/289 lr:0.000522 t:11.0s +tttg: c142/289 lr:0.000516 t:11.0s +tttg: c143/289 lr:0.000511 t:11.1s +tttg: c144/289 lr:0.000505 t:11.2s +tttg: c145/289 lr:0.000500 t:11.3s +tttg: c146/289 lr:0.000495 t:11.3s +tttg: c147/289 lr:0.000489 t:11.4s +tttg: c148/289 lr:0.000484 t:11.5s +tttg: c149/289 lr:0.000478 t:11.6s +tttg: c150/289 lr:0.000473 t:11.6s +tttg: c151/289 lr:0.000467 t:11.7s +tttg: c152/289 lr:0.000462 t:11.8s +tttg: c153/289 lr:0.000456 t:11.9s +tttg: c154/289 lr:0.000451 t:12.0s +tttg: c155/289 lr:0.000446 t:12.0s +tttg: c156/289 lr:0.000440 t:12.1s +tttg: c157/289 lr:0.000435 t:12.2s +tttg: c158/289 lr:0.000429 t:12.3s +tttg: c159/289 lr:0.000424 t:12.3s +tttg: c160/289 lr:0.000419 t:12.4s +tttg: c161/289 lr:0.000413 t:12.5s +tttg: c162/289 lr:0.000408 t:12.6s +tttg: c163/289 lr:0.000402 t:12.7s +tttg: c164/289 lr:0.000397 t:12.8s +tttg: c165/289 lr:0.000392 t:12.8s +tttg: c166/289 lr:0.000386 t:12.9s +tttg: c167/289 lr:0.000381 t:13.0s +tttg: c168/289 lr:0.000376 t:13.1s +tttg: c169/289 lr:0.000371 t:13.1s +tttg: c170/289 lr:0.000365 t:13.2s +tttg: c171/289 lr:0.000360 t:13.3s +tttg: c172/289 lr:0.000355 t:13.4s +tttg: c173/289 lr:0.000350 t:13.5s +tttg: c174/289 lr:0.000344 t:13.5s +tttg: c175/289 lr:0.000339 t:13.6s +tttg: c176/289 lr:0.000334 t:13.7s +tttg: c177/289 lr:0.000329 t:13.8s +tttg: c178/289 lr:0.000324 t:13.9s +tttg: c179/289 lr:0.000319 t:14.0s +tttg: c180/289 lr:0.000314 t:14.0s +tttg: c181/289 lr:0.000309 t:14.1s +tttg: c182/289 lr:0.000304 t:14.2s +tttg: c183/289 lr:0.000299 t:14.3s +tttg: c184/289 lr:0.000294 t:14.4s +tttg: c185/289 lr:0.000289 t:14.4s +tttg: 
c186/289 lr:0.000284 t:14.5s +tttg: c187/289 lr:0.000279 t:14.6s +tttg: c188/289 lr:0.000274 t:14.7s +tttg: c189/289 lr:0.000269 t:14.7s +tttg: c190/289 lr:0.000264 t:14.8s +tttg: c191/289 lr:0.000260 t:14.9s +tttg: c192/289 lr:0.000255 t:15.0s +tttg: c193/289 lr:0.000250 t:15.1s +tttg: c194/289 lr:0.000245 t:15.1s +tttg: c195/289 lr:0.000241 t:15.2s +tttg: c196/289 lr:0.000236 t:15.3s +tttg: c197/289 lr:0.000231 t:15.4s +tttg: c198/289 lr:0.000227 t:15.4s +tttg: c199/289 lr:0.000222 t:15.5s +tttg: c200/289 lr:0.000218 t:15.6s +tttg: c201/289 lr:0.000213 t:15.7s +tttg: c202/289 lr:0.000209 t:15.8s +tttg: c203/289 lr:0.000204 t:15.8s +tttg: c204/289 lr:0.000200 t:15.9s +tttg: c205/289 lr:0.000196 t:16.0s +tttg: c206/289 lr:0.000191 t:16.1s +tttg: c207/289 lr:0.000187 t:16.1s +tttg: c208/289 lr:0.000183 t:16.2s +tttg: c209/289 lr:0.000179 t:16.3s +tttg: c210/289 lr:0.000174 t:16.4s +tttg: c211/289 lr:0.000170 t:16.5s +tttg: c212/289 lr:0.000166 t:16.5s +tttg: c213/289 lr:0.000162 t:16.6s +tttg: c214/289 lr:0.000158 t:16.7s +tttg: c215/289 lr:0.000154 t:16.8s +tttg: c216/289 lr:0.000150 t:16.9s +tttg: c217/289 lr:0.000146 t:16.9s +tttg: c218/289 lr:0.000143 t:17.0s +tttg: c219/289 lr:0.000139 t:17.1s +tttg: c220/289 lr:0.000135 t:17.2s +tttg: c221/289 lr:0.000131 t:17.3s +tttg: c222/289 lr:0.000128 t:17.3s +tttg: c223/289 lr:0.000124 t:17.4s +tttg: c224/289 lr:0.000121 t:17.5s +tttg: c225/289 lr:0.000117 t:17.6s +tttg: c226/289 lr:0.000113 t:17.7s +tttg: c227/289 lr:0.000110 t:17.7s +tttg: c228/289 lr:0.000107 t:17.8s +tttg: c229/289 lr:0.000103 t:17.9s +tttg: c230/289 lr:0.000100 t:18.0s +tttg: c231/289 lr:0.000097 t:18.1s +tttg: c232/289 lr:0.000094 t:18.1s +tttg: c233/289 lr:0.000090 t:18.2s +tttg: c234/289 lr:0.000087 t:18.3s +tttg: c235/289 lr:0.000084 t:18.4s +tttg: c236/289 lr:0.000081 t:18.4s +tttg: c237/289 lr:0.000078 t:18.5s +tttg: c238/289 lr:0.000075 t:18.6s +tttg: c239/289 lr:0.000073 t:18.7s +tttg: c240/289 lr:0.000070 t:18.8s +tttg: c241/289 
lr:0.000067 t:18.8s +tttg: c242/289 lr:0.000064 t:18.9s +tttg: c243/289 lr:0.000062 t:19.0s +tttg: c244/289 lr:0.000059 t:19.1s +tttg: c245/289 lr:0.000056 t:19.1s +tttg: c246/289 lr:0.000054 t:19.2s +tttg: c247/289 lr:0.000052 t:19.3s +tttg: c248/289 lr:0.000049 t:19.4s +tttg: c249/289 lr:0.000047 t:19.5s +tttg: c250/289 lr:0.000045 t:19.5s +tttg: c251/289 lr:0.000042 t:19.6s +tttg: c252/289 lr:0.000040 t:19.7s +tttg: c253/289 lr:0.000038 t:19.8s +tttg: c254/289 lr:0.000036 t:19.9s +tttg: c255/289 lr:0.000034 t:19.9s +tttg: c256/289 lr:0.000032 t:20.0s +tttg: c257/289 lr:0.000030 t:20.1s +tttg: c258/289 lr:0.000028 t:20.2s +tttg: c259/289 lr:0.000027 t:20.3s +tttg: c260/289 lr:0.000025 t:20.3s +tttg: c261/289 lr:0.000023 t:20.4s +tttg: c262/289 lr:0.000022 t:20.5s +tttg: c263/289 lr:0.000020 t:20.6s +tttg: c264/289 lr:0.000018 t:20.7s +tttg: c265/289 lr:0.000017 t:20.7s +tttg: c266/289 lr:0.000016 t:20.8s +tttg: c267/289 lr:0.000014 t:20.9s +tttg: c268/289 lr:0.000013 t:21.0s +tttg: c269/289 lr:0.000012 t:21.1s +tttg: c270/289 lr:0.000011 t:21.1s +tttg: c271/289 lr:0.000010 t:21.2s +tttg: c272/289 lr:0.000009 t:21.3s +tttg: c273/289 lr:0.000008 t:21.4s +tttg: c274/289 lr:0.000007 t:21.4s +tttg: c275/289 lr:0.000006 t:21.5s +tttg: c276/289 lr:0.000005 t:21.6s +tttg: c277/289 lr:0.000004 t:21.7s +tttg: c278/289 lr:0.000004 t:21.8s +tttg: c279/289 lr:0.000003 t:21.8s +tttg: c280/289 lr:0.000002 t:21.9s +tttg: c281/289 lr:0.000002 t:22.0s +tttg: c282/289 lr:0.000001 t:22.1s +tttg: c283/289 lr:0.000001 t:22.1s +tttg: c284/289 lr:0.000001 t:22.2s +tttg: c285/289 lr:0.000000 t:22.3s +tttg: c286/289 lr:0.000000 t:22.4s +tttg: c287/289 lr:0.000000 t:22.5s +tttg: c288/289 lr:0.000000 t:22.5s +ttpr: phase:3/3 t:452.1s +ttp: b732/782 bl:2.3722 bb:1.0924 rl:2.2229 rb:1.0619 dl:2416-2441 gd:1 +ttp: b724/782 bl:2.3161 bb:1.0575 rl:2.2286 rb:1.0616 dl:2203-2231 gd:1 +ttp: b714/782 bl:2.3061 bb:1.0215 rl:2.2327 rb:1.0593 dl:2018-2035 gd:1 +ttp: b709/782 bl:2.4416 bb:1.0922 
rl:2.2428 rb:1.0610 dl:1937-1952 gd:1 +ttp: b700/782 bl:2.2712 bb:1.0142 rl:2.2440 rb:1.0588 dl:1824-1834 gd:1 +ttp: b689/782 bl:2.3865 bb:1.0745 rl:2.2496 rb:1.0595 dl:1706-1715 gd:1 +ttp: b686/782 bl:2.4400 bb:1.0742 rl:2.2566 rb:1.0601 dl:1675-1685 gd:1 +ttp: b673/782 bl:2.3581 bb:1.0585 rl:2.2600 rb:1.0600 dl:1562-1571 gd:1 +ttp: b666/782 bl:2.4083 bb:1.0631 rl:2.2646 rb:1.0601 dl:1507-1514 gd:1 +ttp: b660/782 bl:2.3686 bb:1.0471 rl:2.2677 rb:1.0597 dl:1466-1474 gd:1 +ttp: b653/782 bl:2.2868 bb:1.0368 rl:2.2682 rb:1.0591 dl:1419-1425 gd:1 +ttp: b645/782 bl:2.2988 bb:1.0285 rl:2.2690 rb:1.0582 dl:1367-1375 gd:1 +ttp: b637/782 bl:2.3641 bb:1.0781 rl:2.2713 rb:1.0587 dl:1320-1325 gd:1 +ttp: b629/782 bl:2.3467 bb:1.0099 rl:2.2731 rb:1.0575 dl:1276-1280 gd:1 +ttp: b621/782 bl:2.2885 bb:1.0451 rl:2.2734 rb:1.0572 dl:1231-1237 gd:1 +ttp: b613/782 bl:2.3329 bb:1.0387 rl:2.2746 rb:1.0568 dl:1190-1195 gd:1 +ttp: b606/782 bl:2.3549 bb:1.0641 rl:2.2762 rb:1.0570 dl:1159-1164 gd:1 +ttp: b596/782 bl:2.2875 bb:1.0458 rl:2.2764 rb:1.0568 dl:1115-1119 gd:1 +ttp: b589/782 bl:2.2749 bb:1.0103 rl:2.2764 rb:1.0559 dl:1086-1089 gd:1 +ttp: b582/782 bl:2.3464 bb:1.0306 rl:2.2776 rb:1.0554 dl:1056-1060 gd:1 +ttp: b573/782 bl:2.3663 bb:1.0667 rl:2.2790 rb:1.0556 dl:1021-1025 gd:1 +ttp: b565/782 bl:2.3769 bb:1.0297 rl:2.2805 rb:1.0552 dl:993-997 gd:1 +ttp: b558/782 bl:2.3730 bb:1.0613 rl:2.2819 rb:1.0553 dl:968-972 gd:1 +ttp: b551/782 bl:2.3327 bb:1.0543 rl:2.2826 rb:1.0553 dl:946-949 gd:1 +ttp: b544/782 bl:2.3448 bb:1.0685 rl:2.2835 rb:1.0555 dl:924-927 gd:1 +ttp: b536/782 bl:2.3126 bb:1.0414 rl:2.2839 rb:1.0553 dl:899-902 gd:1 +ttp: b528/782 bl:2.3310 bb:1.0419 rl:2.2845 rb:1.0551 dl:875-878 gd:1 +ttp: b520/782 bl:2.3217 bb:1.0011 rl:2.2849 rb:1.0544 dl:852-854 gd:1 +ttp: b512/782 bl:2.3012 bb:1.0627 rl:2.2851 rb:1.0545 dl:829-832 gd:1 +ttp: b504/782 bl:2.3134 bb:1.0321 rl:2.2855 rb:1.0542 dl:807-809 gd:1 +ttp: b496/782 bl:2.4151 bb:1.0455 rl:2.2869 rb:1.0541 dl:785-788 gd:1 +ttp: 
b488/782 bl:2.2926 bb:1.0088 rl:2.2869 rb:1.0536 dl:766-769 gd:1 +ttp: b483/782 bl:2.2531 bb:1.0279 rl:2.2866 rb:1.0534 dl:754-756 gd:1 +ttp: b462/782 bl:2.3334 bb:1.0357 rl:2.2870 rb:1.0532 dl:706-708 gd:1 +ttp: b454/782 bl:2.3806 bb:1.0812 rl:2.2879 rb:1.0535 dl:689-691 gd:1 +ttp: b447/782 bl:2.3221 bb:1.0667 rl:2.2882 rb:1.0536 dl:674-676 gd:1 +ttp: b439/782 bl:2.3233 bb:1.0367 rl:2.2885 rb:1.0534 dl:657-659 gd:1 +ttp: b431/782 bl:2.3682 bb:1.0506 rl:2.2892 rb:1.0534 dl:642-643 gd:1 +ttp: b423/782 bl:2.3096 bb:1.0538 rl:2.2893 rb:1.0534 dl:626-629 gd:1 +ttp: b415/782 bl:2.2813 bb:1.0567 rl:2.2893 rb:1.0534 dl:611-613 gd:1 +ttp: b408/782 bl:2.2936 bb:1.0665 rl:2.2893 rb:1.0535 dl:597-598 gd:1 +ttp: b401/782 bl:2.2418 bb:1.0300 rl:2.2889 rb:1.0533 dl:584-586 gd:1 +ttp: b393/782 bl:2.2960 bb:1.0545 rl:2.2890 rb:1.0534 dl:570-571 gd:1 +ttp: b385/782 bl:2.4035 bb:1.0718 rl:2.2898 rb:1.0535 dl:555-557 gd:1 +ttp: b377/782 bl:2.2230 bb:1.0184 rl:2.2893 rb:1.0533 dl:542-544 gd:1 +ttp: b369/782 bl:2.3443 bb:1.0591 rl:2.2897 rb:1.0533 dl:528-530 gd:1 +ttp: b361/782 bl:2.3467 bb:1.0956 rl:2.2901 rb:1.0536 dl:515-517 gd:1 +ttp: b353/782 bl:2.1936 bb:1.0031 rl:2.2895 rb:1.0532 dl:501-503 gd:1 +ttp: b345/782 bl:2.3542 bb:1.0717 rl:2.2898 rb:1.0534 dl:489-491 gd:1 +ttp: b337/782 bl:2.3085 bb:1.0505 rl:2.2900 rb:1.0533 dl:477-478 gd:1 +ttp: b329/782 bl:2.2847 bb:1.0826 rl:2.2899 rb:1.0535 dl:465-466 gd:1 +ttp: b321/782 bl:2.3379 bb:1.0673 rl:2.2902 rb:1.0536 dl:453-455 gd:1 +ttp: b313/782 bl:2.2830 bb:1.0757 rl:2.2901 rb:1.0537 dl:440-442 gd:1 +ttp: b305/782 bl:2.3327 bb:1.0843 rl:2.2904 rb:1.0538 dl:429-430 gd:1 +ttp: b297/782 bl:2.3983 bb:1.0836 rl:2.2909 rb:1.0540 dl:417-418 gd:1 +ttp: b289/782 bl:2.3228 bb:1.0803 rl:2.2910 rb:1.0541 dl:405-406 gd:1 +ttp: b281/782 bl:2.2856 bb:1.0835 rl:2.2910 rb:1.0542 dl:394-395 gd:1 +ttp: b273/782 bl:2.3298 bb:1.0739 rl:2.2912 rb:1.0543 dl:383-384 gd:1 +ttp: b265/782 bl:2.3695 bb:1.1025 rl:2.2915 rb:1.0545 dl:372-374 gd:1 +ttp: b258/782 
bl:2.4245 bb:1.0879 rl:2.2921 rb:1.0547 dl:364-365 gd:1 +ttp: b250/782 bl:2.3067 bb:1.0695 rl:2.2921 rb:1.0547 dl:354-355 gd:1 +ttp: b243/782 bl:2.3446 bb:1.0758 rl:2.2923 rb:1.0548 dl:345-346 gd:1 +ttp: b234/782 bl:2.4130 bb:1.1434 rl:2.2928 rb:1.0551 dl:334-335 gd:1 +ttp: b232/782 bl:2.2974 bb:1.0829 rl:2.2928 rb:1.0552 dl:331-333 gd:1 +ttp: b223/782 bl:2.3196 bb:1.1199 rl:2.2929 rb:1.0555 dl:321-322 gd:1 +ttp: b214/782 bl:2.3376 bb:1.1186 rl:2.2930 rb:1.0557 dl:310-312 gd:1 +ttp: b208/782 bl:2.3897 bb:1.1312 rl:2.2934 rb:1.0559 dl:304-305 gd:1 +ttp: b200/782 bl:2.3567 bb:1.0896 rl:2.2936 rb:1.0560 dl:296-297 gd:1 +ttp: b190/782 bl:2.3390 bb:1.0754 rl:2.2937 rb:1.0561 dl:284-285 gd:1 +ttp: b181/782 bl:2.3283 bb:1.1243 rl:2.2938 rb:1.0563 dl:275-276 gd:1 +ttp: b174/782 bl:2.4400 bb:1.1508 rl:2.2943 rb:1.0565 dl:268-269 gd:1 +ttp: b164/782 bl:2.4316 bb:1.1503 rl:2.2946 rb:1.0568 dl:259-260 gd:1 +ttp: b157/782 bl:2.3562 bb:1.1285 rl:2.2948 rb:1.0570 dl:252-253 gd:1 +ttp: b148/782 bl:2.3307 bb:1.1028 rl:2.2949 rb:1.0571 dl:243-244 gd:1 +ttp: b140/782 bl:2.4267 bb:1.1330 rl:2.2952 rb:1.0573 dl:235-236 gd:1 +ttp: b132/782 bl:2.4398 bb:1.1587 rl:2.2956 rb:1.0575 dl:228-229 gd:1 +ttp: b126/782 bl:2.3880 bb:1.1376 rl:2.2958 rb:1.0577 dl:222-223 gd:1 +ttp: b122/782 bl:2.4099 bb:1.1409 rl:2.2961 rb:1.0579 dl:219-219 gd:1 +ttp: b114/782 bl:2.4589 bb:1.1402 rl:2.2965 rb:1.0581 dl:211-212 gd:1 +ttp: b106/782 bl:2.4236 bb:1.1666 rl:2.2967 rb:1.0583 dl:204-205 gd:1 +ttp: b99/782 bl:2.4970 bb:1.1760 rl:2.2972 rb:1.0586 dl:198-199 gd:1 +ttp: b89/782 bl:2.4852 bb:1.1484 rl:2.2975 rb:1.0588 dl:189-190 gd:1 +ttp: b82/782 bl:2.4909 bb:1.1856 rl:2.2979 rb:1.0590 dl:183-183 gd:1 +ttp: b73/782 bl:2.5360 bb:1.2448 rl:2.2984 rb:1.0593 dl:174-175 gd:1 +ttp: b65/782 bl:2.4611 bb:1.1673 rl:2.2986 rb:1.0595 dl:167-169 gd:1 +ttp: b59/782 bl:2.4824 bb:1.1826 rl:2.2990 rb:1.0597 dl:162-163 gd:1 +ttp: b51/782 bl:2.4830 bb:1.1879 rl:2.2993 rb:1.0599 dl:154-155 gd:1 +ttp: b45/782 bl:2.4500 bb:1.1723 
rl:2.2995 rb:1.0601 dl:148-149 gd:1 +ttp: b38/782 bl:2.5963 bb:1.1908 rl:2.2999 rb:1.0603 dl:141-142 gd:1 +ttp: b31/782 bl:2.4286 bb:1.1521 rl:2.3001 rb:1.0604 dl:134-135 gd:1 +ttp: b24/782 bl:2.4516 bb:1.1562 rl:2.3003 rb:1.0605 dl:127-128 gd:1 +ttp: b15/782 bl:2.6514 bb:1.2313 rl:2.3007 rb:1.0607 dl:115-117 gd:1 +ttp: b6/782 bl:2.6955 bb:1.2018 rl:2.3012 rb:1.0609 dl:99-101 gd:1 +quantized_ttt_phased val_loss:2.31756847 val_bpb:1.05903968 eval_time:554723ms +total_eval_time:554.7s diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed42.log b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed42.log new file mode 100644 index 0000000000..5187ee3210 --- /dev/null +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/train_seed42.log @@ -0,0 +1,5847 @@ +nohup: ignoring input +==================================================== + v5 PRIMARY noLC fulltilt + precompute outside timer: V21 + #1953 + #1948 + fulltilt-tilt SEED=42 Thu Apr 30 05:59:22 UTC 2026 + LeakyReLU slope 0.3 (code patch + v5 hint-precompute-outside-timer), EVAL_SEQ_LEN 2048 (no long-ctx for cap), no_qv, fulltilt-tilt +==================================================== +W0430 05:59:24.045000 943476 torch/distributed/run.py:803] +W0430 05:59:24.045000 943476 torch/distributed/run.py:803] ***************************************** +W0430 05:59:24.045000 943476 torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
+W0430 05:59:24.045000 943476 torch/distributed/run.py:803] ***************************************** +Hyperparameters: + adam_eps: 1e-08 + adam_wd: 0.02 + agree_add_boost: 0.5 + artifact_dir: + attn_clip_sigmas: 13.0 + attn_out_gate_enabled: False + attn_out_gate_src: proj + awq_lite_bits: 8 + awq_lite_enabled: True + awq_lite_group_size: 64 + awq_lite_group_top_k: 1 + beta1: 0.9 + beta2: 0.99 + caseops_enabled: True + compressor: pergroup + data_dir: /runpod-volume/caseops_data/datasets + datasets_dir: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved + distributed: True + ema_decay: 0.9965 + embed_bits: 7 + embed_clip_sigmas: 14.0 + embed_lr: 0.6 + embed_wd: 0.085 + enable_looping_at: 0.35 + eval_seq_len: 2048 + eval_stride: 64 + fused_ce_enabled: True + gate_window: 12 + gated_attn_enabled: False + gated_attn_init_std: 0.01 + gated_attn_quant_gate: True + global_ttt_batch_seqs: 32 + global_ttt_chunk_tokens: 32768 + global_ttt_epochs: 1 + global_ttt_grad_clip: 1.0 + global_ttt_lr: 0.001 + global_ttt_momentum: 0.9 + global_ttt_respect_doc_boundaries: True + global_ttt_warmup_chunks: 0 + global_ttt_warmup_start_lr: 0.0 + gptq_calibration_batches: 16 + gptq_reserve_seconds: 0.5 + grad_accum_steps: 1 + grad_clip_norm: 0.3 + is_main_process: True + iterations: 20000 + ln_scale: True + local_rank: 0 + logfile: logs/eda0247f-74dc-42c3-bca1-899aa80e6c11.txt + logit_softcap: 30.0 + loop_end: 5 + loop_start: 3 + lqer_asym_enabled: True + lqer_asym_group: 64 + lqer_enabled: True + lqer_factor_bits: 4 + lqer_gain_select: False + lqer_rank: 4 + lqer_scope: all + lqer_top_k: 3 + matrix_bits: 6 + matrix_clip_sigmas: 12.85 + matrix_lr: 0.026 + max_wallclock_seconds: 600.0 + min_lr: 0.1 + mlp_clip_sigmas: 11.5 + mlp_mult: 4.0 + model_dim: 512 + model_path: final_model.pt + muon_backend_steps: 5 + muon_momentum: 0.97 + muon_momentum_warmup_start: 0.92 + muon_momentum_warmup_steps: 1500 + muon_row_normalize: True + muon_wd: 0.095 + 
ngram_hint_precompute_outside: True + ngram_tilt_enabled: True + num_heads: 8 + num_kv_heads: 4 + num_layers: 11 + num_loops: 2 + parallel_final_lane: mean + parallel_start_layer: 8 + phased_ttt_num_phases: 3 + phased_ttt_prefix_docs: 2500 + qk_gain_init: 5.25 + quantized_model_path: final_model.int6.ptz + rank: 0 + rope_base: 10000.0 + rope_dims: 16 + rope_train_seq_len: 2048 + rope_yarn: False + run_id: eda0247f-74dc-42c3-bca1-899aa80e6c11 + scalar_lr: 0.02 + seed: 42 + skip_gates_enabled: True + smear_gate_enabled: True + sparse_attn_gate_enabled: True + sparse_attn_gate_init_std: 0.0 + sparse_attn_gate_scale: 0.5 + temperature_scale: 1.0 + tie_embeddings: True + tied_embed_init_std: 0.005 + tied_embed_lr: 0.03 + token_boost: 2.625 + token_order: 16 + token_threshold: 0.8 + tokenizer_path: /runpod-volume/caseops_data/datasets/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model + train_batch_tokens: 786432 + train_files: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved/fineweb_train_*.bin + train_log_every: 500 + train_seq_len: 2048 + ttt_batch_size: 64 + ttt_beta1: 0.0 + ttt_beta2: 0.99 + ttt_chunk_size: 48 + ttt_enabled: True + ttt_eval_batches: + ttt_eval_seq_len: 2048 + ttt_grad_steps: 1 + ttt_k_lora: True + ttt_lora_lr: 0.0001 + ttt_lora_rank: 80 + ttt_mlp_lora: True + ttt_o_lora: True + ttt_optimizer: adam + ttt_weight_decay: 0.5 + val_batch_tokens: 524288 + val_bytes_files: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved/fineweb_val_bytes_*.bin + val_doc_fraction: 1.0 + val_files: /runpod-volume/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved/fineweb_val_*.bin + val_loss_every: 0 + vocab_size: 8192 + warmdown_frac: 0.85 + warmup_steps: 20 + within_boost: 0.75 + within_tau: 0.45 + word_boost: 0.75 + word_normalize: strip_punct_lower + word_order: 4 + word_tau: 0.65 + world_size: 8 + xsa_last_n: 11 
+train_shards: 1499 +val_tokens: 47851520 +model_params:35945673 +gptq:reserving 0s, effective=599500ms +warmup_cu_buckets:64,128,192,256 iters_each:3 +warmup_step: 1/20 +warmup_step: 2/20 +warmup_step: 3/20 +warmup_step: 4/20 +warmup_step: 5/20 +warmup_step: 6/20 +warmup_step: 10/20 +warmup_step: 20/20 +loop_warmup:enabled encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10] +loop_warmup_step: 1/20 +loop_warmup_step: 2/20 +loop_warmup_step: 3/20 +loop_warmup_step: 4/20 +loop_warmup_step: 5/20 +loop_warmup_step: 6/20 +loop_warmup_step: 10/20 +loop_warmup_step: 20/20 +1/20000 train_loss: 9.0087 train_time: 0.0m tok/s: 17813757 +2/20000 train_loss: 12.8396 train_time: 0.0m tok/s: 11353557 +3/20000 train_loss: 10.2198 train_time: 0.0m tok/s: 10164607 +4/20000 train_loss: 8.6819 train_time: 0.0m tok/s: 9670678 +5/20000 train_loss: 7.9295 train_time: 0.0m tok/s: 9401870 +6/20000 train_loss: 7.5654 train_time: 0.0m tok/s: 9223922 +7/20000 train_loss: 7.2991 train_time: 0.0m tok/s: 9106273 +8/20000 train_loss: 6.9412 train_time: 0.0m tok/s: 9019275 +9/20000 train_loss: 6.6091 train_time: 0.0m tok/s: 8953788 +10/20000 train_loss: 6.5120 train_time: 0.0m tok/s: 8893055 +11/20000 train_loss: 6.1863 train_time: 0.0m tok/s: 8750805 +12/20000 train_loss: 5.8676 train_time: 0.0m tok/s: 8702513 +13/20000 train_loss: 5.7070 train_time: 0.0m tok/s: 8669440 +14/20000 train_loss: 5.3298 train_time: 0.0m tok/s: 8640628 +15/20000 train_loss: 5.2682 train_time: 0.0m tok/s: 8621953 +16/20000 train_loss: 5.2408 train_time: 0.0m tok/s: 8604580 +17/20000 train_loss: 5.1147 train_time: 0.0m tok/s: 8593361 +18/20000 train_loss: 5.0961 train_time: 0.0m tok/s: 8588455 +19/20000 train_loss: 4.9911 train_time: 0.0m tok/s: 8583699 +20/20000 train_loss: 4.8820 train_time: 0.0m tok/s: 8577087 +21/20000 train_loss: 4.8061 train_time: 0.0m tok/s: 8553787 +22/20000 train_loss: 4.8512 train_time: 0.0m tok/s: 8532718 +23/20000 train_loss: 4.8027 train_time: 0.0m tok/s: 8518229 +24/20000 
train_loss: 4.9048 train_time: 0.0m tok/s: 8507379 +25/20000 train_loss: 4.6799 train_time: 0.0m tok/s: 8503538 +26/20000 train_loss: 4.7073 train_time: 0.0m tok/s: 8496215 +27/20000 train_loss: 4.5864 train_time: 0.0m tok/s: 8491111 +28/20000 train_loss: 4.6475 train_time: 0.0m tok/s: 8491100 +29/20000 train_loss: 4.5763 train_time: 0.0m tok/s: 8489289 +30/20000 train_loss: 4.5564 train_time: 0.0m tok/s: 8483863 +31/20000 train_loss: 4.5506 train_time: 0.0m tok/s: 8479038 +32/20000 train_loss: 4.5139 train_time: 0.0m tok/s: 8470053 +33/20000 train_loss: 4.4774 train_time: 0.1m tok/s: 8466505 +34/20000 train_loss: 4.4157 train_time: 0.1m tok/s: 8460762 +35/20000 train_loss: 4.3554 train_time: 0.1m tok/s: 8454917 +36/20000 train_loss: 4.5061 train_time: 0.1m tok/s: 8446453 +37/20000 train_loss: 4.4410 train_time: 0.1m tok/s: 8444580 +38/20000 train_loss: 4.3638 train_time: 0.1m tok/s: 8444645 +39/20000 train_loss: 4.4932 train_time: 0.1m tok/s: 8443041 +40/20000 train_loss: 4.4562 train_time: 0.1m tok/s: 8440786 +41/20000 train_loss: 4.3357 train_time: 0.1m tok/s: 8438705 +42/20000 train_loss: 4.2650 train_time: 0.1m tok/s: 8437099 +43/20000 train_loss: 4.2950 train_time: 0.1m tok/s: 8433654 +44/20000 train_loss: 4.2303 train_time: 0.1m tok/s: 8428557 +45/20000 train_loss: 4.3550 train_time: 0.1m tok/s: 8426695 +46/20000 train_loss: 4.2711 train_time: 0.1m tok/s: 8423335 +47/20000 train_loss: 4.1348 train_time: 0.1m tok/s: 8421384 +48/20000 train_loss: 4.1720 train_time: 0.1m tok/s: 8419541 +49/20000 train_loss: 4.1124 train_time: 0.1m tok/s: 8418157 +50/20000 train_loss: 4.0770 train_time: 0.1m tok/s: 8415646 +51/20000 train_loss: 4.2929 train_time: 0.1m tok/s: 8415021 +52/20000 train_loss: 4.2221 train_time: 0.1m tok/s: 8410471 +53/20000 train_loss: 4.1508 train_time: 0.1m tok/s: 8409071 +54/20000 train_loss: 4.1554 train_time: 0.1m tok/s: 8407667 +55/20000 train_loss: 4.1609 train_time: 0.1m tok/s: 8407699 +56/20000 train_loss: 4.0900 train_time: 0.1m tok/s: 
8404234 +57/20000 train_loss: 4.1306 train_time: 0.1m tok/s: 8403971 +58/20000 train_loss: 4.0633 train_time: 0.1m tok/s: 8401762 +59/20000 train_loss: 4.0296 train_time: 0.1m tok/s: 8400594 +60/20000 train_loss: 3.9405 train_time: 0.1m tok/s: 8398508 +61/20000 train_loss: 3.9449 train_time: 0.1m tok/s: 8400351 +62/20000 train_loss: 4.0539 train_time: 0.1m tok/s: 8400275 +63/20000 train_loss: 4.1396 train_time: 0.1m tok/s: 8400582 +64/20000 train_loss: 3.9402 train_time: 0.1m tok/s: 8400382 +65/20000 train_loss: 4.0522 train_time: 0.1m tok/s: 8400438 +66/20000 train_loss: 4.0038 train_time: 0.1m tok/s: 8400514 +67/20000 train_loss: 3.9151 train_time: 0.1m tok/s: 8398036 +68/20000 train_loss: 3.9464 train_time: 0.1m tok/s: 8397950 +69/20000 train_loss: 3.8502 train_time: 0.1m tok/s: 8396709 +70/20000 train_loss: 3.9585 train_time: 0.1m tok/s: 8395402 +71/20000 train_loss: 3.8774 train_time: 0.1m tok/s: 8395004 +72/20000 train_loss: 4.0561 train_time: 0.1m tok/s: 8395002 +73/20000 train_loss: 3.8692 train_time: 0.1m tok/s: 8394867 +74/20000 train_loss: 3.8879 train_time: 0.1m tok/s: 8393030 +75/20000 train_loss: 3.8760 train_time: 0.1m tok/s: 8390432 +76/20000 train_loss: 3.8444 train_time: 0.1m tok/s: 8389896 +77/20000 train_loss: 3.7994 train_time: 0.1m tok/s: 8390289 +78/20000 train_loss: 3.7165 train_time: 0.1m tok/s: 8388436 +79/20000 train_loss: 3.8736 train_time: 0.1m tok/s: 8388948 +80/20000 train_loss: 3.7745 train_time: 0.1m tok/s: 8388136 +81/20000 train_loss: 3.7167 train_time: 0.1m tok/s: 8386692 +82/20000 train_loss: 3.7345 train_time: 0.1m tok/s: 8385416 +83/20000 train_loss: 3.6099 train_time: 0.1m tok/s: 8385015 +84/20000 train_loss: 3.6539 train_time: 0.1m tok/s: 8383756 +85/20000 train_loss: 3.6271 train_time: 0.1m tok/s: 8383049 +86/20000 train_loss: 3.4179 train_time: 0.1m tok/s: 8382227 +87/20000 train_loss: 3.6514 train_time: 0.1m tok/s: 8382745 +88/20000 train_loss: 3.5347 train_time: 0.1m tok/s: 8381423 +89/20000 train_loss: 3.5722 
train_time: 0.1m tok/s: 8379753 +90/20000 train_loss: 3.5772 train_time: 0.1m tok/s: 8379248 +91/20000 train_loss: 3.6206 train_time: 0.1m tok/s: 8379503 +92/20000 train_loss: 3.6984 train_time: 0.1m tok/s: 8379383 +93/20000 train_loss: 3.6010 train_time: 0.1m tok/s: 8378002 +94/20000 train_loss: 3.6329 train_time: 0.1m tok/s: 8378283 +95/20000 train_loss: 3.6013 train_time: 0.1m tok/s: 8378156 +96/20000 train_loss: 3.5712 train_time: 0.2m tok/s: 8377287 +97/20000 train_loss: 3.4690 train_time: 0.2m tok/s: 8376168 +98/20000 train_loss: 3.5339 train_time: 0.2m tok/s: 8375759 +99/20000 train_loss: 3.4887 train_time: 0.2m tok/s: 8376182 +100/20000 train_loss: 3.4266 train_time: 0.2m tok/s: 8375856 +101/20000 train_loss: 3.4329 train_time: 0.2m tok/s: 8374768 +102/20000 train_loss: 3.4855 train_time: 0.2m tok/s: 8374887 +103/20000 train_loss: 3.3708 train_time: 0.2m tok/s: 8374595 +104/20000 train_loss: 3.4741 train_time: 0.2m tok/s: 8374031 +105/20000 train_loss: 3.3666 train_time: 0.2m tok/s: 8373488 +106/20000 train_loss: 3.4951 train_time: 0.2m tok/s: 8372742 +107/20000 train_loss: 3.2530 train_time: 0.2m tok/s: 8371606 +108/20000 train_loss: 3.4156 train_time: 0.2m tok/s: 8370583 +109/20000 train_loss: 3.4078 train_time: 0.2m tok/s: 8369722 +110/20000 train_loss: 3.4401 train_time: 0.2m tok/s: 8369148 +111/20000 train_loss: 3.4264 train_time: 0.2m tok/s: 8368163 +112/20000 train_loss: 3.4336 train_time: 0.2m tok/s: 8368393 +113/20000 train_loss: 3.3479 train_time: 0.2m tok/s: 8368018 +114/20000 train_loss: 3.3913 train_time: 0.2m tok/s: 8368129 +115/20000 train_loss: 3.4235 train_time: 0.2m tok/s: 8367765 +116/20000 train_loss: 3.2332 train_time: 0.2m tok/s: 8367664 +117/20000 train_loss: 3.4513 train_time: 0.2m tok/s: 8367640 +118/20000 train_loss: 3.3891 train_time: 0.2m tok/s: 8366970 +119/20000 train_loss: 3.3608 train_time: 0.2m tok/s: 8366379 +120/20000 train_loss: 3.3482 train_time: 0.2m tok/s: 8366141 +121/20000 train_loss: 3.2975 train_time: 0.2m tok/s: 
8366335 +122/20000 train_loss: 3.3100 train_time: 0.2m tok/s: 8366383 +123/20000 train_loss: 3.2921 train_time: 0.2m tok/s: 8366013 +124/20000 train_loss: 3.3437 train_time: 0.2m tok/s: 8365151 +125/20000 train_loss: 3.2407 train_time: 0.2m tok/s: 8364900 +126/20000 train_loss: 3.2555 train_time: 0.2m tok/s: 8364912 +127/20000 train_loss: 3.2823 train_time: 0.2m tok/s: 8364609 +128/20000 train_loss: 3.3313 train_time: 0.2m tok/s: 8363963 +129/20000 train_loss: 3.2968 train_time: 0.2m tok/s: 8363208 +130/20000 train_loss: 3.2729 train_time: 0.2m tok/s: 8362601 +131/20000 train_loss: 3.2279 train_time: 0.2m tok/s: 8362199 +132/20000 train_loss: 3.1761 train_time: 0.2m tok/s: 8362559 +133/20000 train_loss: 3.2326 train_time: 0.2m tok/s: 8361786 +134/20000 train_loss: 3.1373 train_time: 0.2m tok/s: 8361650 +135/20000 train_loss: 2.9806 train_time: 0.2m tok/s: 8360382 +136/20000 train_loss: 3.2406 train_time: 0.2m tok/s: 8359233 +137/20000 train_loss: 3.0948 train_time: 0.2m tok/s: 8358260 +138/20000 train_loss: 3.2821 train_time: 0.2m tok/s: 8357671 +139/20000 train_loss: 3.2456 train_time: 0.2m tok/s: 8358062 +140/20000 train_loss: 3.1692 train_time: 0.2m tok/s: 8357845 +141/20000 train_loss: 3.0977 train_time: 0.2m tok/s: 8357074 +142/20000 train_loss: 3.2961 train_time: 0.2m tok/s: 8357053 +143/20000 train_loss: 3.3638 train_time: 0.2m tok/s: 8356667 +144/20000 train_loss: 3.2911 train_time: 0.2m tok/s: 8356261 +145/20000 train_loss: 3.2585 train_time: 0.2m tok/s: 8355984 +146/20000 train_loss: 3.2680 train_time: 0.2m tok/s: 8356015 +147/20000 train_loss: 3.1713 train_time: 0.2m tok/s: 8355837 +148/20000 train_loss: 3.1924 train_time: 0.2m tok/s: 8355539 +149/20000 train_loss: 3.2620 train_time: 0.2m tok/s: 8355103 +150/20000 train_loss: 3.1983 train_time: 0.2m tok/s: 8354823 +151/20000 train_loss: 3.5557 train_time: 0.2m tok/s: 8354274 +152/20000 train_loss: 3.1759 train_time: 0.2m tok/s: 8353418 +153/20000 train_loss: 3.2964 train_time: 0.2m tok/s: 8352295 
+154/20000 train_loss: 3.2022 train_time: 0.2m tok/s: 8352619 +155/20000 train_loss: 3.1404 train_time: 0.2m tok/s: 8352635 +156/20000 train_loss: 3.0424 train_time: 0.2m tok/s: 8351886 +157/20000 train_loss: 3.0913 train_time: 0.2m tok/s: 8351593 +158/20000 train_loss: 3.1904 train_time: 0.2m tok/s: 8351467 +159/20000 train_loss: 3.0521 train_time: 0.2m tok/s: 8351143 +160/20000 train_loss: 3.1740 train_time: 0.3m tok/s: 8351057 +161/20000 train_loss: 3.1329 train_time: 0.3m tok/s: 8351128 +162/20000 train_loss: 3.0726 train_time: 0.3m tok/s: 8350005 +163/20000 train_loss: 3.1473 train_time: 0.3m tok/s: 8349350 +164/20000 train_loss: 3.0306 train_time: 0.3m tok/s: 8348909 +165/20000 train_loss: 3.2067 train_time: 0.3m tok/s: 8348720 +166/20000 train_loss: 3.1356 train_time: 0.3m tok/s: 8348859 +167/20000 train_loss: 3.1349 train_time: 0.3m tok/s: 8348034 +168/20000 train_loss: 3.1907 train_time: 0.3m tok/s: 8347737 +169/20000 train_loss: 3.1070 train_time: 0.3m tok/s: 8347296 +170/20000 train_loss: 2.8128 train_time: 0.3m tok/s: 8346560 +171/20000 train_loss: 3.1329 train_time: 0.3m tok/s: 8346074 +172/20000 train_loss: 3.0910 train_time: 0.3m tok/s: 8345964 +173/20000 train_loss: 3.2297 train_time: 0.3m tok/s: 8345421 +174/20000 train_loss: 3.1141 train_time: 0.3m tok/s: 8345533 +175/20000 train_loss: 3.1411 train_time: 0.3m tok/s: 8345439 +176/20000 train_loss: 3.1559 train_time: 0.3m tok/s: 8345618 +177/20000 train_loss: 3.1297 train_time: 0.3m tok/s: 8345302 +178/20000 train_loss: 2.9611 train_time: 0.3m tok/s: 8345277 +179/20000 train_loss: 3.3190 train_time: 0.3m tok/s: 8344937 +180/20000 train_loss: 2.9711 train_time: 0.3m tok/s: 8344615 +181/20000 train_loss: 2.9547 train_time: 0.3m tok/s: 8344803 +182/20000 train_loss: 3.0487 train_time: 0.3m tok/s: 8344401 +183/20000 train_loss: 2.9941 train_time: 0.3m tok/s: 8343265 +184/20000 train_loss: 3.0015 train_time: 0.3m tok/s: 8342762 +185/20000 train_loss: 2.7213 train_time: 0.3m tok/s: 8341850 +186/20000 
train_loss: 3.1062 train_time: 0.3m tok/s: 8341403 +187/20000 train_loss: 3.0427 train_time: 0.3m tok/s: 8341237 +188/20000 train_loss: 3.1932 train_time: 0.3m tok/s: 8341450 +189/20000 train_loss: 3.5065 train_time: 0.3m tok/s: 8341598 +190/20000 train_loss: 3.0809 train_time: 0.3m tok/s: 8341553 +191/20000 train_loss: 3.0474 train_time: 0.3m tok/s: 8341245 +192/20000 train_loss: 3.0148 train_time: 0.3m tok/s: 8341545 +193/20000 train_loss: 2.9913 train_time: 0.3m tok/s: 8341274 +194/20000 train_loss: 2.9980 train_time: 0.3m tok/s: 8340937 +195/20000 train_loss: 2.8926 train_time: 0.3m tok/s: 8340506 +196/20000 train_loss: 3.1216 train_time: 0.3m tok/s: 8338554 +197/20000 train_loss: 3.0418 train_time: 0.3m tok/s: 8340210 +198/20000 train_loss: 3.0488 train_time: 0.3m tok/s: 8339926 +199/20000 train_loss: 3.0536 train_time: 0.3m tok/s: 8340171 +200/20000 train_loss: 3.0576 train_time: 0.3m tok/s: 8340439 +201/20000 train_loss: 3.1083 train_time: 0.3m tok/s: 8340537 +202/20000 train_loss: 3.3213 train_time: 0.3m tok/s: 8340171 +203/20000 train_loss: 3.0699 train_time: 0.3m tok/s: 8339993 +204/20000 train_loss: 3.0792 train_time: 0.3m tok/s: 8339747 +205/20000 train_loss: 3.0570 train_time: 0.3m tok/s: 8339991 +206/20000 train_loss: 2.9469 train_time: 0.3m tok/s: 8339671 +207/20000 train_loss: 3.0879 train_time: 0.3m tok/s: 8339355 +208/20000 train_loss: 2.9330 train_time: 0.3m tok/s: 8339224 +209/20000 train_loss: 2.9950 train_time: 0.3m tok/s: 8338784 +210/20000 train_loss: 3.0690 train_time: 0.3m tok/s: 8338616 +211/20000 train_loss: 3.2682 train_time: 0.3m tok/s: 8338362 +212/20000 train_loss: 3.0158 train_time: 0.3m tok/s: 8338314 +213/20000 train_loss: 2.9386 train_time: 0.3m tok/s: 8337687 +214/20000 train_loss: 3.0900 train_time: 0.3m tok/s: 8336971 +215/20000 train_loss: 3.0341 train_time: 0.3m tok/s: 8335571 +216/20000 train_loss: 3.0879 train_time: 0.3m tok/s: 8336933 +217/20000 train_loss: 3.0184 train_time: 0.3m tok/s: 8336852 +218/20000 train_loss: 
3.0276 train_time: 0.3m tok/s: 8337100 +219/20000 train_loss: 3.1140 train_time: 0.3m tok/s: 8336767 +220/20000 train_loss: 3.3308 train_time: 0.3m tok/s: 8336391 +221/20000 train_loss: 2.9277 train_time: 0.3m tok/s: 8335196 +222/20000 train_loss: 2.9755 train_time: 0.3m tok/s: 8335461 +223/20000 train_loss: 2.9996 train_time: 0.4m tok/s: 8335620 +224/20000 train_loss: 2.9995 train_time: 0.4m tok/s: 8335263 +225/20000 train_loss: 3.0679 train_time: 0.4m tok/s: 8334371 +226/20000 train_loss: 3.0370 train_time: 0.4m tok/s: 8334772 +227/20000 train_loss: 3.0707 train_time: 0.4m tok/s: 8334325 +228/20000 train_loss: 3.0860 train_time: 0.4m tok/s: 8334159 +229/20000 train_loss: 3.0855 train_time: 0.4m tok/s: 8334184 +230/20000 train_loss: 2.9516 train_time: 0.4m tok/s: 8334552 +231/20000 train_loss: 3.1039 train_time: 0.4m tok/s: 8334332 +232/20000 train_loss: 2.9822 train_time: 0.4m tok/s: 8334116 +233/20000 train_loss: 3.0155 train_time: 0.4m tok/s: 8334114 +234/20000 train_loss: 3.0169 train_time: 0.4m tok/s: 8332981 +235/20000 train_loss: 2.9340 train_time: 0.4m tok/s: 8334118 +236/20000 train_loss: 3.0109 train_time: 0.4m tok/s: 8333887 +237/20000 train_loss: 2.8947 train_time: 0.4m tok/s: 8333605 +238/20000 train_loss: 3.0771 train_time: 0.4m tok/s: 8333279 +239/20000 train_loss: 3.0042 train_time: 0.4m tok/s: 8333014 +240/20000 train_loss: 3.1526 train_time: 0.4m tok/s: 8332792 +241/20000 train_loss: 3.0189 train_time: 0.4m tok/s: 8332808 +242/20000 train_loss: 3.0885 train_time: 0.4m tok/s: 8333043 +243/20000 train_loss: 3.0007 train_time: 0.4m tok/s: 8333273 +244/20000 train_loss: 3.0391 train_time: 0.4m tok/s: 8333479 +245/20000 train_loss: 2.9846 train_time: 0.4m tok/s: 8333159 +246/20000 train_loss: 3.0338 train_time: 0.4m tok/s: 8333217 +247/20000 train_loss: 2.9753 train_time: 0.4m tok/s: 8332871 +248/20000 train_loss: 2.8992 train_time: 0.4m tok/s: 8333017 +249/20000 train_loss: 2.9800 train_time: 0.4m tok/s: 8332803 +250/20000 train_loss: 2.9859 
train_time: 0.4m tok/s: 8332863 +251/20000 train_loss: 2.9328 train_time: 0.4m tok/s: 8332695 +252/20000 train_loss: 2.9331 train_time: 0.4m tok/s: 8332541 +253/20000 train_loss: 3.0358 train_time: 0.4m tok/s: 8332566 +254/20000 train_loss: 3.0861 train_time: 0.4m tok/s: 8332325 +255/20000 train_loss: 3.1017 train_time: 0.4m tok/s: 8332127 +256/20000 train_loss: 2.9651 train_time: 0.4m tok/s: 8332143 +257/20000 train_loss: 2.9613 train_time: 0.4m tok/s: 8332171 +258/20000 train_loss: 3.0156 train_time: 0.4m tok/s: 8331260 +259/20000 train_loss: 2.9484 train_time: 0.4m tok/s: 8331211 +260/20000 train_loss: 3.1488 train_time: 0.4m tok/s: 8331242 +261/20000 train_loss: 2.9399 train_time: 0.4m tok/s: 8330621 +262/20000 train_loss: 2.7864 train_time: 0.4m tok/s: 8330479 +263/20000 train_loss: 2.7959 train_time: 0.4m tok/s: 8330379 +264/20000 train_loss: 2.9745 train_time: 0.4m tok/s: 8330305 +265/20000 train_loss: 2.9932 train_time: 0.4m tok/s: 8330181 +266/20000 train_loss: 2.9231 train_time: 0.4m tok/s: 8329836 +267/20000 train_loss: 2.9311 train_time: 0.4m tok/s: 8329800 +268/20000 train_loss: 3.0145 train_time: 0.4m tok/s: 8329781 +269/20000 train_loss: 2.9962 train_time: 0.4m tok/s: 8329907 +270/20000 train_loss: 2.9980 train_time: 0.4m tok/s: 8329722 +271/20000 train_loss: 3.0016 train_time: 0.4m tok/s: 8329809 +272/20000 train_loss: 3.0594 train_time: 0.4m tok/s: 8329559 +273/20000 train_loss: 2.9292 train_time: 0.4m tok/s: 8329830 +274/20000 train_loss: 3.0246 train_time: 0.4m tok/s: 8329830 +275/20000 train_loss: 2.9425 train_time: 0.4m tok/s: 8329905 +276/20000 train_loss: 2.8729 train_time: 0.4m tok/s: 8330087 +277/20000 train_loss: 2.8671 train_time: 0.4m tok/s: 8329803 +278/20000 train_loss: 2.8325 train_time: 0.4m tok/s: 8328977 +279/20000 train_loss: 2.9669 train_time: 0.4m tok/s: 8328393 +280/20000 train_loss: 3.0117 train_time: 0.4m tok/s: 8328377 +281/20000 train_loss: 2.7592 train_time: 0.4m tok/s: 8328443 +282/20000 train_loss: 3.0679 train_time: 
0.4m tok/s: 8328279 +283/20000 train_loss: 2.8644 train_time: 0.4m tok/s: 8328146 +284/20000 train_loss: 2.9139 train_time: 0.4m tok/s: 8326561 +285/20000 train_loss: 2.9631 train_time: 0.4m tok/s: 8327736 +286/20000 train_loss: 2.9969 train_time: 0.5m tok/s: 8327540 +287/20000 train_loss: 2.8249 train_time: 0.5m tok/s: 8327518 +288/20000 train_loss: 2.9647 train_time: 0.5m tok/s: 8327410 +289/20000 train_loss: 2.8874 train_time: 0.5m tok/s: 8327067 +290/20000 train_loss: 2.9129 train_time: 0.5m tok/s: 8326979 +291/20000 train_loss: 2.8766 train_time: 0.5m tok/s: 8326835 +292/20000 train_loss: 2.7213 train_time: 0.5m tok/s: 8326555 +293/20000 train_loss: 2.9355 train_time: 0.5m tok/s: 8326422 +294/20000 train_loss: 3.0614 train_time: 0.5m tok/s: 8326024 +295/20000 train_loss: 3.0062 train_time: 0.5m tok/s: 8326180 +296/20000 train_loss: 3.0598 train_time: 0.5m tok/s: 8326122 +297/20000 train_loss: 2.9419 train_time: 0.5m tok/s: 8325805 +298/20000 train_loss: 2.9812 train_time: 0.5m tok/s: 8325823 +299/20000 train_loss: 2.8210 train_time: 0.5m tok/s: 8325807 +300/20000 train_loss: 3.0208 train_time: 0.5m tok/s: 8325652 +301/20000 train_loss: 2.9634 train_time: 0.5m tok/s: 8325865 +302/20000 train_loss: 2.8640 train_time: 0.5m tok/s: 8325882 +303/20000 train_loss: 2.9272 train_time: 0.5m tok/s: 8325838 +304/20000 train_loss: 2.9339 train_time: 0.5m tok/s: 8325687 +305/20000 train_loss: 2.9361 train_time: 0.5m tok/s: 8325436 +306/20000 train_loss: 3.0064 train_time: 0.5m tok/s: 8325389 +307/20000 train_loss: 2.9115 train_time: 0.5m tok/s: 8325390 +308/20000 train_loss: 2.8993 train_time: 0.5m tok/s: 8325252 +309/20000 train_loss: 3.0366 train_time: 0.5m tok/s: 8324593 +310/20000 train_loss: 2.8584 train_time: 0.5m tok/s: 8325114 +311/20000 train_loss: 2.9297 train_time: 0.5m tok/s: 8325095 +312/20000 train_loss: 2.8386 train_time: 0.5m tok/s: 8324901 +313/20000 train_loss: 2.8361 train_time: 0.5m tok/s: 8324801 +314/20000 train_loss: 2.8713 train_time: 0.5m tok/s: 
8324676 +315/20000 train_loss: 2.9604 train_time: 0.5m tok/s: 8324185 +316/20000 train_loss: 2.6961 train_time: 0.5m tok/s: 8324127 +317/20000 train_loss: 2.8048 train_time: 0.5m tok/s: 8323795 +318/20000 train_loss: 2.9182 train_time: 0.5m tok/s: 8323765 +319/20000 train_loss: 2.8973 train_time: 0.5m tok/s: 8323611 +320/20000 train_loss: 3.0276 train_time: 0.5m tok/s: 8322991 +321/20000 train_loss: 3.0035 train_time: 0.5m tok/s: 8323188 +322/20000 train_loss: 2.9657 train_time: 0.5m tok/s: 8323153 +323/20000 train_loss: 3.0083 train_time: 0.5m tok/s: 8323023 +324/20000 train_loss: 2.9081 train_time: 0.5m tok/s: 8322938 +325/20000 train_loss: 2.8969 train_time: 0.5m tok/s: 8322398 +326/20000 train_loss: 2.8953 train_time: 0.5m tok/s: 8322688 +327/20000 train_loss: 2.8332 train_time: 0.5m tok/s: 8322637 +328/20000 train_loss: 2.8604 train_time: 0.5m tok/s: 8322712 +329/20000 train_loss: 2.8223 train_time: 0.5m tok/s: 8322630 +330/20000 train_loss: 2.7780 train_time: 0.5m tok/s: 8322781 +331/20000 train_loss: 2.8899 train_time: 0.5m tok/s: 8322418 +332/20000 train_loss: 2.9727 train_time: 0.5m tok/s: 8322081 +333/20000 train_loss: 2.8749 train_time: 0.5m tok/s: 8321993 +334/20000 train_loss: 3.0885 train_time: 0.5m tok/s: 8322059 +335/20000 train_loss: 2.8467 train_time: 0.5m tok/s: 8321755 +336/20000 train_loss: 2.9329 train_time: 0.5m tok/s: 8321719 +337/20000 train_loss: 2.8175 train_time: 0.5m tok/s: 8321781 +338/20000 train_loss: 2.8875 train_time: 0.5m tok/s: 8321875 +339/20000 train_loss: 2.9367 train_time: 0.5m tok/s: 8321947 +340/20000 train_loss: 2.9688 train_time: 0.5m tok/s: 8321799 +341/20000 train_loss: 2.9263 train_time: 0.5m tok/s: 8321725 +342/20000 train_loss: 2.8094 train_time: 0.5m tok/s: 8321677 +343/20000 train_loss: 2.9143 train_time: 0.5m tok/s: 8321646 +344/20000 train_loss: 2.8264 train_time: 0.5m tok/s: 8321479 +345/20000 train_loss: 2.8572 train_time: 0.5m tok/s: 8321297 +346/20000 train_loss: 2.8709 train_time: 0.5m tok/s: 8321213 
+347/20000 train_loss: 2.8884 train_time: 0.5m tok/s: 8321148 +348/20000 train_loss: 2.8568 train_time: 0.5m tok/s: 8321013 +349/20000 train_loss: 2.9316 train_time: 0.5m tok/s: 8321070 +350/20000 train_loss: 2.7782 train_time: 0.6m tok/s: 8321058 +351/20000 train_loss: 2.7981 train_time: 0.6m tok/s: 8321142 +352/20000 train_loss: 2.7647 train_time: 0.6m tok/s: 8320978 +353/20000 train_loss: 2.6155 train_time: 0.6m tok/s: 8320685 +354/20000 train_loss: 2.9856 train_time: 0.6m tok/s: 8320417 +355/20000 train_loss: 2.9201 train_time: 0.6m tok/s: 8320309 +356/20000 train_loss: 2.8311 train_time: 0.6m tok/s: 8320034 +357/20000 train_loss: 2.7778 train_time: 0.6m tok/s: 8319629 +358/20000 train_loss: 2.7863 train_time: 0.6m tok/s: 8319566 +359/20000 train_loss: 2.8946 train_time: 0.6m tok/s: 8319542 +360/20000 train_loss: 2.8906 train_time: 0.6m tok/s: 8319427 +361/20000 train_loss: 2.9662 train_time: 0.6m tok/s: 8319130 +362/20000 train_loss: 2.8707 train_time: 0.6m tok/s: 8318978 +363/20000 train_loss: 2.9528 train_time: 0.6m tok/s: 8318867 +364/20000 train_loss: 2.8166 train_time: 0.6m tok/s: 8318392 +365/20000 train_loss: 2.8019 train_time: 0.6m tok/s: 8318405 +366/20000 train_loss: 2.8067 train_time: 0.6m tok/s: 8318201 +367/20000 train_loss: 2.9206 train_time: 0.6m tok/s: 8318214 +368/20000 train_loss: 2.7269 train_time: 0.6m tok/s: 8317967 +369/20000 train_loss: 2.8811 train_time: 0.6m tok/s: 8317919 +370/20000 train_loss: 2.8711 train_time: 0.6m tok/s: 8317879 +371/20000 train_loss: 2.8781 train_time: 0.6m tok/s: 8317929 +372/20000 train_loss: 2.8298 train_time: 0.6m tok/s: 8317786 +373/20000 train_loss: 2.7126 train_time: 0.6m tok/s: 8318007 +374/20000 train_loss: 2.7281 train_time: 0.6m tok/s: 8318016 +375/20000 train_loss: 2.6767 train_time: 0.6m tok/s: 8317858 +376/20000 train_loss: 2.9141 train_time: 0.6m tok/s: 8317740 +377/20000 train_loss: 2.7271 train_time: 0.6m tok/s: 8317852 +378/20000 train_loss: 2.8231 train_time: 0.6m tok/s: 8317857 +379/20000 
train_loss: 2.8799 train_time: 0.6m tok/s: 8317751 +380/20000 train_loss: 2.8882 train_time: 0.6m tok/s: 8317532 +381/20000 train_loss: 2.9014 train_time: 0.6m tok/s: 8317431 +382/20000 train_loss: 2.9543 train_time: 0.6m tok/s: 8317344 +383/20000 train_loss: 2.9433 train_time: 0.6m tok/s: 8317490 +384/20000 train_loss: 2.8102 train_time: 0.6m tok/s: 8317269 +385/20000 train_loss: 2.8273 train_time: 0.6m tok/s: 8317080 +386/20000 train_loss: 2.8751 train_time: 0.6m tok/s: 8317092 +387/20000 train_loss: 3.0446 train_time: 0.6m tok/s: 8316158 +388/20000 train_loss: 2.8779 train_time: 0.6m tok/s: 8316816 +389/20000 train_loss: 2.9085 train_time: 0.6m tok/s: 8316926 +390/20000 train_loss: 2.7441 train_time: 0.6m tok/s: 8316754 +391/20000 train_loss: 2.7159 train_time: 0.6m tok/s: 8316713 +392/20000 train_loss: 2.7758 train_time: 0.6m tok/s: 8316594 +393/20000 train_loss: 2.8452 train_time: 0.6m tok/s: 8316843 +394/20000 train_loss: 2.8291 train_time: 0.6m tok/s: 8316693 +395/20000 train_loss: 2.9269 train_time: 0.6m tok/s: 8316377 +396/20000 train_loss: 2.8220 train_time: 0.6m tok/s: 8316141 +397/20000 train_loss: 2.8282 train_time: 0.6m tok/s: 8316093 +398/20000 train_loss: 2.8551 train_time: 0.6m tok/s: 8315985 +399/20000 train_loss: 2.7685 train_time: 0.6m tok/s: 8315874 +400/20000 train_loss: 2.8592 train_time: 0.6m tok/s: 8315853 +401/20000 train_loss: 2.8565 train_time: 0.6m tok/s: 8316107 +402/20000 train_loss: 2.7225 train_time: 0.6m tok/s: 8316121 +403/20000 train_loss: 2.9507 train_time: 0.6m tok/s: 8316004 +404/20000 train_loss: 2.9292 train_time: 0.6m tok/s: 8315904 +405/20000 train_loss: 2.9241 train_time: 0.6m tok/s: 8315804 +406/20000 train_loss: 2.8062 train_time: 0.6m tok/s: 8315767 +407/20000 train_loss: 2.8308 train_time: 0.6m tok/s: 8315343 +408/20000 train_loss: 2.8341 train_time: 0.6m tok/s: 8315222 +409/20000 train_loss: 2.8022 train_time: 0.6m tok/s: 8315183 +410/20000 train_loss: 2.8703 train_time: 0.6m tok/s: 8315171 +411/20000 train_loss: 
2.8135 train_time: 0.6m tok/s: 8315082 +412/20000 train_loss: 2.8144 train_time: 0.6m tok/s: 8315275 +413/20000 train_loss: 2.7057 train_time: 0.7m tok/s: 8315177 +414/20000 train_loss: 2.7273 train_time: 0.7m tok/s: 8315152 +415/20000 train_loss: 2.7018 train_time: 0.7m tok/s: 8315071 +416/20000 train_loss: 2.7630 train_time: 0.7m tok/s: 8315104 +417/20000 train_loss: 2.7665 train_time: 0.7m tok/s: 8315036 +418/20000 train_loss: 2.7874 train_time: 0.7m tok/s: 8314931 +419/20000 train_loss: 2.8066 train_time: 0.7m tok/s: 8314916 +420/20000 train_loss: 2.7979 train_time: 0.7m tok/s: 8314897 +421/20000 train_loss: 2.8659 train_time: 0.7m tok/s: 8315051 +422/20000 train_loss: 2.8414 train_time: 0.7m tok/s: 8315020 +423/20000 train_loss: 2.8364 train_time: 0.7m tok/s: 8314955 +424/20000 train_loss: 2.9020 train_time: 0.7m tok/s: 8314715 +425/20000 train_loss: 2.8008 train_time: 0.7m tok/s: 8314679 +426/20000 train_loss: 2.8228 train_time: 0.7m tok/s: 8314004 +427/20000 train_loss: 2.8269 train_time: 0.7m tok/s: 8314176 +428/20000 train_loss: 2.7908 train_time: 0.7m tok/s: 8314169 +429/20000 train_loss: 2.7270 train_time: 0.7m tok/s: 8314242 +430/20000 train_loss: 2.8591 train_time: 0.7m tok/s: 8314148 +431/20000 train_loss: 2.6819 train_time: 0.7m tok/s: 8314005 +432/20000 train_loss: 2.7330 train_time: 0.7m tok/s: 8314004 +433/20000 train_loss: 2.6762 train_time: 0.7m tok/s: 8314043 +434/20000 train_loss: 2.6429 train_time: 0.7m tok/s: 8313725 +435/20000 train_loss: 2.8633 train_time: 0.7m tok/s: 8313267 +436/20000 train_loss: 2.4870 train_time: 0.7m tok/s: 8313056 +437/20000 train_loss: 2.7377 train_time: 0.7m tok/s: 8313060 +438/20000 train_loss: 2.8543 train_time: 0.7m tok/s: 8313230 +439/20000 train_loss: 2.7657 train_time: 0.7m tok/s: 8312519 +440/20000 train_loss: 2.6723 train_time: 0.7m tok/s: 8312566 +441/20000 train_loss: 2.9090 train_time: 0.7m tok/s: 8312498 +442/20000 train_loss: 2.9619 train_time: 0.7m tok/s: 8312471 +443/20000 train_loss: 2.9135 
train_time: 0.7m tok/s: 8312548 +444/20000 train_loss: 2.9309 train_time: 0.7m tok/s: 8312575 +445/20000 train_loss: 2.8900 train_time: 0.7m tok/s: 8312426 +446/20000 train_loss: 2.7687 train_time: 0.7m tok/s: 8312654 +447/20000 train_loss: 2.7951 train_time: 0.7m tok/s: 8312541 +448/20000 train_loss: 2.8127 train_time: 0.7m tok/s: 8312831 +449/20000 train_loss: 2.7771 train_time: 0.7m tok/s: 8312591 +450/20000 train_loss: 2.8180 train_time: 0.7m tok/s: 8312718 +451/20000 train_loss: 2.5212 train_time: 0.7m tok/s: 8312486 +452/20000 train_loss: 2.7493 train_time: 0.7m tok/s: 8312205 +453/20000 train_loss: 2.6864 train_time: 0.7m tok/s: 8311986 +454/20000 train_loss: 2.6903 train_time: 0.7m tok/s: 8311729 +455/20000 train_loss: 2.7531 train_time: 0.7m tok/s: 8311628 +456/20000 train_loss: 2.7666 train_time: 0.7m tok/s: 8311696 +457/20000 train_loss: 2.6783 train_time: 0.7m tok/s: 8311469 +458/20000 train_loss: 2.7570 train_time: 0.7m tok/s: 8311507 +459/20000 train_loss: 2.8813 train_time: 0.7m tok/s: 8310666 +460/20000 train_loss: 2.8033 train_time: 0.7m tok/s: 8311006 +461/20000 train_loss: 2.8661 train_time: 0.7m tok/s: 8311107 +462/20000 train_loss: 2.9150 train_time: 0.7m tok/s: 8311145 +463/20000 train_loss: 2.8069 train_time: 0.7m tok/s: 8311213 +464/20000 train_loss: 2.7663 train_time: 0.7m tok/s: 8311417 +465/20000 train_loss: 2.9416 train_time: 0.7m tok/s: 8311225 +466/20000 train_loss: 2.8469 train_time: 0.7m tok/s: 8311148 +467/20000 train_loss: 2.8367 train_time: 0.7m tok/s: 8311163 +468/20000 train_loss: 2.9814 train_time: 0.7m tok/s: 8311023 +469/20000 train_loss: 2.7276 train_time: 0.7m tok/s: 8310674 +470/20000 train_loss: 2.7628 train_time: 0.7m tok/s: 8310762 +471/20000 train_loss: 2.8858 train_time: 0.7m tok/s: 8310412 +472/20000 train_loss: 2.9824 train_time: 0.7m tok/s: 8310005 +473/20000 train_loss: 2.7126 train_time: 0.7m tok/s: 8309616 +474/20000 train_loss: 2.6865 train_time: 0.7m tok/s: 8309650 +475/20000 train_loss: 2.8525 train_time: 
0.7m tok/s: 8309686 +476/20000 train_loss: 2.6161 train_time: 0.8m tok/s: 8309297 +477/20000 train_loss: 2.7182 train_time: 0.8m tok/s: 8309275 +478/20000 train_loss: 2.8182 train_time: 0.8m tok/s: 8309285 +479/20000 train_loss: 2.7912 train_time: 0.8m tok/s: 8309196 +480/20000 train_loss: 3.0399 train_time: 0.8m tok/s: 8309076 +481/20000 train_loss: 2.8450 train_time: 0.8m tok/s: 8309086 +482/20000 train_loss: 2.7815 train_time: 0.8m tok/s: 8309044 +483/20000 train_loss: 2.8191 train_time: 0.8m tok/s: 8309004 +484/20000 train_loss: 2.8777 train_time: 0.8m tok/s: 8308939 +485/20000 train_loss: 2.7558 train_time: 0.8m tok/s: 8308935 +486/20000 train_loss: 2.7538 train_time: 0.8m tok/s: 8309007 +487/20000 train_loss: 2.8186 train_time: 0.8m tok/s: 8308980 +488/20000 train_loss: 2.7534 train_time: 0.8m tok/s: 8308836 +489/20000 train_loss: 2.3565 train_time: 0.8m tok/s: 8308651 +490/20000 train_loss: 2.8571 train_time: 0.8m tok/s: 8308609 +491/20000 train_loss: 2.7728 train_time: 0.8m tok/s: 8308671 +492/20000 train_loss: 2.7752 train_time: 0.8m tok/s: 8308680 +493/20000 train_loss: 2.6710 train_time: 0.8m tok/s: 8308673 +494/20000 train_loss: 2.6763 train_time: 0.8m tok/s: 8308566 +495/20000 train_loss: 2.7887 train_time: 0.8m tok/s: 8308473 +496/20000 train_loss: 2.6865 train_time: 0.8m tok/s: 8308195 +497/20000 train_loss: 2.9280 train_time: 0.8m tok/s: 8308275 +498/20000 train_loss: 2.8232 train_time: 0.8m tok/s: 8308297 +499/20000 train_loss: 2.9264 train_time: 0.8m tok/s: 8308394 +500/20000 train_loss: 2.7415 train_time: 0.8m tok/s: 8308360 +501/20000 train_loss: 2.9090 train_time: 0.8m tok/s: 8308509 +502/20000 train_loss: 2.7082 train_time: 0.8m tok/s: 8308528 +503/20000 train_loss: 2.7864 train_time: 0.8m tok/s: 8308354 +504/20000 train_loss: 2.6837 train_time: 0.8m tok/s: 8308240 +505/20000 train_loss: 2.8765 train_time: 0.8m tok/s: 8308086 +506/20000 train_loss: 2.7881 train_time: 0.8m tok/s: 8308065 +507/20000 train_loss: 2.7592 train_time: 0.8m tok/s: 
8308005 +508/20000 train_loss: 2.9152 train_time: 0.8m tok/s: 8308100 +509/20000 train_loss: 2.8975 train_time: 0.8m tok/s: 8308172 +510/20000 train_loss: 2.6950 train_time: 0.8m tok/s: 8308205 +511/20000 train_loss: 2.8679 train_time: 0.8m tok/s: 8307947 +512/20000 train_loss: 2.8523 train_time: 0.8m tok/s: 8307925 +513/20000 train_loss: 2.8902 train_time: 0.8m tok/s: 8307941 +514/20000 train_loss: 2.8456 train_time: 0.8m tok/s: 8307848 +515/20000 train_loss: 2.8484 train_time: 0.8m tok/s: 8307424 +516/20000 train_loss: 2.7200 train_time: 0.8m tok/s: 8307382 +517/20000 train_loss: 2.8059 train_time: 0.8m tok/s: 8307449 +518/20000 train_loss: 2.9380 train_time: 0.8m tok/s: 8307561 +519/20000 train_loss: 2.7365 train_time: 0.8m tok/s: 8307477 +520/20000 train_loss: 2.6585 train_time: 0.8m tok/s: 8307486 +521/20000 train_loss: 2.7645 train_time: 0.8m tok/s: 8307430 +522/20000 train_loss: 2.7411 train_time: 0.8m tok/s: 8307471 +523/20000 train_loss: 2.7244 train_time: 0.8m tok/s: 8307383 +524/20000 train_loss: 2.7952 train_time: 0.8m tok/s: 8307405 +525/20000 train_loss: 2.7181 train_time: 0.8m tok/s: 8307043 +526/20000 train_loss: 2.8226 train_time: 0.8m tok/s: 8307152 +527/20000 train_loss: 2.8789 train_time: 0.8m tok/s: 8306749 +528/20000 train_loss: 2.8669 train_time: 0.8m tok/s: 8306605 +529/20000 train_loss: 2.8805 train_time: 0.8m tok/s: 8306536 +530/20000 train_loss: 2.8998 train_time: 0.8m tok/s: 8306383 +531/20000 train_loss: 3.2048 train_time: 0.8m tok/s: 8306310 +532/20000 train_loss: 3.1061 train_time: 0.8m tok/s: 8305992 +533/20000 train_loss: 2.6725 train_time: 0.8m tok/s: 8306018 +534/20000 train_loss: 2.8683 train_time: 0.8m tok/s: 8306094 +535/20000 train_loss: 2.7898 train_time: 0.8m tok/s: 8305996 +536/20000 train_loss: 2.6734 train_time: 0.8m tok/s: 8305657 +537/20000 train_loss: 2.8922 train_time: 0.8m tok/s: 8305362 +538/20000 train_loss: 2.7378 train_time: 0.8m tok/s: 8305328 +539/20000 train_loss: 2.8475 train_time: 0.9m tok/s: 8305424 
+540/20000 train_loss: 2.8540 train_time: 0.9m tok/s: 8305267 +541/20000 train_loss: 2.2827 train_time: 0.9m tok/s: 8305051 +542/20000 train_loss: 2.8403 train_time: 0.9m tok/s: 8304830 +543/20000 train_loss: 2.8078 train_time: 0.9m tok/s: 8304863 +544/20000 train_loss: 2.8363 train_time: 0.9m tok/s: 8304875 +545/20000 train_loss: 2.7789 train_time: 0.9m tok/s: 8304837 +546/20000 train_loss: 2.8094 train_time: 0.9m tok/s: 8304845 +547/20000 train_loss: 2.7682 train_time: 0.9m tok/s: 8304970 +548/20000 train_loss: 2.7376 train_time: 0.9m tok/s: 8304624 +549/20000 train_loss: 2.6815 train_time: 0.9m tok/s: 8304422 +550/20000 train_loss: 2.7734 train_time: 0.9m tok/s: 8304351 +551/20000 train_loss: 2.7257 train_time: 0.9m tok/s: 8304311 +552/20000 train_loss: 2.8900 train_time: 0.9m tok/s: 8303824 +553/20000 train_loss: 2.7458 train_time: 0.9m tok/s: 8303474 +554/20000 train_loss: 2.5926 train_time: 0.9m tok/s: 8303460 +555/20000 train_loss: 2.6650 train_time: 0.9m tok/s: 8303556 +556/20000 train_loss: 2.7554 train_time: 0.9m tok/s: 8303659 +557/20000 train_loss: 2.8711 train_time: 0.9m tok/s: 8303756 +558/20000 train_loss: 2.8283 train_time: 0.9m tok/s: 8303526 +559/20000 train_loss: 2.7465 train_time: 0.9m tok/s: 8303731 +560/20000 train_loss: 2.7803 train_time: 0.9m tok/s: 8303558 +561/20000 train_loss: 2.7912 train_time: 0.9m tok/s: 8303536 +562/20000 train_loss: 2.8427 train_time: 0.9m tok/s: 8303559 +563/20000 train_loss: 2.8295 train_time: 0.9m tok/s: 8303601 +564/20000 train_loss: 2.9396 train_time: 0.9m tok/s: 8303538 +565/20000 train_loss: 2.8431 train_time: 0.9m tok/s: 8303338 +566/20000 train_loss: 2.7443 train_time: 0.9m tok/s: 8303263 +567/20000 train_loss: 2.6899 train_time: 0.9m tok/s: 8303353 +568/20000 train_loss: 2.8259 train_time: 0.9m tok/s: 8303435 +569/20000 train_loss: 2.6689 train_time: 0.9m tok/s: 8303411 +570/20000 train_loss: 2.6824 train_time: 0.9m tok/s: 8303272 +571/20000 train_loss: 2.7864 train_time: 0.9m tok/s: 8303098 +572/20000 
train_loss: 2.6360 train_time: 0.9m tok/s: 8302952 +573/20000 train_loss: 2.6437 train_time: 0.9m tok/s: 8303003 +574/20000 train_loss: 2.7343 train_time: 0.9m tok/s: 8303107 +575/20000 train_loss: 2.5123 train_time: 0.9m tok/s: 8303195 +576/20000 train_loss: 2.7701 train_time: 0.9m tok/s: 8302931 +577/20000 train_loss: 2.8539 train_time: 0.9m tok/s: 8302634 +578/20000 train_loss: 2.8321 train_time: 0.9m tok/s: 8302643 +579/20000 train_loss: 2.7331 train_time: 0.9m tok/s: 8302745 +580/20000 train_loss: 2.8064 train_time: 0.9m tok/s: 8302889 +581/20000 train_loss: 2.7787 train_time: 0.9m tok/s: 8302723 +582/20000 train_loss: 2.7719 train_time: 0.9m tok/s: 8302499 +583/20000 train_loss: 2.7269 train_time: 0.9m tok/s: 8302717 +584/20000 train_loss: 2.7871 train_time: 0.9m tok/s: 8302759 +585/20000 train_loss: 2.8013 train_time: 0.9m tok/s: 8302767 +586/20000 train_loss: 2.6355 train_time: 0.9m tok/s: 8302706 +587/20000 train_loss: 2.7267 train_time: 0.9m tok/s: 8302674 +588/20000 train_loss: 2.7054 train_time: 0.9m tok/s: 8302707 +589/20000 train_loss: 2.7467 train_time: 0.9m tok/s: 8302532 +590/20000 train_loss: 2.7655 train_time: 0.9m tok/s: 8302700 +591/20000 train_loss: 2.7554 train_time: 0.9m tok/s: 8302719 +592/20000 train_loss: 2.7378 train_time: 0.9m tok/s: 8302804 +593/20000 train_loss: 2.7391 train_time: 0.9m tok/s: 8302773 +594/20000 train_loss: 2.6362 train_time: 0.9m tok/s: 8302780 +595/20000 train_loss: 2.8036 train_time: 0.9m tok/s: 8302789 +596/20000 train_loss: 2.6761 train_time: 0.9m tok/s: 8302842 +597/20000 train_loss: 2.7522 train_time: 0.9m tok/s: 8302926 +598/20000 train_loss: 2.7887 train_time: 0.9m tok/s: 8302947 +599/20000 train_loss: 2.6972 train_time: 0.9m tok/s: 8302596 +600/20000 train_loss: 2.7420 train_time: 0.9m tok/s: 8302782 +601/20000 train_loss: 2.7255 train_time: 0.9m tok/s: 8302690 +602/20000 train_loss: 2.7601 train_time: 1.0m tok/s: 8302625 +603/20000 train_loss: 2.7557 train_time: 1.0m tok/s: 8302632 +604/20000 train_loss: 
2.7439 train_time: 1.0m tok/s: 8302469 +605/20000 train_loss: 2.6520 train_time: 1.0m tok/s: 8302464 +606/20000 train_loss: 2.6518 train_time: 1.0m tok/s: 8302559 +607/20000 train_loss: 2.7337 train_time: 1.0m tok/s: 8302665 +608/20000 train_loss: 2.6602 train_time: 1.0m tok/s: 8302493 +609/20000 train_loss: 2.7255 train_time: 1.0m tok/s: 8302490 +610/20000 train_loss: 2.7681 train_time: 1.0m tok/s: 8302524 +611/20000 train_loss: 2.8850 train_time: 1.0m tok/s: 8302421 +612/20000 train_loss: 2.8319 train_time: 1.0m tok/s: 8302218 +613/20000 train_loss: 2.8064 train_time: 1.0m tok/s: 8302200 +614/20000 train_loss: 2.7984 train_time: 1.0m tok/s: 8302113 +615/20000 train_loss: 2.7475 train_time: 1.0m tok/s: 8302221 +616/20000 train_loss: 2.7714 train_time: 1.0m tok/s: 8302063 +617/20000 train_loss: 2.7297 train_time: 1.0m tok/s: 8301959 +618/20000 train_loss: 2.7396 train_time: 1.0m tok/s: 8301972 +619/20000 train_loss: 2.7908 train_time: 1.0m tok/s: 8301747 +620/20000 train_loss: 2.8857 train_time: 1.0m tok/s: 8301952 +621/20000 train_loss: 2.6886 train_time: 1.0m tok/s: 8302000 +622/20000 train_loss: 2.7241 train_time: 1.0m tok/s: 8301980 +623/20000 train_loss: 2.7303 train_time: 1.0m tok/s: 8301922 +624/20000 train_loss: 2.4400 train_time: 1.0m tok/s: 8301763 +625/20000 train_loss: 2.7518 train_time: 1.0m tok/s: 8301811 +626/20000 train_loss: 2.8937 train_time: 1.0m tok/s: 8301753 +627/20000 train_loss: 2.6930 train_time: 1.0m tok/s: 8301578 +628/20000 train_loss: 2.8563 train_time: 1.0m tok/s: 8301499 +629/20000 train_loss: 2.8459 train_time: 1.0m tok/s: 8301471 +630/20000 train_loss: 2.6994 train_time: 1.0m tok/s: 8301487 +631/20000 train_loss: 2.8224 train_time: 1.0m tok/s: 8301441 +632/20000 train_loss: 2.8377 train_time: 1.0m tok/s: 8301459 +633/20000 train_loss: 2.7188 train_time: 1.0m tok/s: 8301591 +634/20000 train_loss: 2.9365 train_time: 1.0m tok/s: 8301696 +635/20000 train_loss: 2.7398 train_time: 1.0m tok/s: 8301572 +636/20000 train_loss: 2.8711 
train_time: 1.0m tok/s: 8301419 +637/20000 train_loss: 2.7615 train_time: 1.0m tok/s: 8301402 +638/20000 train_loss: 2.5716 train_time: 1.0m tok/s: 8301358 +639/20000 train_loss: 2.7245 train_time: 1.0m tok/s: 8301265 +640/20000 train_loss: 2.7088 train_time: 1.0m tok/s: 8301065 +641/20000 train_loss: 2.7861 train_time: 1.0m tok/s: 8301032 +642/20000 train_loss: 2.7937 train_time: 1.0m tok/s: 8300986 +643/20000 train_loss: 2.7564 train_time: 1.0m tok/s: 8301067 +644/20000 train_loss: 2.7995 train_time: 1.0m tok/s: 8301135 +645/20000 train_loss: 2.8742 train_time: 1.0m tok/s: 8301249 +646/20000 train_loss: 2.7762 train_time: 1.0m tok/s: 8301345 +647/20000 train_loss: 2.8247 train_time: 1.0m tok/s: 8301089 +648/20000 train_loss: 2.7391 train_time: 1.0m tok/s: 8300882 +649/20000 train_loss: 2.8735 train_time: 1.0m tok/s: 8301025 +650/20000 train_loss: 2.7573 train_time: 1.0m tok/s: 8301094 +651/20000 train_loss: 2.7429 train_time: 1.0m tok/s: 8301007 +652/20000 train_loss: 2.6974 train_time: 1.0m tok/s: 8300953 +653/20000 train_loss: 2.6555 train_time: 1.0m tok/s: 8300952 +654/20000 train_loss: 2.7097 train_time: 1.0m tok/s: 8300979 +655/20000 train_loss: 2.6993 train_time: 1.0m tok/s: 8300733 +656/20000 train_loss: 2.6450 train_time: 1.0m tok/s: 8300553 +657/20000 train_loss: 2.6565 train_time: 1.0m tok/s: 8300554 +658/20000 train_loss: 2.6944 train_time: 1.0m tok/s: 8300561 +659/20000 train_loss: 2.7425 train_time: 1.0m tok/s: 8300678 +660/20000 train_loss: 2.7526 train_time: 1.0m tok/s: 8300578 +661/20000 train_loss: 2.7961 train_time: 1.0m tok/s: 8300656 +662/20000 train_loss: 2.6871 train_time: 1.0m tok/s: 8300712 +663/20000 train_loss: 2.7855 train_time: 1.0m tok/s: 8300710 +664/20000 train_loss: 2.7906 train_time: 1.0m tok/s: 8300368 +665/20000 train_loss: 2.8353 train_time: 1.1m tok/s: 8300204 +666/20000 train_loss: 2.8224 train_time: 1.1m tok/s: 8300259 +667/20000 train_loss: 2.7619 train_time: 1.1m tok/s: 8300003 +668/20000 train_loss: 2.7443 train_time: 
1.1m tok/s: 8300171 +669/20000 train_loss: 2.6291 train_time: 1.1m tok/s: 8300104 +670/20000 train_loss: 2.6398 train_time: 1.1m tok/s: 8300126 +671/20000 train_loss: 2.6513 train_time: 1.1m tok/s: 8300095 +672/20000 train_loss: 2.7804 train_time: 1.1m tok/s: 8300019 +673/20000 train_loss: 2.6199 train_time: 1.1m tok/s: 8300003 +674/20000 train_loss: 2.8544 train_time: 1.1m tok/s: 8299919 +675/20000 train_loss: 2.6126 train_time: 1.1m tok/s: 8299743 +676/20000 train_loss: 2.8452 train_time: 1.1m tok/s: 8299680 +677/20000 train_loss: 2.6643 train_time: 1.1m tok/s: 8299801 +678/20000 train_loss: 2.7601 train_time: 1.1m tok/s: 8299801 +679/20000 train_loss: 2.6939 train_time: 1.1m tok/s: 8299822 +680/20000 train_loss: 2.9021 train_time: 1.1m tok/s: 8299697 +681/20000 train_loss: 2.7749 train_time: 1.1m tok/s: 8299658 +682/20000 train_loss: 2.8693 train_time: 1.1m tok/s: 8299538 +683/20000 train_loss: 2.8522 train_time: 1.1m tok/s: 8299474 +684/20000 train_loss: 2.7840 train_time: 1.1m tok/s: 8299431 +685/20000 train_loss: 2.6533 train_time: 1.1m tok/s: 8299422 +686/20000 train_loss: 2.8806 train_time: 1.1m tok/s: 8299320 +687/20000 train_loss: 2.7717 train_time: 1.1m tok/s: 8299462 +688/20000 train_loss: 2.7766 train_time: 1.1m tok/s: 8298888 +689/20000 train_loss: 2.8100 train_time: 1.1m tok/s: 8299109 +690/20000 train_loss: 2.7376 train_time: 1.1m tok/s: 8299038 +691/20000 train_loss: 2.8502 train_time: 1.1m tok/s: 8299063 +692/20000 train_loss: 2.9082 train_time: 1.1m tok/s: 8299104 +693/20000 train_loss: 2.8087 train_time: 1.1m tok/s: 8298815 +694/20000 train_loss: 2.8139 train_time: 1.1m tok/s: 8298749 +695/20000 train_loss: 2.7989 train_time: 1.1m tok/s: 8298795 +696/20000 train_loss: 2.8001 train_time: 1.1m tok/s: 8298833 +697/20000 train_loss: 2.6668 train_time: 1.1m tok/s: 8298802 +698/20000 train_loss: 2.8350 train_time: 1.1m tok/s: 8298736 +699/20000 train_loss: 2.6846 train_time: 1.1m tok/s: 8298721 +700/20000 train_loss: 2.6287 train_time: 1.1m tok/s: 
8298823 +701/20000 train_loss: 2.6323 train_time: 1.1m tok/s: 8298815 +702/20000 train_loss: 2.6233 train_time: 1.1m tok/s: 8298707 +703/20000 train_loss: 2.5130 train_time: 1.1m tok/s: 8298596 +704/20000 train_loss: 2.8452 train_time: 1.1m tok/s: 8298578 +705/20000 train_loss: 2.7733 train_time: 1.1m tok/s: 8298290 +706/20000 train_loss: 2.7536 train_time: 1.1m tok/s: 8298109 +707/20000 train_loss: 2.7517 train_time: 1.1m tok/s: 8298178 +708/20000 train_loss: 2.8362 train_time: 1.1m tok/s: 8297974 +709/20000 train_loss: 2.8019 train_time: 1.1m tok/s: 8298228 +710/20000 train_loss: 2.6346 train_time: 1.1m tok/s: 8298160 +711/20000 train_loss: 2.7097 train_time: 1.1m tok/s: 8298146 +712/20000 train_loss: 2.6394 train_time: 1.1m tok/s: 8298161 +713/20000 train_loss: 2.6951 train_time: 1.1m tok/s: 8298121 +714/20000 train_loss: 2.7545 train_time: 1.1m tok/s: 8298128 +715/20000 train_loss: 2.7078 train_time: 1.1m tok/s: 8298108 +716/20000 train_loss: 2.7246 train_time: 1.1m tok/s: 8298145 +717/20000 train_loss: 2.9165 train_time: 1.1m tok/s: 8297654 +718/20000 train_loss: 2.8169 train_time: 1.1m tok/s: 8297984 +719/20000 train_loss: 2.7478 train_time: 1.1m tok/s: 8298118 +720/20000 train_loss: 2.6921 train_time: 1.1m tok/s: 8298092 +721/20000 train_loss: 2.8326 train_time: 1.1m tok/s: 8297968 +722/20000 train_loss: 2.6637 train_time: 1.1m tok/s: 8297759 +723/20000 train_loss: 2.8672 train_time: 1.1m tok/s: 8297761 +724/20000 train_loss: 2.7640 train_time: 1.1m tok/s: 8297829 +725/20000 train_loss: 2.6268 train_time: 1.1m tok/s: 8297886 +726/20000 train_loss: 2.7807 train_time: 1.1m tok/s: 8297955 +727/20000 train_loss: 2.6035 train_time: 1.1m tok/s: 8297960 +728/20000 train_loss: 2.7978 train_time: 1.1m tok/s: 8297941 +729/20000 train_loss: 2.8350 train_time: 1.2m tok/s: 8297817 +730/20000 train_loss: 2.7661 train_time: 1.2m tok/s: 8297810 +731/20000 train_loss: 2.8660 train_time: 1.2m tok/s: 8297866 +732/20000 train_loss: 2.7096 train_time: 1.2m tok/s: 8297875 
+733/20000 train_loss: 2.8850 train_time: 1.2m tok/s: 8297671 +734/20000 train_loss: 2.7393 train_time: 1.2m tok/s: 8297484 +735/20000 train_loss: 2.7917 train_time: 1.2m tok/s: 8297398 +736/20000 train_loss: 2.6670 train_time: 1.2m tok/s: 8297406 +737/20000 train_loss: 2.8021 train_time: 1.2m tok/s: 8297345 +738/20000 train_loss: 2.6717 train_time: 1.2m tok/s: 8297280 +739/20000 train_loss: 2.5957 train_time: 1.2m tok/s: 8297372 +740/20000 train_loss: 2.8301 train_time: 1.2m tok/s: 8297033 +741/20000 train_loss: 2.8245 train_time: 1.2m tok/s: 8297366 +742/20000 train_loss: 2.6877 train_time: 1.2m tok/s: 8297417 +743/20000 train_loss: 2.8414 train_time: 1.2m tok/s: 8297476 +744/20000 train_loss: 2.7248 train_time: 1.2m tok/s: 8297446 +745/20000 train_loss: 2.7384 train_time: 1.2m tok/s: 8297403 +746/20000 train_loss: 2.8204 train_time: 1.2m tok/s: 8297465 +747/20000 train_loss: 2.7030 train_time: 1.2m tok/s: 8297495 +748/20000 train_loss: 2.7486 train_time: 1.2m tok/s: 8297484 +749/20000 train_loss: 2.8073 train_time: 1.2m tok/s: 8297444 +750/20000 train_loss: 2.8176 train_time: 1.2m tok/s: 8297397 +751/20000 train_loss: 2.6822 train_time: 1.2m tok/s: 8297350 +752/20000 train_loss: 2.7749 train_time: 1.2m tok/s: 8297248 +753/20000 train_loss: 2.4224 train_time: 1.2m tok/s: 8297080 +754/20000 train_loss: 2.6687 train_time: 1.2m tok/s: 8296968 +755/20000 train_loss: 2.8643 train_time: 1.2m tok/s: 8297051 +756/20000 train_loss: 3.1201 train_time: 1.2m tok/s: 8297052 +757/20000 train_loss: 2.7877 train_time: 1.2m tok/s: 8296978 +758/20000 train_loss: 2.7214 train_time: 1.2m tok/s: 8296942 +759/20000 train_loss: 2.6822 train_time: 1.2m tok/s: 8296550 +760/20000 train_loss: 2.8602 train_time: 1.2m tok/s: 8296901 +761/20000 train_loss: 2.7322 train_time: 1.2m tok/s: 8296956 +762/20000 train_loss: 2.8311 train_time: 1.2m tok/s: 8297097 +763/20000 train_loss: 2.6589 train_time: 1.2m tok/s: 8297098 +764/20000 train_loss: 2.7034 train_time: 1.2m tok/s: 8296931 +765/20000 
train_loss: 2.6744 train_time: 1.2m tok/s: 8296835 +766/20000 train_loss: 2.6684 train_time: 1.2m tok/s: 8296745 +767/20000 train_loss: 2.6903 train_time: 1.2m tok/s: 8296716 +768/20000 train_loss: 2.7369 train_time: 1.2m tok/s: 8296698 +769/20000 train_loss: 2.7653 train_time: 1.2m tok/s: 8296782 +770/20000 train_loss: 2.7693 train_time: 1.2m tok/s: 8296763 +771/20000 train_loss: 2.7840 train_time: 1.2m tok/s: 8296776 +772/20000 train_loss: 2.7676 train_time: 1.2m tok/s: 8296820 +773/20000 train_loss: 2.7031 train_time: 1.2m tok/s: 8296771 +774/20000 train_loss: 2.8448 train_time: 1.2m tok/s: 8296814 +775/20000 train_loss: 2.8117 train_time: 1.2m tok/s: 8296910 +776/20000 train_loss: 2.8967 train_time: 1.2m tok/s: 8296747 +777/20000 train_loss: 2.9946 train_time: 1.2m tok/s: 8296413 +778/20000 train_loss: 2.7078 train_time: 1.2m tok/s: 8296257 +779/20000 train_loss: 2.4482 train_time: 1.2m tok/s: 8296236 +780/20000 train_loss: 2.7765 train_time: 1.2m tok/s: 8296236 +781/20000 train_loss: 2.7510 train_time: 1.2m tok/s: 8296253 +782/20000 train_loss: 3.0301 train_time: 1.2m tok/s: 8296167 +783/20000 train_loss: 2.5320 train_time: 1.2m tok/s: 8295981 +784/20000 train_loss: 2.9025 train_time: 1.2m tok/s: 8296022 +785/20000 train_loss: 2.8582 train_time: 1.2m tok/s: 8296108 +786/20000 train_loss: 2.7230 train_time: 1.2m tok/s: 8296206 +787/20000 train_loss: 2.6346 train_time: 1.2m tok/s: 8296130 +788/20000 train_loss: 2.6754 train_time: 1.2m tok/s: 8296160 +789/20000 train_loss: 2.7905 train_time: 1.2m tok/s: 8296035 +790/20000 train_loss: 2.6396 train_time: 1.2m tok/s: 8295904 +791/20000 train_loss: 2.5933 train_time: 1.2m tok/s: 8295826 +792/20000 train_loss: 2.7325 train_time: 1.3m tok/s: 8295828 +793/20000 train_loss: 2.7115 train_time: 1.3m tok/s: 8295855 +794/20000 train_loss: 2.7200 train_time: 1.3m tok/s: 8295619 +795/20000 train_loss: 2.8574 train_time: 1.3m tok/s: 8295687 +796/20000 train_loss: 2.7175 train_time: 1.3m tok/s: 8295844 +797/20000 train_loss: 
2.7364 train_time: 1.3m tok/s: 8295892 +798/20000 train_loss: 2.7487 train_time: 1.3m tok/s: 8295843 +799/20000 train_loss: 2.7865 train_time: 1.3m tok/s: 8295831 +800/20000 train_loss: 2.7143 train_time: 1.3m tok/s: 8295899 +801/20000 train_loss: 2.7490 train_time: 1.3m tok/s: 8295931 +802/20000 train_loss: 2.8162 train_time: 1.3m tok/s: 8295889 +803/20000 train_loss: 2.6884 train_time: 1.3m tok/s: 8295866 +804/20000 train_loss: 2.6690 train_time: 1.3m tok/s: 8295916 +805/20000 train_loss: 2.6819 train_time: 1.3m tok/s: 8295826 +806/20000 train_loss: 2.7969 train_time: 1.3m tok/s: 8295833 +807/20000 train_loss: 2.7816 train_time: 1.3m tok/s: 8295869 +808/20000 train_loss: 2.8153 train_time: 1.3m tok/s: 8295927 +809/20000 train_loss: 2.6283 train_time: 1.3m tok/s: 8295864 +810/20000 train_loss: 2.8478 train_time: 1.3m tok/s: 8295825 +811/20000 train_loss: 2.8485 train_time: 1.3m tok/s: 8295917 +812/20000 train_loss: 2.6848 train_time: 1.3m tok/s: 8295932 +813/20000 train_loss: 2.7511 train_time: 1.3m tok/s: 8295857 +814/20000 train_loss: 2.7852 train_time: 1.3m tok/s: 8295900 +815/20000 train_loss: 2.8946 train_time: 1.3m tok/s: 8295925 +816/20000 train_loss: 2.7158 train_time: 1.3m tok/s: 8296007 +817/20000 train_loss: 2.6982 train_time: 1.3m tok/s: 8296034 +818/20000 train_loss: 2.7748 train_time: 1.3m tok/s: 8296069 +819/20000 train_loss: 2.7918 train_time: 1.3m tok/s: 8295973 +820/20000 train_loss: 3.0389 train_time: 1.3m tok/s: 8295840 +821/20000 train_loss: 2.7877 train_time: 1.3m tok/s: 8295738 +822/20000 train_loss: 2.6005 train_time: 1.3m tok/s: 8295699 +823/20000 train_loss: 2.6417 train_time: 1.3m tok/s: 8295684 +824/20000 train_loss: 2.8072 train_time: 1.3m tok/s: 8295655 +825/20000 train_loss: 2.8816 train_time: 1.3m tok/s: 8295521 +826/20000 train_loss: 2.8607 train_time: 1.3m tok/s: 8295468 +827/20000 train_loss: 2.6405 train_time: 1.3m tok/s: 8295505 +828/20000 train_loss: 2.7220 train_time: 1.3m tok/s: 8295578 +829/20000 train_loss: 3.3518 
train_time: 1.3m tok/s: 8295512 +830/20000 train_loss: 2.7548 train_time: 1.3m tok/s: 8295351 +831/20000 train_loss: 2.7382 train_time: 1.3m tok/s: 8295235 +832/20000 train_loss: 2.7740 train_time: 1.3m tok/s: 8295255 +833/20000 train_loss: 2.8629 train_time: 1.3m tok/s: 8295230 +834/20000 train_loss: 2.6932 train_time: 1.3m tok/s: 8295295 +835/20000 train_loss: 2.7784 train_time: 1.3m tok/s: 8295300 +836/20000 train_loss: 2.6204 train_time: 1.3m tok/s: 8295219 +837/20000 train_loss: 2.5056 train_time: 1.3m tok/s: 8295060 +838/20000 train_loss: 2.6266 train_time: 1.3m tok/s: 8294917 +839/20000 train_loss: 2.7218 train_time: 1.3m tok/s: 8294850 +840/20000 train_loss: 3.1152 train_time: 1.3m tok/s: 8294798 +841/20000 train_loss: 2.7132 train_time: 1.3m tok/s: 8294749 +842/20000 train_loss: 2.7220 train_time: 1.3m tok/s: 8294657 +843/20000 train_loss: 2.6449 train_time: 1.3m tok/s: 8294535 +844/20000 train_loss: 2.7341 train_time: 1.3m tok/s: 8294699 +845/20000 train_loss: 2.6809 train_time: 1.3m tok/s: 8294712 +846/20000 train_loss: 2.6698 train_time: 1.3m tok/s: 8294630 +847/20000 train_loss: 2.7196 train_time: 1.3m tok/s: 8294593 +848/20000 train_loss: 2.6391 train_time: 1.3m tok/s: 8294627 +849/20000 train_loss: 2.7447 train_time: 1.3m tok/s: 8294640 +850/20000 train_loss: 2.5853 train_time: 1.3m tok/s: 8294669 +851/20000 train_loss: 2.7553 train_time: 1.3m tok/s: 8294660 +852/20000 train_loss: 2.5657 train_time: 1.3m tok/s: 8294640 +853/20000 train_loss: 2.7151 train_time: 1.3m tok/s: 8294619 +854/20000 train_loss: 2.7049 train_time: 1.3m tok/s: 8294325 +855/20000 train_loss: 2.7534 train_time: 1.4m tok/s: 8294382 +856/20000 train_loss: 2.6900 train_time: 1.4m tok/s: 8294555 +857/20000 train_loss: 2.8341 train_time: 1.4m tok/s: 8294564 +858/20000 train_loss: 2.8270 train_time: 1.4m tok/s: 8294589 +859/20000 train_loss: 2.7284 train_time: 1.4m tok/s: 8294531 +860/20000 train_loss: 2.6671 train_time: 1.4m tok/s: 8294463 +861/20000 train_loss: 2.7085 train_time: 
1.4m tok/s: 8294459 +862/20000 train_loss: 2.6575 train_time: 1.4m tok/s: 8294320 +863/20000 train_loss: 2.9023 train_time: 1.4m tok/s: 8294206 +864/20000 train_loss: 2.7471 train_time: 1.4m tok/s: 8294141 +865/20000 train_loss: 2.7772 train_time: 1.4m tok/s: 8294100 +866/20000 train_loss: 2.6312 train_time: 1.4m tok/s: 8294046 +867/20000 train_loss: 2.6686 train_time: 1.4m tok/s: 8294003 +868/20000 train_loss: 2.6776 train_time: 1.4m tok/s: 8293969 +869/20000 train_loss: 2.6940 train_time: 1.4m tok/s: 8294070 +870/20000 train_loss: 2.6665 train_time: 1.4m tok/s: 8294034 +871/20000 train_loss: 2.6537 train_time: 1.4m tok/s: 8293954 +872/20000 train_loss: 2.7489 train_time: 1.4m tok/s: 8294049 +873/20000 train_loss: 2.6562 train_time: 1.4m tok/s: 8294141 +874/20000 train_loss: 2.8032 train_time: 1.4m tok/s: 8294074 +875/20000 train_loss: 2.7548 train_time: 1.4m tok/s: 8294138 +876/20000 train_loss: 2.8111 train_time: 1.4m tok/s: 8294159 +877/20000 train_loss: 2.6979 train_time: 1.4m tok/s: 8294266 +878/20000 train_loss: 2.6905 train_time: 1.4m tok/s: 8294270 +879/20000 train_loss: 2.7355 train_time: 1.4m tok/s: 8294272 +880/20000 train_loss: 2.7430 train_time: 1.4m tok/s: 8294289 +881/20000 train_loss: 2.6663 train_time: 1.4m tok/s: 8294301 +882/20000 train_loss: 2.6867 train_time: 1.4m tok/s: 8294213 +883/20000 train_loss: 2.7811 train_time: 1.4m tok/s: 8294166 +884/20000 train_loss: 2.5199 train_time: 1.4m tok/s: 8294176 +885/20000 train_loss: 2.6323 train_time: 1.4m tok/s: 8294245 +886/20000 train_loss: 2.7023 train_time: 1.4m tok/s: 8294219 +887/20000 train_loss: 2.6647 train_time: 1.4m tok/s: 8294228 +888/20000 train_loss: 2.6402 train_time: 1.4m tok/s: 8294238 +889/20000 train_loss: 2.8013 train_time: 1.4m tok/s: 8294204 +890/20000 train_loss: 2.6004 train_time: 1.4m tok/s: 8294232 +891/20000 train_loss: 2.6834 train_time: 1.4m tok/s: 8294163 +892/20000 train_loss: 2.7063 train_time: 1.4m tok/s: 8294273 +893/20000 train_loss: 2.6458 train_time: 1.4m tok/s: 
8294263 +894/20000 train_loss: 2.7050 train_time: 1.4m tok/s: 8294282 +895/20000 train_loss: 2.7347 train_time: 1.4m tok/s: 8294310 +896/20000 train_loss: 2.7970 train_time: 1.4m tok/s: 8294213 +897/20000 train_loss: 2.7144 train_time: 1.4m tok/s: 8294095 +898/20000 train_loss: 2.6857 train_time: 1.4m tok/s: 8294077 +899/20000 train_loss: 2.6688 train_time: 1.4m tok/s: 8294201 +900/20000 train_loss: 2.7170 train_time: 1.4m tok/s: 8293859 +901/20000 train_loss: 2.6337 train_time: 1.4m tok/s: 8294138 +902/20000 train_loss: 2.6247 train_time: 1.4m tok/s: 8294077 +903/20000 train_loss: 2.5815 train_time: 1.4m tok/s: 8294059 +904/20000 train_loss: 2.5665 train_time: 1.4m tok/s: 8294108 +905/20000 train_loss: 2.6955 train_time: 1.4m tok/s: 8294111 +906/20000 train_loss: 2.7737 train_time: 1.4m tok/s: 8294160 +907/20000 train_loss: 2.7549 train_time: 1.4m tok/s: 8294211 +908/20000 train_loss: 2.8321 train_time: 1.4m tok/s: 8294277 +909/20000 train_loss: 2.7711 train_time: 1.4m tok/s: 8294258 +910/20000 train_loss: 2.8257 train_time: 1.4m tok/s: 8294225 +911/20000 train_loss: 2.7224 train_time: 1.4m tok/s: 8294324 +912/20000 train_loss: 2.5640 train_time: 1.4m tok/s: 8293946 +913/20000 train_loss: 2.7284 train_time: 1.4m tok/s: 8293727 +914/20000 train_loss: 2.8046 train_time: 1.4m tok/s: 8293777 +915/20000 train_loss: 2.7424 train_time: 1.4m tok/s: 8293849 +916/20000 train_loss: 2.7503 train_time: 1.4m tok/s: 8293863 +917/20000 train_loss: 2.6817 train_time: 1.4m tok/s: 8293756 +918/20000 train_loss: 2.5597 train_time: 1.5m tok/s: 8293720 +919/20000 train_loss: 2.6169 train_time: 1.5m tok/s: 8293732 +920/20000 train_loss: 2.6384 train_time: 1.5m tok/s: 8293811 +921/20000 train_loss: 2.5038 train_time: 1.5m tok/s: 8293780 +922/20000 train_loss: 2.7040 train_time: 1.5m tok/s: 8293717 +923/20000 train_loss: 2.6313 train_time: 1.5m tok/s: 8293657 +924/20000 train_loss: 2.6171 train_time: 1.5m tok/s: 8293629 +925/20000 train_loss: 2.9362 train_time: 1.5m tok/s: 8293568 
+926/20000 train_loss: 2.5529 train_time: 1.5m tok/s: 8293459 +927/20000 train_loss: 2.7464 train_time: 1.5m tok/s: 8293388 +928/20000 train_loss: 2.7986 train_time: 1.5m tok/s: 8293474 +929/20000 train_loss: 2.7133 train_time: 1.5m tok/s: 8293536 +930/20000 train_loss: 2.8809 train_time: 1.5m tok/s: 8293516 +931/20000 train_loss: 2.7619 train_time: 1.5m tok/s: 8293458 +932/20000 train_loss: 2.6941 train_time: 1.5m tok/s: 8293499 +933/20000 train_loss: 2.7061 train_time: 1.5m tok/s: 8293477 +934/20000 train_loss: 2.7147 train_time: 1.5m tok/s: 8293372 +935/20000 train_loss: 2.7950 train_time: 1.5m tok/s: 8293366 +936/20000 train_loss: 2.6119 train_time: 1.5m tok/s: 8293323 +937/20000 train_loss: 2.7236 train_time: 1.5m tok/s: 8293278 +938/20000 train_loss: 2.5030 train_time: 1.5m tok/s: 8293148 +939/20000 train_loss: 2.5132 train_time: 1.5m tok/s: 8292947 +940/20000 train_loss: 2.7668 train_time: 1.5m tok/s: 8292859 +941/20000 train_loss: 2.8628 train_time: 1.5m tok/s: 8292829 +942/20000 train_loss: 2.6959 train_time: 1.5m tok/s: 8292809 +943/20000 train_loss: 2.7055 train_time: 1.5m tok/s: 8292878 +944/20000 train_loss: 2.7998 train_time: 1.5m tok/s: 8292902 +945/20000 train_loss: 2.7235 train_time: 1.5m tok/s: 8292984 +946/20000 train_loss: 2.6258 train_time: 1.5m tok/s: 8292691 +947/20000 train_loss: 2.7945 train_time: 1.5m tok/s: 8292612 +948/20000 train_loss: 2.7209 train_time: 1.5m tok/s: 8292667 +949/20000 train_loss: 2.7109 train_time: 1.5m tok/s: 8292829 +950/20000 train_loss: 2.7215 train_time: 1.5m tok/s: 8292760 +951/20000 train_loss: 2.7858 train_time: 1.5m tok/s: 8292784 +952/20000 train_loss: 2.5494 train_time: 1.5m tok/s: 8292746 +953/20000 train_loss: 2.6845 train_time: 1.5m tok/s: 8292806 +954/20000 train_loss: 2.6898 train_time: 1.5m tok/s: 8292593 +955/20000 train_loss: 2.7997 train_time: 1.5m tok/s: 8292541 +956/20000 train_loss: 2.8012 train_time: 1.5m tok/s: 8292501 +957/20000 train_loss: 2.6737 train_time: 1.5m tok/s: 8292406 +958/20000 
train_loss: 2.7615 train_time: 1.5m tok/s: 8292336 +959/20000 train_loss: 2.7770 train_time: 1.5m tok/s: 8292410 +960/20000 train_loss: 2.9850 train_time: 1.5m tok/s: 8292405 +961/20000 train_loss: 2.7904 train_time: 1.5m tok/s: 8292446 +962/20000 train_loss: 2.6981 train_time: 1.5m tok/s: 8292353 +963/20000 train_loss: 2.7506 train_time: 1.5m tok/s: 8292304 +964/20000 train_loss: 2.6572 train_time: 1.5m tok/s: 8292330 +965/20000 train_loss: 2.7795 train_time: 1.5m tok/s: 8292344 +966/20000 train_loss: 2.7727 train_time: 1.5m tok/s: 8292365 +967/20000 train_loss: 2.5718 train_time: 1.5m tok/s: 8292321 +968/20000 train_loss: 2.6727 train_time: 1.5m tok/s: 8292294 +969/20000 train_loss: 2.7023 train_time: 1.5m tok/s: 8292268 +970/20000 train_loss: 2.7033 train_time: 1.5m tok/s: 8292113 +971/20000 train_loss: 2.5500 train_time: 1.5m tok/s: 8291911 +972/20000 train_loss: 2.4939 train_time: 1.5m tok/s: 8291777 +973/20000 train_loss: 2.5627 train_time: 1.5m tok/s: 8291599 +974/20000 train_loss: 2.6866 train_time: 1.5m tok/s: 8291524 +975/20000 train_loss: 2.7491 train_time: 1.5m tok/s: 8291569 +976/20000 train_loss: 2.6846 train_time: 1.5m tok/s: 8291484 +977/20000 train_loss: 2.7308 train_time: 1.5m tok/s: 8291428 +978/20000 train_loss: 2.7629 train_time: 1.5m tok/s: 8291452 +979/20000 train_loss: 2.6029 train_time: 1.5m tok/s: 8291471 +980/20000 train_loss: 2.6519 train_time: 1.5m tok/s: 8291146 +981/20000 train_loss: 2.6397 train_time: 1.6m tok/s: 8291244 +982/20000 train_loss: 2.6290 train_time: 1.6m tok/s: 8291332 +983/20000 train_loss: 2.7525 train_time: 1.6m tok/s: 8291353 +984/20000 train_loss: 2.6464 train_time: 1.6m tok/s: 8291286 +985/20000 train_loss: 2.7633 train_time: 1.6m tok/s: 8291261 +986/20000 train_loss: 2.6870 train_time: 1.6m tok/s: 8291239 +987/20000 train_loss: 2.6757 train_time: 1.6m tok/s: 8291342 +988/20000 train_loss: 2.5424 train_time: 1.6m tok/s: 8291255 +989/20000 train_loss: 2.6684 train_time: 1.6m tok/s: 8291281 +990/20000 train_loss: 
2.6639 train_time: 1.6m tok/s: 8291052 +991/20000 train_loss: 2.8053 train_time: 1.6m tok/s: 8291210 +992/20000 train_loss: 2.6279 train_time: 1.6m tok/s: 8291233 +993/20000 train_loss: 2.5743 train_time: 1.6m tok/s: 8291291 +994/20000 train_loss: 2.7147 train_time: 1.6m tok/s: 8291229 +995/20000 train_loss: 2.8797 train_time: 1.6m tok/s: 8291320 +996/20000 train_loss: 2.8171 train_time: 1.6m tok/s: 8291293 +997/20000 train_loss: 2.7741 train_time: 1.6m tok/s: 8291424 +998/20000 train_loss: 2.6603 train_time: 1.6m tok/s: 8291420 +999/20000 train_loss: 2.7502 train_time: 1.6m tok/s: 8291494 +1000/20000 train_loss: 2.7887 train_time: 1.6m tok/s: 8291508 +1001/20000 train_loss: 2.6736 train_time: 1.6m tok/s: 8291522 +1002/20000 train_loss: 2.7246 train_time: 1.6m tok/s: 8291458 +1003/20000 train_loss: 2.6443 train_time: 1.6m tok/s: 8291388 +1004/20000 train_loss: 2.6519 train_time: 1.6m tok/s: 8291275 +1005/20000 train_loss: 2.6357 train_time: 1.6m tok/s: 8291363 +1006/20000 train_loss: 2.7104 train_time: 1.6m tok/s: 8291370 +1007/20000 train_loss: 2.5870 train_time: 1.6m tok/s: 8291414 +1008/20000 train_loss: 2.5316 train_time: 1.6m tok/s: 8291312 +1009/20000 train_loss: 2.6862 train_time: 1.6m tok/s: 8291242 +1010/20000 train_loss: 2.8172 train_time: 1.6m tok/s: 8291344 +1011/20000 train_loss: 2.8028 train_time: 1.6m tok/s: 8291345 +1012/20000 train_loss: 2.4291 train_time: 1.6m tok/s: 8291269 +1013/20000 train_loss: 2.6167 train_time: 1.6m tok/s: 8291100 +1014/20000 train_loss: 2.7281 train_time: 1.6m tok/s: 8291092 +1015/20000 train_loss: 2.7571 train_time: 1.6m tok/s: 8291071 +1016/20000 train_loss: 2.5613 train_time: 1.6m tok/s: 8290947 +1017/20000 train_loss: 2.7099 train_time: 1.6m tok/s: 8290992 +1018/20000 train_loss: 2.7847 train_time: 1.6m tok/s: 8290826 +1019/20000 train_loss: 2.6675 train_time: 1.6m tok/s: 8290782 +1020/20000 train_loss: 2.6883 train_time: 1.6m tok/s: 8290721 +1021/20000 train_loss: 2.6644 train_time: 1.6m tok/s: 8290592 +1022/20000 
train_loss: 2.7582 train_time: 1.6m tok/s: 8290652 +1023/20000 train_loss: 2.6656 train_time: 1.6m tok/s: 8290686 +1024/20000 train_loss: 2.6840 train_time: 1.6m tok/s: 8290760 +1025/20000 train_loss: 2.7606 train_time: 1.6m tok/s: 8290774 +1026/20000 train_loss: 3.3080 train_time: 1.6m tok/s: 8290568 +1027/20000 train_loss: 2.5470 train_time: 1.6m tok/s: 8290453 +1028/20000 train_loss: 2.6138 train_time: 1.6m tok/s: 8290533 +1029/20000 train_loss: 2.6961 train_time: 1.6m tok/s: 8290397 +1030/20000 train_loss: 2.5558 train_time: 1.6m tok/s: 8290363 +1031/20000 train_loss: 2.6123 train_time: 1.6m tok/s: 8290232 +1032/20000 train_loss: 2.7306 train_time: 1.6m tok/s: 8290265 +1033/20000 train_loss: 2.8480 train_time: 1.6m tok/s: 8290247 +1034/20000 train_loss: 2.6284 train_time: 1.6m tok/s: 8290225 +1035/20000 train_loss: 2.7264 train_time: 1.6m tok/s: 8290290 +1036/20000 train_loss: 2.6793 train_time: 1.6m tok/s: 8290244 +1037/20000 train_loss: 2.7000 train_time: 1.6m tok/s: 8290273 +1038/20000 train_loss: 2.4940 train_time: 1.6m tok/s: 8290226 +1039/20000 train_loss: 2.6972 train_time: 1.6m tok/s: 8290262 +1040/20000 train_loss: 2.6464 train_time: 1.6m tok/s: 8290341 +1041/20000 train_loss: 2.6616 train_time: 1.6m tok/s: 8290270 +1042/20000 train_loss: 2.6669 train_time: 1.6m tok/s: 8290163 +1043/20000 train_loss: 2.7216 train_time: 1.6m tok/s: 8290293 +1044/20000 train_loss: 2.6650 train_time: 1.7m tok/s: 8290159 +1045/20000 train_loss: 2.7900 train_time: 1.7m tok/s: 8290096 +1046/20000 train_loss: 2.7329 train_time: 1.7m tok/s: 8290060 +1047/20000 train_loss: 2.6961 train_time: 1.7m tok/s: 8290131 +1048/20000 train_loss: 2.6000 train_time: 1.7m tok/s: 8290125 +1049/20000 train_loss: 2.7256 train_time: 1.7m tok/s: 8290097 +1050/20000 train_loss: 2.8196 train_time: 1.7m tok/s: 8290131 +1051/20000 train_loss: 2.7290 train_time: 1.7m tok/s: 8290208 +1052/20000 train_loss: 2.6132 train_time: 1.7m tok/s: 8290240 +1053/20000 train_loss: 2.6429 train_time: 1.7m tok/s: 
8290126 +1054/20000 train_loss: 2.6169 train_time: 1.7m tok/s: 8290105 +1055/20000 train_loss: 2.5842 train_time: 1.7m tok/s: 8290135 +1056/20000 train_loss: 2.6564 train_time: 1.7m tok/s: 8290119 +1057/20000 train_loss: 2.7149 train_time: 1.7m tok/s: 8290027 +1058/20000 train_loss: 2.5992 train_time: 1.7m tok/s: 8290014 +1059/20000 train_loss: 2.7480 train_time: 1.7m tok/s: 8290023 +1060/20000 train_loss: 2.6713 train_time: 1.7m tok/s: 8289817 +1061/20000 train_loss: 2.7516 train_time: 1.7m tok/s: 8289741 +1062/20000 train_loss: 2.7687 train_time: 1.7m tok/s: 8289763 +1063/20000 train_loss: 2.7642 train_time: 1.7m tok/s: 8289712 +1064/20000 train_loss: 2.6602 train_time: 1.7m tok/s: 8289870 +1065/20000 train_loss: 2.4648 train_time: 1.7m tok/s: 8289766 +1066/20000 train_loss: 2.7878 train_time: 1.7m tok/s: 8289628 +1067/20000 train_loss: 2.7928 train_time: 1.7m tok/s: 8289627 +1068/20000 train_loss: 2.6958 train_time: 1.7m tok/s: 8289681 +1069/20000 train_loss: 2.6422 train_time: 1.7m tok/s: 8289669 +1070/20000 train_loss: 2.5726 train_time: 1.7m tok/s: 8289627 +1071/20000 train_loss: 2.7354 train_time: 1.7m tok/s: 8289619 +1072/20000 train_loss: 2.6651 train_time: 1.7m tok/s: 8289705 +1073/20000 train_loss: 2.6650 train_time: 1.7m tok/s: 8289725 +1074/20000 train_loss: 2.6912 train_time: 1.7m tok/s: 8289780 +1075/20000 train_loss: 2.7155 train_time: 1.7m tok/s: 8289879 +1076/20000 train_loss: 2.7147 train_time: 1.7m tok/s: 8289771 +1077/20000 train_loss: 2.6620 train_time: 1.7m tok/s: 8289621 +1078/20000 train_loss: 2.8109 train_time: 1.7m tok/s: 8289594 +1079/20000 train_loss: 2.6719 train_time: 1.7m tok/s: 8289590 +1080/20000 train_loss: 2.6461 train_time: 1.7m tok/s: 8289724 +1081/20000 train_loss: 2.6727 train_time: 1.7m tok/s: 8289684 +1082/20000 train_loss: 2.6684 train_time: 1.7m tok/s: 8289693 +1083/20000 train_loss: 2.6029 train_time: 1.7m tok/s: 8289681 +1084/20000 train_loss: 2.6838 train_time: 1.7m tok/s: 8289672 +1085/20000 train_loss: 2.6418 
train_time: 1.7m tok/s: 8289674 +1086/20000 train_loss: 2.6871 train_time: 1.7m tok/s: 8289759 +1087/20000 train_loss: 2.6662 train_time: 1.7m tok/s: 8289710 +1088/20000 train_loss: 2.8441 train_time: 1.7m tok/s: 8289590 +1089/20000 train_loss: 2.8072 train_time: 1.7m tok/s: 8289593 +1090/20000 train_loss: 2.6342 train_time: 1.7m tok/s: 8289566 +1091/20000 train_loss: 2.6721 train_time: 1.7m tok/s: 8289587 +1092/20000 train_loss: 2.6997 train_time: 1.7m tok/s: 8289624 +1093/20000 train_loss: 2.7487 train_time: 1.7m tok/s: 8289719 +1094/20000 train_loss: 2.8096 train_time: 1.7m tok/s: 8289708 +1095/20000 train_loss: 2.6185 train_time: 1.7m tok/s: 8289661 +1096/20000 train_loss: 2.5221 train_time: 1.7m tok/s: 8289657 +1097/20000 train_loss: 2.6454 train_time: 1.7m tok/s: 8289504 +1098/20000 train_loss: 2.6637 train_time: 1.7m tok/s: 8289441 +1099/20000 train_loss: 2.5080 train_time: 1.7m tok/s: 8289500 +1100/20000 train_loss: 2.5778 train_time: 1.7m tok/s: 8289498 +1101/20000 train_loss: 2.6555 train_time: 1.7m tok/s: 8289543 +1102/20000 train_loss: 2.6712 train_time: 1.7m tok/s: 8289544 +1103/20000 train_loss: 2.7344 train_time: 1.7m tok/s: 8289574 +1104/20000 train_loss: 2.7002 train_time: 1.7m tok/s: 8289625 +1105/20000 train_loss: 2.7228 train_time: 1.7m tok/s: 8289668 +1106/20000 train_loss: 2.7289 train_time: 1.7m tok/s: 8289781 +1107/20000 train_loss: 2.7410 train_time: 1.8m tok/s: 8289731 +1108/20000 train_loss: 2.6759 train_time: 1.8m tok/s: 8289670 +1109/20000 train_loss: 2.6663 train_time: 1.8m tok/s: 8289382 +1110/20000 train_loss: 2.6663 train_time: 1.8m tok/s: 8289701 +1111/20000 train_loss: 2.6353 train_time: 1.8m tok/s: 8289731 +1112/20000 train_loss: 2.6223 train_time: 1.8m tok/s: 8289713 +1113/20000 train_loss: 2.6511 train_time: 1.8m tok/s: 8289652 +1114/20000 train_loss: 2.8156 train_time: 1.8m tok/s: 8289653 +1115/20000 train_loss: 2.6675 train_time: 1.8m tok/s: 8289673 +1116/20000 train_loss: 2.8600 train_time: 1.8m tok/s: 8289775 +1117/20000 
train_loss: 2.6900 train_time: 1.8m tok/s: 8289647 +1118/20000 train_loss: 2.7256 train_time: 1.8m tok/s: 8289648 +1119/20000 train_loss: 2.7414 train_time: 1.8m tok/s: 8289634 +1120/20000 train_loss: 2.6326 train_time: 1.8m tok/s: 8289637 +1121/20000 train_loss: 2.6419 train_time: 1.8m tok/s: 8289682 +1122/20000 train_loss: 2.7385 train_time: 1.8m tok/s: 8289764 +1123/20000 train_loss: 2.5331 train_time: 1.8m tok/s: 8289828 +1124/20000 train_loss: 2.6846 train_time: 1.8m tok/s: 8289658 +1125/20000 train_loss: 2.5687 train_time: 1.8m tok/s: 8289681 +1126/20000 train_loss: 2.6399 train_time: 1.8m tok/s: 8289764 +1127/20000 train_loss: 2.8796 train_time: 1.8m tok/s: 8289801 +1128/20000 train_loss: 2.8540 train_time: 1.8m tok/s: 8289726 +1129/20000 train_loss: 2.5903 train_time: 1.8m tok/s: 8289740 +1130/20000 train_loss: 2.7668 train_time: 1.8m tok/s: 8289757 +1131/20000 train_loss: 2.7480 train_time: 1.8m tok/s: 8289793 +1132/20000 train_loss: 2.6195 train_time: 1.8m tok/s: 8289858 +1133/20000 train_loss: 2.5727 train_time: 1.8m tok/s: 8289871 +1134/20000 train_loss: 2.7329 train_time: 1.8m tok/s: 8289865 +1135/20000 train_loss: 2.7223 train_time: 1.8m tok/s: 8289870 +1136/20000 train_loss: 2.5657 train_time: 1.8m tok/s: 8289957 +1137/20000 train_loss: 2.6078 train_time: 1.8m tok/s: 8289952 +1138/20000 train_loss: 2.5649 train_time: 1.8m tok/s: 8289990 +1139/20000 train_loss: 2.5529 train_time: 1.8m tok/s: 8290017 +1140/20000 train_loss: 2.6680 train_time: 1.8m tok/s: 8290001 +1141/20000 train_loss: 2.7064 train_time: 1.8m tok/s: 8290041 +1142/20000 train_loss: 2.6940 train_time: 1.8m tok/s: 8290032 +1143/20000 train_loss: 2.7221 train_time: 1.8m tok/s: 8289986 +1144/20000 train_loss: 2.7719 train_time: 1.8m tok/s: 8289990 +1145/20000 train_loss: 2.7275 train_time: 1.8m tok/s: 8289979 +1146/20000 train_loss: 2.5903 train_time: 1.8m tok/s: 8290005 +1147/20000 train_loss: 2.7482 train_time: 1.8m tok/s: 8290072 +1148/20000 train_loss: 2.5592 train_time: 1.8m tok/s: 
8290099 +1149/20000 train_loss: 2.7173 train_time: 1.8m tok/s: 8290033 +1150/20000 train_loss: 2.5930 train_time: 1.8m tok/s: 8290071 +1151/20000 train_loss: 2.5966 train_time: 1.8m tok/s: 8290065 +1152/20000 train_loss: 2.4608 train_time: 1.8m tok/s: 8290067 +1153/20000 train_loss: 2.5986 train_time: 1.8m tok/s: 8290054 +1154/20000 train_loss: 2.7343 train_time: 1.8m tok/s: 8290065 +1155/20000 train_loss: 2.5865 train_time: 1.8m tok/s: 8290044 +1156/20000 train_loss: 2.6965 train_time: 1.8m tok/s: 8290109 +1157/20000 train_loss: 2.6817 train_time: 1.8m tok/s: 8290059 +1158/20000 train_loss: 2.7806 train_time: 1.8m tok/s: 8290023 +1159/20000 train_loss: 2.7105 train_time: 1.8m tok/s: 8290040 +1160/20000 train_loss: 2.6770 train_time: 1.8m tok/s: 8290159 +1161/20000 train_loss: 2.6578 train_time: 1.8m tok/s: 8290157 +1162/20000 train_loss: 2.7018 train_time: 1.8m tok/s: 8290136 +1163/20000 train_loss: 2.6980 train_time: 1.8m tok/s: 8290179 +1164/20000 train_loss: 2.6612 train_time: 1.8m tok/s: 8290143 +1165/20000 train_loss: 2.5280 train_time: 1.8m tok/s: 8290058 +1166/20000 train_loss: 2.7248 train_time: 1.8m tok/s: 8290046 +1167/20000 train_loss: 2.7683 train_time: 1.8m tok/s: 8289960 +1168/20000 train_loss: 2.5654 train_time: 1.8m tok/s: 8289858 +1169/20000 train_loss: 2.6962 train_time: 1.8m tok/s: 8289868 +1170/20000 train_loss: 2.9088 train_time: 1.8m tok/s: 8289935 +1171/20000 train_loss: 2.6582 train_time: 1.9m tok/s: 8290044 +1172/20000 train_loss: 2.7082 train_time: 1.9m tok/s: 8290088 +1173/20000 train_loss: 2.6415 train_time: 1.9m tok/s: 8290117 +1174/20000 train_loss: 2.7030 train_time: 1.9m tok/s: 8290125 +1175/20000 train_loss: 2.6541 train_time: 1.9m tok/s: 8290100 +1176/20000 train_loss: 2.7741 train_time: 1.9m tok/s: 8290095 +1177/20000 train_loss: 2.7789 train_time: 1.9m tok/s: 8290030 +1178/20000 train_loss: 2.6656 train_time: 1.9m tok/s: 8289949 +1179/20000 train_loss: 2.5437 train_time: 1.9m tok/s: 8289875 +1180/20000 train_loss: 2.6035 
train_time: 1.9m tok/s: 8289796 +1181/20000 train_loss: 2.6624 train_time: 1.9m tok/s: 8289827 +1182/20000 train_loss: 2.5818 train_time: 1.9m tok/s: 8289901 +1183/20000 train_loss: 2.7845 train_time: 1.9m tok/s: 8289976 +1184/20000 train_loss: 2.4910 train_time: 1.9m tok/s: 8289866 +1185/20000 train_loss: 2.6695 train_time: 1.9m tok/s: 8289913 +1186/20000 train_loss: 2.6624 train_time: 1.9m tok/s: 8289843 +1187/20000 train_loss: 2.7636 train_time: 1.9m tok/s: 8289719 +1188/20000 train_loss: 2.8720 train_time: 1.9m tok/s: 8289823 +1189/20000 train_loss: 2.6345 train_time: 1.9m tok/s: 8289850 +1190/20000 train_loss: 2.6962 train_time: 1.9m tok/s: 8289876 +1191/20000 train_loss: 2.6439 train_time: 1.9m tok/s: 8289891 +1192/20000 train_loss: 2.6717 train_time: 1.9m tok/s: 8289868 +1193/20000 train_loss: 2.6938 train_time: 1.9m tok/s: 8289860 +1194/20000 train_loss: 2.6980 train_time: 1.9m tok/s: 8289787 +1195/20000 train_loss: 2.6016 train_time: 1.9m tok/s: 8289776 +1196/20000 train_loss: 2.8403 train_time: 1.9m tok/s: 8289754 +1197/20000 train_loss: 2.5704 train_time: 1.9m tok/s: 8289741 +1198/20000 train_loss: 2.6925 train_time: 1.9m tok/s: 8289726 +1199/20000 train_loss: 2.7395 train_time: 1.9m tok/s: 8289835 +1200/20000 train_loss: 2.7285 train_time: 1.9m tok/s: 8289899 +1201/20000 train_loss: 2.7233 train_time: 1.9m tok/s: 8289975 +1202/20000 train_loss: 2.8214 train_time: 1.9m tok/s: 8290034 +1203/20000 train_loss: 2.6659 train_time: 1.9m tok/s: 8290035 +1204/20000 train_loss: 2.7038 train_time: 1.9m tok/s: 8290104 +1205/20000 train_loss: 2.7365 train_time: 1.9m tok/s: 8289984 +1206/20000 train_loss: 2.7635 train_time: 1.9m tok/s: 8289972 +1207/20000 train_loss: 2.5724 train_time: 1.9m tok/s: 8289897 +1208/20000 train_loss: 2.5677 train_time: 1.9m tok/s: 8289917 +1209/20000 train_loss: 2.6944 train_time: 1.9m tok/s: 8289876 +1210/20000 train_loss: 2.6276 train_time: 1.9m tok/s: 8289886 +1211/20000 train_loss: 2.5623 train_time: 1.9m tok/s: 8289870 +1212/20000 
train_loss: 2.5357 train_time: 1.9m tok/s: 8289774 +1213/20000 train_loss: 2.8085 train_time: 1.9m tok/s: 8289689 +1214/20000 train_loss: 2.6244 train_time: 1.9m tok/s: 8289776 +1215/20000 train_loss: 2.7184 train_time: 1.9m tok/s: 8289851 +1216/20000 train_loss: 2.6802 train_time: 1.9m tok/s: 8289944 +1217/20000 train_loss: 2.7604 train_time: 1.9m tok/s: 8289896 +1218/20000 train_loss: 2.6975 train_time: 1.9m tok/s: 8289968 +1219/20000 train_loss: 3.3063 train_time: 1.9m tok/s: 8289990 +1220/20000 train_loss: 2.6071 train_time: 1.9m tok/s: 8289954 +1221/20000 train_loss: 2.7657 train_time: 1.9m tok/s: 8289921 +1222/20000 train_loss: 2.5917 train_time: 1.9m tok/s: 8289942 +1223/20000 train_loss: 2.6984 train_time: 1.9m tok/s: 8289882 +1224/20000 train_loss: 2.7103 train_time: 1.9m tok/s: 8289870 +1225/20000 train_loss: 2.5383 train_time: 1.9m tok/s: 8289856 +1226/20000 train_loss: 2.6507 train_time: 1.9m tok/s: 8289828 +1227/20000 train_loss: 2.8631 train_time: 1.9m tok/s: 8289819 +1228/20000 train_loss: 2.6573 train_time: 1.9m tok/s: 8289773 +1229/20000 train_loss: 2.6616 train_time: 1.9m tok/s: 8289749 +1230/20000 train_loss: 2.7540 train_time: 1.9m tok/s: 8289750 +1231/20000 train_loss: 2.6936 train_time: 1.9m tok/s: 8289778 +1232/20000 train_loss: 2.6812 train_time: 1.9m tok/s: 8289821 +1233/20000 train_loss: 2.6616 train_time: 1.9m tok/s: 8289738 +1234/20000 train_loss: 2.6524 train_time: 2.0m tok/s: 8289763 +1235/20000 train_loss: 2.5949 train_time: 2.0m tok/s: 8289723 +1236/20000 train_loss: 2.6399 train_time: 2.0m tok/s: 8289720 +1237/20000 train_loss: 2.6166 train_time: 2.0m tok/s: 8289745 +1238/20000 train_loss: 2.5816 train_time: 2.0m tok/s: 8289846 +1239/20000 train_loss: 2.5846 train_time: 2.0m tok/s: 8289882 +1240/20000 train_loss: 2.5392 train_time: 2.0m tok/s: 8289892 +1241/20000 train_loss: 2.5892 train_time: 2.0m tok/s: 8289847 +1242/20000 train_loss: 2.5857 train_time: 2.0m tok/s: 8289712 +1243/20000 train_loss: 2.6755 train_time: 2.0m tok/s: 
8289547 +1244/20000 train_loss: 2.7768 train_time: 2.0m tok/s: 8289474 +1245/20000 train_loss: 2.6669 train_time: 2.0m tok/s: 8289324 +1246/20000 train_loss: 2.7858 train_time: 2.0m tok/s: 8289324 +1247/20000 train_loss: 2.7811 train_time: 2.0m tok/s: 8289266 +1248/20000 train_loss: 2.6493 train_time: 2.0m tok/s: 8289352 +1249/20000 train_loss: 2.6384 train_time: 2.0m tok/s: 8289394 +1250/20000 train_loss: 2.6450 train_time: 2.0m tok/s: 8289467 +1251/20000 train_loss: 2.6006 train_time: 2.0m tok/s: 8289536 +1252/20000 train_loss: 2.6738 train_time: 2.0m tok/s: 8289479 +1253/20000 train_loss: 2.6218 train_time: 2.0m tok/s: 8289468 +1254/20000 train_loss: 2.6841 train_time: 2.0m tok/s: 8289414 +1255/20000 train_loss: 2.4468 train_time: 2.0m tok/s: 8289425 +1256/20000 train_loss: 2.6775 train_time: 2.0m tok/s: 8289406 +1257/20000 train_loss: 2.6010 train_time: 2.0m tok/s: 8289381 +1258/20000 train_loss: 2.6300 train_time: 2.0m tok/s: 8289324 +1259/20000 train_loss: 2.7604 train_time: 2.0m tok/s: 8289208 +1260/20000 train_loss: 2.7066 train_time: 2.0m tok/s: 8289279 +1261/20000 train_loss: 2.7848 train_time: 2.0m tok/s: 8289334 +1262/20000 train_loss: 2.6972 train_time: 2.0m tok/s: 8289326 +1263/20000 train_loss: 2.7058 train_time: 2.0m tok/s: 8289371 +1264/20000 train_loss: 2.6344 train_time: 2.0m tok/s: 8289372 +1265/20000 train_loss: 2.6325 train_time: 2.0m tok/s: 8289336 +1266/20000 train_loss: 2.6446 train_time: 2.0m tok/s: 8289315 +1267/20000 train_loss: 2.6872 train_time: 2.0m tok/s: 8289344 +1268/20000 train_loss: 2.4885 train_time: 2.0m tok/s: 8289405 +1269/20000 train_loss: 2.6810 train_time: 2.0m tok/s: 8289389 +1270/20000 train_loss: 2.6481 train_time: 2.0m tok/s: 8289413 +1271/20000 train_loss: 2.5640 train_time: 2.0m tok/s: 8289442 +1272/20000 train_loss: 2.8062 train_time: 2.0m tok/s: 8289508 +1273/20000 train_loss: 2.7470 train_time: 2.0m tok/s: 8289556 +1274/20000 train_loss: 2.6750 train_time: 2.0m tok/s: 8289603 +1275/20000 train_loss: 2.7781 
train_time: 2.0m tok/s: 8289597 +1276/20000 train_loss: 2.7137 train_time: 2.0m tok/s: 8289579 +1277/20000 train_loss: 2.7021 train_time: 2.0m tok/s: 8289469 +1278/20000 train_loss: 2.6150 train_time: 2.0m tok/s: 8289635 +1279/20000 train_loss: 2.7173 train_time: 2.0m tok/s: 8289714 +1280/20000 train_loss: 2.6410 train_time: 2.0m tok/s: 8289674 +1281/20000 train_loss: 2.8148 train_time: 2.0m tok/s: 8289637 +1282/20000 train_loss: 2.5363 train_time: 2.0m tok/s: 8289555 +1283/20000 train_loss: 2.6318 train_time: 2.0m tok/s: 8289600 +1284/20000 train_loss: 2.6146 train_time: 2.0m tok/s: 8289610 +1285/20000 train_loss: 2.7804 train_time: 2.0m tok/s: 8289637 +1286/20000 train_loss: 2.6689 train_time: 2.0m tok/s: 8289624 +1287/20000 train_loss: 2.7074 train_time: 2.0m tok/s: 8289611 +1288/20000 train_loss: 2.7208 train_time: 2.0m tok/s: 8289651 +1289/20000 train_loss: 2.7481 train_time: 2.0m tok/s: 8289617 +1290/20000 train_loss: 2.6312 train_time: 2.0m tok/s: 8289678 +1291/20000 train_loss: 2.7456 train_time: 2.0m tok/s: 8289719 +1292/20000 train_loss: 2.7271 train_time: 2.0m tok/s: 8289726 +1293/20000 train_loss: 2.7142 train_time: 2.0m tok/s: 8289742 +1294/20000 train_loss: 2.7186 train_time: 2.0m tok/s: 8289773 +1295/20000 train_loss: 2.7468 train_time: 2.0m tok/s: 8289836 +1296/20000 train_loss: 2.6851 train_time: 2.0m tok/s: 8289535 +1297/20000 train_loss: 2.5932 train_time: 2.1m tok/s: 8289695 +1298/20000 train_loss: 2.6589 train_time: 2.1m tok/s: 8289774 +1299/20000 train_loss: 2.5134 train_time: 2.1m tok/s: 8289742 +1300/20000 train_loss: 2.6338 train_time: 2.1m tok/s: 8289741 +1301/20000 train_loss: 2.6841 train_time: 2.1m tok/s: 8289791 +1302/20000 train_loss: 2.6684 train_time: 2.1m tok/s: 8289871 +1303/20000 train_loss: 2.8871 train_time: 2.1m tok/s: 8289819 +1304/20000 train_loss: 2.7367 train_time: 2.1m tok/s: 8289785 +1305/20000 train_loss: 2.7642 train_time: 2.1m tok/s: 8289760 +1306/20000 train_loss: 2.8538 train_time: 2.1m tok/s: 8289738 +1307/20000 
train_loss: 2.6249 train_time: 2.1m tok/s: 8289702 +1308/20000 train_loss: 2.6284 train_time: 2.1m tok/s: 8289669 +1309/20000 train_loss: 2.6644 train_time: 2.1m tok/s: 8289631 +1310/20000 train_loss: 2.5617 train_time: 2.1m tok/s: 8289539 +1311/20000 train_loss: 2.6100 train_time: 2.1m tok/s: 8289447 +1312/20000 train_loss: 2.5485 train_time: 2.1m tok/s: 8289451 +1313/20000 train_loss: 2.5721 train_time: 2.1m tok/s: 8289441 +1314/20000 train_loss: 2.5571 train_time: 2.1m tok/s: 8289511 +1315/20000 train_loss: 2.4212 train_time: 2.1m tok/s: 8289422 +1316/20000 train_loss: 2.6830 train_time: 2.1m tok/s: 8289337 +1317/20000 train_loss: 2.6843 train_time: 2.1m tok/s: 8289418 +1318/20000 train_loss: 2.7176 train_time: 2.1m tok/s: 8289502 +1319/20000 train_loss: 2.8105 train_time: 2.1m tok/s: 8289546 +1320/20000 train_loss: 2.7265 train_time: 2.1m tok/s: 8289546 +1321/20000 train_loss: 2.7221 train_time: 2.1m tok/s: 8289584 +1322/20000 train_loss: 2.7394 train_time: 2.1m tok/s: 8289638 +1323/20000 train_loss: 2.5862 train_time: 2.1m tok/s: 8289603 +1324/20000 train_loss: 2.6482 train_time: 2.1m tok/s: 8289585 +1325/20000 train_loss: 2.8296 train_time: 2.1m tok/s: 8289624 +1326/20000 train_loss: 2.8349 train_time: 2.1m tok/s: 8289637 +1327/20000 train_loss: 2.6598 train_time: 2.1m tok/s: 8289602 +1328/20000 train_loss: 2.6520 train_time: 2.1m tok/s: 8289628 +1329/20000 train_loss: 2.7424 train_time: 2.1m tok/s: 8289630 +1330/20000 train_loss: 2.6112 train_time: 2.1m tok/s: 8289667 +1331/20000 train_loss: 2.6786 train_time: 2.1m tok/s: 8289647 +1332/20000 train_loss: 2.9194 train_time: 2.1m tok/s: 8289598 +1333/20000 train_loss: 2.8342 train_time: 2.1m tok/s: 8289540 +1334/20000 train_loss: 2.6707 train_time: 2.1m tok/s: 8289565 +1335/20000 train_loss: 2.6290 train_time: 2.1m tok/s: 8289533 +1336/20000 train_loss: 2.6590 train_time: 2.1m tok/s: 8289534 +1337/20000 train_loss: 2.8070 train_time: 2.1m tok/s: 8289545 +1338/20000 train_loss: 2.8904 train_time: 2.1m tok/s: 
8289551 +1339/20000 train_loss: 2.6914 train_time: 2.1m tok/s: 8289525 +1340/20000 train_loss: 2.5319 train_time: 2.1m tok/s: 8289538 +1341/20000 train_loss: 2.5616 train_time: 2.1m tok/s: 8289544 +1342/20000 train_loss: 2.6442 train_time: 2.1m tok/s: 8289530 +1343/20000 train_loss: 2.6638 train_time: 2.1m tok/s: 8289438 +1344/20000 train_loss: 2.6880 train_time: 2.1m tok/s: 8289479 +1345/20000 train_loss: 2.6402 train_time: 2.1m tok/s: 8289518 +1346/20000 train_loss: 2.7671 train_time: 2.1m tok/s: 8289560 +1347/20000 train_loss: 2.7919 train_time: 2.1m tok/s: 8289551 +1348/20000 train_loss: 2.6919 train_time: 2.1m tok/s: 8289570 +1349/20000 train_loss: 2.7235 train_time: 2.1m tok/s: 8289578 +1350/20000 train_loss: 2.6769 train_time: 2.1m tok/s: 8289592 +1351/20000 train_loss: 2.7885 train_time: 2.1m tok/s: 8289533 +1352/20000 train_loss: 2.6690 train_time: 2.1m tok/s: 8289551 +1353/20000 train_loss: 2.7048 train_time: 2.1m tok/s: 8289562 +1354/20000 train_loss: 2.3823 train_time: 2.1m tok/s: 8289535 +1355/20000 train_loss: 2.5791 train_time: 2.1m tok/s: 8289480 +1356/20000 train_loss: 2.6877 train_time: 2.1m tok/s: 8289544 +1357/20000 train_loss: 2.6747 train_time: 2.1m tok/s: 8289618 +1358/20000 train_loss: 2.7438 train_time: 2.1m tok/s: 8289620 +1359/20000 train_loss: 2.5080 train_time: 2.1m tok/s: 8289560 +1360/20000 train_loss: 2.7351 train_time: 2.2m tok/s: 8289440 +1361/20000 train_loss: 2.6339 train_time: 2.2m tok/s: 8289428 +1362/20000 train_loss: 2.6175 train_time: 2.2m tok/s: 8289437 +1363/20000 train_loss: 2.7021 train_time: 2.2m tok/s: 8289348 +1364/20000 train_loss: 2.5359 train_time: 2.2m tok/s: 8289394 +1365/20000 train_loss: 2.5318 train_time: 2.2m tok/s: 8289401 +1366/20000 train_loss: 2.6229 train_time: 2.2m tok/s: 8289438 +1367/20000 train_loss: 2.7057 train_time: 2.2m tok/s: 8289517 +1368/20000 train_loss: 2.5719 train_time: 2.2m tok/s: 8289515 +1369/20000 train_loss: 2.6747 train_time: 2.2m tok/s: 8289504 +1370/20000 train_loss: 2.7219 
train_time: 2.2m tok/s: 8289517 +1371/20000 train_loss: 2.6987 train_time: 2.2m tok/s: 8289429 +1372/20000 train_loss: 2.7290 train_time: 2.2m tok/s: 8289437 +1373/20000 train_loss: 2.6653 train_time: 2.2m tok/s: 8289512 +1374/20000 train_loss: 2.7701 train_time: 2.2m tok/s: 8289488 +1375/20000 train_loss: 2.7282 train_time: 2.2m tok/s: 8289534 +1376/20000 train_loss: 2.5834 train_time: 2.2m tok/s: 8289552 +1377/20000 train_loss: 2.6405 train_time: 2.2m tok/s: 8289568 +1378/20000 train_loss: 2.5870 train_time: 2.2m tok/s: 8289561 +1379/20000 train_loss: 2.6362 train_time: 2.2m tok/s: 8289537 +1380/20000 train_loss: 2.5969 train_time: 2.2m tok/s: 8289477 +1381/20000 train_loss: 2.6423 train_time: 2.2m tok/s: 8289535 +1382/20000 train_loss: 2.6654 train_time: 2.2m tok/s: 8289503 +1383/20000 train_loss: 2.6851 train_time: 2.2m tok/s: 8289605 +1384/20000 train_loss: 2.5768 train_time: 2.2m tok/s: 8289644 +1385/20000 train_loss: 2.6073 train_time: 2.2m tok/s: 8289683 +1386/20000 train_loss: 2.7596 train_time: 2.2m tok/s: 8289740 +1387/20000 train_loss: 2.6777 train_time: 2.2m tok/s: 8289815 +1388/20000 train_loss: 2.7557 train_time: 2.2m tok/s: 8289864 +1389/20000 train_loss: 2.6059 train_time: 2.2m tok/s: 8289829 +1390/20000 train_loss: 2.7366 train_time: 2.2m tok/s: 8289860 +1391/20000 train_loss: 2.5455 train_time: 2.2m tok/s: 8289858 +1392/20000 train_loss: 2.7353 train_time: 2.2m tok/s: 8289872 +1393/20000 train_loss: 2.6311 train_time: 2.2m tok/s: 8289844 +1394/20000 train_loss: 2.8983 train_time: 2.2m tok/s: 8289854 +1395/20000 train_loss: 2.4958 train_time: 2.2m tok/s: 8289800 +1396/20000 train_loss: 2.8222 train_time: 2.2m tok/s: 8289815 +1397/20000 train_loss: 2.7164 train_time: 2.2m tok/s: 8289810 +1398/20000 train_loss: 2.8064 train_time: 2.2m tok/s: 8289850 +1399/20000 train_loss: 2.6388 train_time: 2.2m tok/s: 8289869 +1400/20000 train_loss: 2.7437 train_time: 2.2m tok/s: 8289668 +1401/20000 train_loss: 2.7299 train_time: 2.2m tok/s: 8289909 +1402/20000 
train_loss: 2.5746 train_time: 2.2m tok/s: 8289966 +1403/20000 train_loss: 2.5970 train_time: 2.2m tok/s: 8290021 +1404/20000 train_loss: 2.6829 train_time: 2.2m tok/s: 8289933 +1405/20000 train_loss: 2.7138 train_time: 2.2m tok/s: 8289923 +1406/20000 train_loss: 2.8384 train_time: 2.2m tok/s: 8289881 +1407/20000 train_loss: 2.5746 train_time: 2.2m tok/s: 8289769 +1408/20000 train_loss: 2.6943 train_time: 2.2m tok/s: 8289738 +1409/20000 train_loss: 2.7697 train_time: 2.2m tok/s: 8289704 +1410/20000 train_loss: 2.6568 train_time: 2.2m tok/s: 8289745 +1411/20000 train_loss: 2.6951 train_time: 2.2m tok/s: 8289784 +1412/20000 train_loss: 2.7362 train_time: 2.2m tok/s: 8289849 +1413/20000 train_loss: 2.6159 train_time: 2.2m tok/s: 8289898 +1414/20000 train_loss: 2.6011 train_time: 2.2m tok/s: 8289937 +1415/20000 train_loss: 2.6644 train_time: 2.2m tok/s: 8289984 +1416/20000 train_loss: 2.6193 train_time: 2.2m tok/s: 8290024 +1417/20000 train_loss: 2.6256 train_time: 2.2m tok/s: 8290036 +1418/20000 train_loss: 2.7543 train_time: 2.2m tok/s: 8290018 +1419/20000 train_loss: 2.6610 train_time: 2.2m tok/s: 8289964 +1420/20000 train_loss: 2.6295 train_time: 2.2m tok/s: 8289935 +1421/20000 train_loss: 2.7504 train_time: 2.2m tok/s: 8289964 +1422/20000 train_loss: 2.7445 train_time: 2.2m tok/s: 8290018 +1423/20000 train_loss: 2.7065 train_time: 2.2m tok/s: 8290087 +1424/20000 train_loss: 2.7075 train_time: 2.3m tok/s: 8290120 +1425/20000 train_loss: 2.6289 train_time: 2.3m tok/s: 8290139 +1426/20000 train_loss: 2.6965 train_time: 2.3m tok/s: 8290137 +1427/20000 train_loss: 2.6480 train_time: 2.3m tok/s: 8290079 +1428/20000 train_loss: 2.6511 train_time: 2.3m tok/s: 8290041 +1429/20000 train_loss: 2.5895 train_time: 2.3m tok/s: 8290014 +1430/20000 train_loss: 2.6250 train_time: 2.3m tok/s: 8290061 +1431/20000 train_loss: 2.6025 train_time: 2.3m tok/s: 8290066 +1432/20000 train_loss: 2.4340 train_time: 2.3m tok/s: 8290070 +1433/20000 train_loss: 2.6247 train_time: 2.3m tok/s: 
8290072 +1434/20000 train_loss: 2.7408 train_time: 2.3m tok/s: 8290029 +1435/20000 train_loss: 2.7156 train_time: 2.3m tok/s: 8290151 +1436/20000 train_loss: 2.6158 train_time: 2.3m tok/s: 8290208 +1437/20000 train_loss: 2.7040 train_time: 2.3m tok/s: 8290089 +1438/20000 train_loss: 2.7535 train_time: 2.3m tok/s: 8290252 +1439/20000 train_loss: 2.6696 train_time: 2.3m tok/s: 8290299 +1440/20000 train_loss: 2.7105 train_time: 2.3m tok/s: 8290235 +1441/20000 train_loss: 2.6704 train_time: 2.3m tok/s: 8290254 +1442/20000 train_loss: 2.5798 train_time: 2.3m tok/s: 8290199 +1443/20000 train_loss: 2.5953 train_time: 2.3m tok/s: 8290196 +1444/20000 train_loss: 2.5516 train_time: 2.3m tok/s: 8290237 +1445/20000 train_loss: 2.6645 train_time: 2.3m tok/s: 8290159 +1446/20000 train_loss: 2.7756 train_time: 2.3m tok/s: 8290183 +1447/20000 train_loss: 2.7184 train_time: 2.3m tok/s: 8290203 +1448/20000 train_loss: 2.6965 train_time: 2.3m tok/s: 8290210 +1449/20000 train_loss: 2.6414 train_time: 2.3m tok/s: 8290246 +1450/20000 train_loss: 2.7317 train_time: 2.3m tok/s: 8290171 +1451/20000 train_loss: 2.5956 train_time: 2.3m tok/s: 8290219 +1452/20000 train_loss: 2.6192 train_time: 2.3m tok/s: 8290203 +1453/20000 train_loss: 2.6481 train_time: 2.3m tok/s: 8290213 +1454/20000 train_loss: 2.7536 train_time: 2.3m tok/s: 8290105 +1455/20000 train_loss: 2.5734 train_time: 2.3m tok/s: 8290122 +1456/20000 train_loss: 2.4553 train_time: 2.3m tok/s: 8290157 +1457/20000 train_loss: 2.4776 train_time: 2.3m tok/s: 8290160 +1458/20000 train_loss: 2.5955 train_time: 2.3m tok/s: 8290109 +1459/20000 train_loss: 2.6735 train_time: 2.3m tok/s: 8290091 +1460/20000 train_loss: 2.6961 train_time: 2.3m tok/s: 8290104 +1461/20000 train_loss: 2.7715 train_time: 2.3m tok/s: 8290143 +1462/20000 train_loss: 2.6461 train_time: 2.3m tok/s: 8290185 +1463/20000 train_loss: 2.6614 train_time: 2.3m tok/s: 8290284 +1464/20000 train_loss: 2.6443 train_time: 2.3m tok/s: 8290139 +1465/20000 train_loss: 2.6788 
train_time: 2.3m tok/s: 8290050 +1466/20000 train_loss: 2.6148 train_time: 2.3m tok/s: 8290046 +1467/20000 train_loss: 2.6164 train_time: 2.3m tok/s: 8290134 +1468/20000 train_loss: 2.5571 train_time: 2.3m tok/s: 8290086 +1469/20000 train_loss: 2.5827 train_time: 2.3m tok/s: 8290101 +1470/20000 train_loss: 2.5082 train_time: 2.3m tok/s: 8290074 +1471/20000 train_loss: 2.8054 train_time: 2.3m tok/s: 8290060 +1472/20000 train_loss: 2.8790 train_time: 2.3m tok/s: 8290021 +1473/20000 train_loss: 2.7658 train_time: 2.3m tok/s: 8289917 +1474/20000 train_loss: 2.7332 train_time: 2.3m tok/s: 8289869 +1475/20000 train_loss: 2.6932 train_time: 2.3m tok/s: 8289941 +1476/20000 train_loss: 2.7734 train_time: 2.3m tok/s: 8289916 +1477/20000 train_loss: 2.6032 train_time: 2.3m tok/s: 8289895 +1478/20000 train_loss: 2.6055 train_time: 2.3m tok/s: 8289910 +1479/20000 train_loss: 2.5880 train_time: 2.3m tok/s: 8289926 +1480/20000 train_loss: 2.6088 train_time: 2.3m tok/s: 8289939 +1481/20000 train_loss: 2.6343 train_time: 2.3m tok/s: 8289951 +1482/20000 train_loss: 3.0609 train_time: 2.3m tok/s: 8289847 +1483/20000 train_loss: 2.6934 train_time: 2.3m tok/s: 8289791 +1484/20000 train_loss: 2.6290 train_time: 2.3m tok/s: 8289760 +1485/20000 train_loss: 2.7844 train_time: 2.3m tok/s: 8289774 +1486/20000 train_loss: 2.5946 train_time: 2.3m tok/s: 8289794 +1487/20000 train_loss: 2.6944 train_time: 2.4m tok/s: 8289863 +1488/20000 train_loss: 2.6450 train_time: 2.4m tok/s: 8289911 +1489/20000 train_loss: 2.5784 train_time: 2.4m tok/s: 8289973 +1490/20000 train_loss: 2.6722 train_time: 2.4m tok/s: 8289934 +1491/20000 train_loss: 2.6699 train_time: 2.4m tok/s: 8289923 +1492/20000 train_loss: 2.5969 train_time: 2.4m tok/s: 8289958 +1493/20000 train_loss: 2.6667 train_time: 2.4m tok/s: 8289960 +1494/20000 train_loss: 2.6404 train_time: 2.4m tok/s: 8289929 +1495/20000 train_loss: 2.5674 train_time: 2.4m tok/s: 8289852 +1496/20000 train_loss: 2.6768 train_time: 2.4m tok/s: 8289843 +1497/20000 
train_loss: 2.6022 train_time: 2.4m tok/s: 8289873 +1498/20000 train_loss: 2.9241 train_time: 2.4m tok/s: 8289841 +1499/20000 train_loss: 2.6967 train_time: 2.4m tok/s: 8289898 +1500/20000 train_loss: 2.7213 train_time: 2.4m tok/s: 8289922 +1501/20000 train_loss: 2.6910 train_time: 2.4m tok/s: 8289955 +1502/20000 train_loss: 2.8005 train_time: 2.4m tok/s: 8289892 +1503/20000 train_loss: 2.6704 train_time: 2.4m tok/s: 8289854 +1504/20000 train_loss: 2.7262 train_time: 2.4m tok/s: 8289925 +1505/20000 train_loss: 2.6814 train_time: 2.4m tok/s: 8289918 +1506/20000 train_loss: 2.7087 train_time: 2.4m tok/s: 8289916 +1507/20000 train_loss: 2.8259 train_time: 2.4m tok/s: 8289941 +1508/20000 train_loss: 2.5302 train_time: 2.4m tok/s: 8289832 +1509/20000 train_loss: 2.5798 train_time: 2.4m tok/s: 8289873 +1510/20000 train_loss: 2.5392 train_time: 2.4m tok/s: 8289852 +1511/20000 train_loss: 2.4979 train_time: 2.4m tok/s: 8289838 +1512/20000 train_loss: 2.5664 train_time: 2.4m tok/s: 8289868 +1513/20000 train_loss: 2.7208 train_time: 2.4m tok/s: 8289917 +1514/20000 train_loss: 2.7470 train_time: 2.4m tok/s: 8289959 +1515/20000 train_loss: 2.6917 train_time: 2.4m tok/s: 8289994 +1516/20000 train_loss: 2.5843 train_time: 2.4m tok/s: 8289976 +1517/20000 train_loss: 2.5939 train_time: 2.4m tok/s: 8289933 +1518/20000 train_loss: 2.7321 train_time: 2.4m tok/s: 8289938 +1519/20000 train_loss: 2.6561 train_time: 2.4m tok/s: 8289890 +1520/20000 train_loss: 2.6659 train_time: 2.4m tok/s: 8289901 +1521/20000 train_loss: 2.6451 train_time: 2.4m tok/s: 8289867 +1522/20000 train_loss: 2.6459 train_time: 2.4m tok/s: 8289889 +1523/20000 train_loss: 2.6730 train_time: 2.4m tok/s: 8289909 +1524/20000 train_loss: 2.6426 train_time: 2.4m tok/s: 8289902 +1525/20000 train_loss: 2.6276 train_time: 2.4m tok/s: 8289883 +1526/20000 train_loss: 2.7202 train_time: 2.4m tok/s: 8289794 +1527/20000 train_loss: 2.6743 train_time: 2.4m tok/s: 8289736 +1528/20000 train_loss: 2.4648 train_time: 2.4m tok/s: 
8289612 +1529/20000 train_loss: 2.6276 train_time: 2.4m tok/s: 8289753 +1530/20000 train_loss: 2.6090 train_time: 2.4m tok/s: 8289734 +1531/20000 train_loss: 2.3566 train_time: 2.4m tok/s: 8289752 +1532/20000 train_loss: 2.6189 train_time: 2.4m tok/s: 8289792 +1533/20000 train_loss: 2.6721 train_time: 2.4m tok/s: 8289694 +1534/20000 train_loss: 2.6341 train_time: 2.4m tok/s: 8289644 +1535/20000 train_loss: 2.7731 train_time: 2.4m tok/s: 8289638 +1536/20000 train_loss: 2.6943 train_time: 2.4m tok/s: 8289695 +1537/20000 train_loss: 3.0719 train_time: 2.4m tok/s: 8289636 +1538/20000 train_loss: 2.7273 train_time: 2.4m tok/s: 8289575 +1539/20000 train_loss: 2.6285 train_time: 2.4m tok/s: 8289602 +1540/20000 train_loss: 2.6781 train_time: 2.4m tok/s: 8289589 +1541/20000 train_loss: 2.5856 train_time: 2.4m tok/s: 8289627 +1542/20000 train_loss: 2.6145 train_time: 2.4m tok/s: 8289625 +1543/20000 train_loss: 2.6287 train_time: 2.4m tok/s: 8289641 +1544/20000 train_loss: 2.5817 train_time: 2.4m tok/s: 8289662 +1545/20000 train_loss: 2.6177 train_time: 2.4m tok/s: 8289613 +1546/20000 train_loss: 2.4991 train_time: 2.4m tok/s: 8289579 +1547/20000 train_loss: 2.7421 train_time: 2.4m tok/s: 8289591 +1548/20000 train_loss: 2.6856 train_time: 2.4m tok/s: 8289583 +1549/20000 train_loss: 2.5766 train_time: 2.4m tok/s: 8289635 +1550/20000 train_loss: 2.7105 train_time: 2.5m tok/s: 8289612 +1551/20000 train_loss: 2.6733 train_time: 2.5m tok/s: 8289607 +1552/20000 train_loss: 2.5505 train_time: 2.5m tok/s: 8289614 +1553/20000 train_loss: 2.4925 train_time: 2.5m tok/s: 8289513 +1554/20000 train_loss: 2.5861 train_time: 2.5m tok/s: 8289554 +1555/20000 train_loss: 2.6270 train_time: 2.5m tok/s: 8289569 +1556/20000 train_loss: 2.5132 train_time: 2.5m tok/s: 8289558 +1557/20000 train_loss: 2.5526 train_time: 2.5m tok/s: 8289493 +1558/20000 train_loss: 2.5655 train_time: 2.5m tok/s: 8289406 +1559/20000 train_loss: 2.5477 train_time: 2.5m tok/s: 8289403 +1560/20000 train_loss: 2.6183 
train_time: 2.5m tok/s: 8289489 +1561/20000 train_loss: 2.5433 train_time: 2.5m tok/s: 8289467 +1562/20000 train_loss: 2.5881 train_time: 2.5m tok/s: 8289501 +1563/20000 train_loss: 2.4923 train_time: 2.5m tok/s: 8289516 +1564/20000 train_loss: 2.5840 train_time: 2.5m tok/s: 8289483 +1565/20000 train_loss: 2.5718 train_time: 2.5m tok/s: 8289431 +1566/20000 train_loss: 2.7408 train_time: 2.5m tok/s: 8289408 +1567/20000 train_loss: 2.6868 train_time: 2.5m tok/s: 8289436 +1568/20000 train_loss: 2.5256 train_time: 2.5m tok/s: 8289487 +1569/20000 train_loss: 2.5940 train_time: 2.5m tok/s: 8289459 +1570/20000 train_loss: 2.5458 train_time: 2.5m tok/s: 8289444 +1571/20000 train_loss: 2.6133 train_time: 2.5m tok/s: 8289426 +1572/20000 train_loss: 3.1973 train_time: 2.5m tok/s: 8289480 +1573/20000 train_loss: 2.7536 train_time: 2.5m tok/s: 8289458 +1574/20000 train_loss: 2.5991 train_time: 2.5m tok/s: 8289391 +1575/20000 train_loss: 2.5450 train_time: 2.5m tok/s: 8289393 +1576/20000 train_loss: 2.5396 train_time: 2.5m tok/s: 8289377 +1577/20000 train_loss: 2.5724 train_time: 2.5m tok/s: 8289350 +1578/20000 train_loss: 2.4928 train_time: 2.5m tok/s: 8289273 +1579/20000 train_loss: 2.7610 train_time: 2.5m tok/s: 8289234 +1580/20000 train_loss: 2.6520 train_time: 2.5m tok/s: 8289258 +1581/20000 train_loss: 2.4919 train_time: 2.5m tok/s: 8289274 +1582/20000 train_loss: 2.5201 train_time: 2.5m tok/s: 8289247 +1583/20000 train_loss: 2.5840 train_time: 2.5m tok/s: 8289259 +1584/20000 train_loss: 2.5565 train_time: 2.5m tok/s: 8289269 +1585/20000 train_loss: 2.7077 train_time: 2.5m tok/s: 8289294 +1586/20000 train_loss: 2.5587 train_time: 2.5m tok/s: 8289267 +1587/20000 train_loss: 2.5920 train_time: 2.5m tok/s: 8289287 +1588/20000 train_loss: 2.6336 train_time: 2.5m tok/s: 8289288 +1589/20000 train_loss: 2.6964 train_time: 2.5m tok/s: 8289280 +1590/20000 train_loss: 2.6479 train_time: 2.5m tok/s: 8289295 +1591/20000 train_loss: 2.6423 train_time: 2.5m tok/s: 8289281 +1592/20000 
train_loss: 2.5726 train_time: 2.5m tok/s: 8289296 +1593/20000 train_loss: 2.6373 train_time: 2.5m tok/s: 8289321 +1594/20000 train_loss: 2.7380 train_time: 2.5m tok/s: 8289374 +1595/20000 train_loss: 2.6700 train_time: 2.5m tok/s: 8289429 +1596/20000 train_loss: 2.4443 train_time: 2.5m tok/s: 8289436 +1597/20000 train_loss: 2.5591 train_time: 2.5m tok/s: 8289439 +1598/20000 train_loss: 2.6178 train_time: 2.5m tok/s: 8289474 +1599/20000 train_loss: 2.6159 train_time: 2.5m tok/s: 8289455 +1600/20000 train_loss: 2.8126 train_time: 2.5m tok/s: 8289509 +1601/20000 train_loss: 2.6539 train_time: 2.5m tok/s: 8289530 +1602/20000 train_loss: 2.7662 train_time: 2.5m tok/s: 8289403 +1603/20000 train_loss: 2.5763 train_time: 2.5m tok/s: 8289415 +1604/20000 train_loss: 2.6000 train_time: 2.5m tok/s: 8289475 +1605/20000 train_loss: 2.6192 train_time: 2.5m tok/s: 8289446 +1606/20000 train_loss: 2.6125 train_time: 2.5m tok/s: 8289461 +1607/20000 train_loss: 2.5265 train_time: 2.5m tok/s: 8289467 +1608/20000 train_loss: 2.5057 train_time: 2.5m tok/s: 8289431 +1609/20000 train_loss: 2.7006 train_time: 2.5m tok/s: 8289260 +1610/20000 train_loss: 2.6048 train_time: 2.5m tok/s: 8289400 +1611/20000 train_loss: 2.5893 train_time: 2.5m tok/s: 8289420 +1612/20000 train_loss: 2.6534 train_time: 2.5m tok/s: 8289374 +1613/20000 train_loss: 2.6523 train_time: 2.6m tok/s: 8289311 +1614/20000 train_loss: 2.7154 train_time: 2.6m tok/s: 8289353 +1615/20000 train_loss: 2.7230 train_time: 2.6m tok/s: 8289168 +1616/20000 train_loss: 2.6559 train_time: 2.6m tok/s: 8289329 +1617/20000 train_loss: 2.6027 train_time: 2.6m tok/s: 8289318 +1618/20000 train_loss: 3.0212 train_time: 2.6m tok/s: 8289336 +1619/20000 train_loss: 2.7345 train_time: 2.6m tok/s: 8289338 +1620/20000 train_loss: 2.5622 train_time: 2.6m tok/s: 8289397 +1621/20000 train_loss: 2.5789 train_time: 2.6m tok/s: 8289446 +1622/20000 train_loss: 2.7540 train_time: 2.6m tok/s: 8289424 +1623/20000 train_loss: 2.6729 train_time: 2.6m tok/s: 
8289470 +1624/20000 train_loss: 2.6263 train_time: 2.6m tok/s: 8289378 +1625/20000 train_loss: 2.6282 train_time: 2.6m tok/s: 8289337 +1626/20000 train_loss: 2.6990 train_time: 2.6m tok/s: 8289318 +1627/20000 train_loss: 2.4426 train_time: 2.6m tok/s: 8289320 +1628/20000 train_loss: 2.5911 train_time: 2.6m tok/s: 8289337 +1629/20000 train_loss: 2.5683 train_time: 2.6m tok/s: 8289348 +1630/20000 train_loss: 2.5837 train_time: 2.6m tok/s: 8289344 +1631/20000 train_loss: 2.7993 train_time: 2.6m tok/s: 8289392 +1632/20000 train_loss: 2.6962 train_time: 2.6m tok/s: 8289380 +1633/20000 train_loss: 2.6685 train_time: 2.6m tok/s: 8289369 +1634/20000 train_loss: 2.6197 train_time: 2.6m tok/s: 8289414 +1635/20000 train_loss: 2.6757 train_time: 2.6m tok/s: 8289409 +1636/20000 train_loss: 2.4593 train_time: 2.6m tok/s: 8289306 +1637/20000 train_loss: 2.5506 train_time: 2.6m tok/s: 8289218 +1638/20000 train_loss: 2.4993 train_time: 2.6m tok/s: 8289210 +1639/20000 train_loss: 2.5273 train_time: 2.6m tok/s: 8289201 +1640/20000 train_loss: 2.3713 train_time: 2.6m tok/s: 8289195 +1641/20000 train_loss: 2.5441 train_time: 2.6m tok/s: 8289213 +1642/20000 train_loss: 2.7469 train_time: 2.6m tok/s: 8289231 +1643/20000 train_loss: 2.4356 train_time: 2.6m tok/s: 8289263 +1644/20000 train_loss: 2.4756 train_time: 2.6m tok/s: 8289286 +1645/20000 train_loss: 2.7615 train_time: 2.6m tok/s: 8289231 +1646/20000 train_loss: 2.5191 train_time: 2.6m tok/s: 8289274 +1647/20000 train_loss: 2.7400 train_time: 2.6m tok/s: 8289046 +1648/20000 train_loss: 2.6378 train_time: 2.6m tok/s: 8289222 +1649/20000 train_loss: 2.7455 train_time: 2.6m tok/s: 8289199 +1650/20000 train_loss: 2.5639 train_time: 2.6m tok/s: 8289219 +1651/20000 train_loss: 2.7306 train_time: 2.6m tok/s: 8289272 +1652/20000 train_loss: 2.6466 train_time: 2.6m tok/s: 8289336 +1653/20000 train_loss: 2.7454 train_time: 2.6m tok/s: 8289369 +1654/20000 train_loss: 2.6709 train_time: 2.6m tok/s: 8289389 +1655/20000 train_loss: 2.5596 
train_time: 2.6m tok/s: 8289403 +1656/20000 train_loss: 2.6004 train_time: 2.6m tok/s: 8289382 +1657/20000 train_loss: 2.6426 train_time: 2.6m tok/s: 8289425 +1658/20000 train_loss: 2.6346 train_time: 2.6m tok/s: 8289425 +1659/20000 train_loss: 2.5898 train_time: 2.6m tok/s: 8289389 +1660/20000 train_loss: 2.5584 train_time: 2.6m tok/s: 8289356 +1661/20000 train_loss: 2.7490 train_time: 2.6m tok/s: 8289269 +1662/20000 train_loss: 2.7385 train_time: 2.6m tok/s: 8289183 +1663/20000 train_loss: 2.7954 train_time: 2.6m tok/s: 8289088 +1664/20000 train_loss: 2.8005 train_time: 2.6m tok/s: 8289094 +1665/20000 train_loss: 2.8007 train_time: 2.6m tok/s: 8289102 +1666/20000 train_loss: 2.6963 train_time: 2.6m tok/s: 8289078 +1667/20000 train_loss: 2.6052 train_time: 2.6m tok/s: 8288970 +1668/20000 train_loss: 2.6319 train_time: 2.6m tok/s: 8289147 +1669/20000 train_loss: 2.7543 train_time: 2.6m tok/s: 8289144 +1670/20000 train_loss: 2.5522 train_time: 2.6m tok/s: 8289156 +1671/20000 train_loss: 2.4846 train_time: 2.6m tok/s: 8289187 +1672/20000 train_loss: 2.6066 train_time: 2.6m tok/s: 8289194 +1673/20000 train_loss: 2.5759 train_time: 2.6m tok/s: 8289205 +1674/20000 train_loss: 2.6393 train_time: 2.6m tok/s: 8289227 +1675/20000 train_loss: 2.4378 train_time: 2.6m tok/s: 8289282 +1676/20000 train_loss: 2.6768 train_time: 2.7m tok/s: 8289264 +1677/20000 train_loss: 2.5994 train_time: 2.7m tok/s: 8289282 +1678/20000 train_loss: 2.6688 train_time: 2.7m tok/s: 8289219 +1679/20000 train_loss: 2.6078 train_time: 2.7m tok/s: 8289189 +1680/20000 train_loss: 2.5286 train_time: 2.7m tok/s: 8289159 +1681/20000 train_loss: 2.5149 train_time: 2.7m tok/s: 8289156 +1682/20000 train_loss: 2.6209 train_time: 2.7m tok/s: 8289213 +1683/20000 train_loss: 2.6239 train_time: 2.7m tok/s: 8289195 +1684/20000 train_loss: 2.5695 train_time: 2.7m tok/s: 8289138 +1685/20000 train_loss: 2.6733 train_time: 2.7m tok/s: 8289126 +1686/20000 train_loss: 2.5784 train_time: 2.7m tok/s: 8289169 +1687/20000 
train_loss: 2.5505 train_time: 2.7m tok/s: 8289189 +1688/20000 train_loss: 2.5665 train_time: 2.7m tok/s: 8289192 +1689/20000 train_loss: 2.5221 train_time: 2.7m tok/s: 8289231 +1690/20000 train_loss: 2.8026 train_time: 2.7m tok/s: 8289234 +1691/20000 train_loss: 2.5978 train_time: 2.7m tok/s: 8289238 +1692/20000 train_loss: 2.5819 train_time: 2.7m tok/s: 8289290 +1693/20000 train_loss: 2.3905 train_time: 2.7m tok/s: 8289264 +1694/20000 train_loss: 2.6124 train_time: 2.7m tok/s: 8289289 +1695/20000 train_loss: 2.6133 train_time: 2.7m tok/s: 8289297 +1696/20000 train_loss: 2.6552 train_time: 2.7m tok/s: 8289304 +1697/20000 train_loss: 2.7536 train_time: 2.7m tok/s: 8289313 +1698/20000 train_loss: 2.6500 train_time: 2.7m tok/s: 8289315 +1699/20000 train_loss: 2.7457 train_time: 2.7m tok/s: 8289284 +1700/20000 train_loss: 2.5793 train_time: 2.7m tok/s: 8289284 +1701/20000 train_loss: 2.4770 train_time: 2.7m tok/s: 8289332 +1702/20000 train_loss: 2.6095 train_time: 2.7m tok/s: 8289393 +1703/20000 train_loss: 2.6425 train_time: 2.7m tok/s: 8289370 +1704/20000 train_loss: 2.7578 train_time: 2.7m tok/s: 8289389 +1705/20000 train_loss: 2.7683 train_time: 2.7m tok/s: 8289360 +1706/20000 train_loss: 2.6990 train_time: 2.7m tok/s: 8289407 +1707/20000 train_loss: 2.8456 train_time: 2.7m tok/s: 8289488 +1708/20000 train_loss: 2.4698 train_time: 2.7m tok/s: 8289459 +1709/20000 train_loss: 2.6600 train_time: 2.7m tok/s: 8289418 +1710/20000 train_loss: 2.6355 train_time: 2.7m tok/s: 8289442 +1711/20000 train_loss: 2.5721 train_time: 2.7m tok/s: 8289462 +1712/20000 train_loss: 2.6973 train_time: 2.7m tok/s: 8289481 +1713/20000 train_loss: 2.7929 train_time: 2.7m tok/s: 8289473 +1714/20000 train_loss: 2.5588 train_time: 2.7m tok/s: 8289480 +1715/20000 train_loss: 2.7454 train_time: 2.7m tok/s: 8289480 +1716/20000 train_loss: 2.7268 train_time: 2.7m tok/s: 8289435 +1717/20000 train_loss: 2.7327 train_time: 2.7m tok/s: 8289461 +1718/20000 train_loss: 2.8165 train_time: 2.7m tok/s: 
8289480 +1719/20000 train_loss: 2.7142 train_time: 2.7m tok/s: 8289479 +1720/20000 train_loss: 2.5282 train_time: 2.7m tok/s: 8289482 +1721/20000 train_loss: 2.6022 train_time: 2.7m tok/s: 8289465 +1722/20000 train_loss: 2.7339 train_time: 2.7m tok/s: 8289456 +1723/20000 train_loss: 2.6209 train_time: 2.7m tok/s: 8289480 +1724/20000 train_loss: 2.6630 train_time: 2.7m tok/s: 8289539 +1725/20000 train_loss: 2.5943 train_time: 2.7m tok/s: 8289534 +1726/20000 train_loss: 2.6297 train_time: 2.7m tok/s: 8289597 +1727/20000 train_loss: 2.5957 train_time: 2.7m tok/s: 8289616 +1728/20000 train_loss: 2.8359 train_time: 2.7m tok/s: 8289637 +1729/20000 train_loss: 2.6579 train_time: 2.7m tok/s: 8289653 +1730/20000 train_loss: 2.7412 train_time: 2.7m tok/s: 8289696 +1731/20000 train_loss: 2.7398 train_time: 2.7m tok/s: 8289723 +1732/20000 train_loss: 2.7029 train_time: 2.7m tok/s: 8289755 +1733/20000 train_loss: 2.6964 train_time: 2.7m tok/s: 8289770 +1734/20000 train_loss: 2.6076 train_time: 2.7m tok/s: 8289749 +1735/20000 train_loss: 2.4626 train_time: 2.7m tok/s: 8289725 +1736/20000 train_loss: 2.6986 train_time: 2.7m tok/s: 8289671 +1737/20000 train_loss: 2.5797 train_time: 2.7m tok/s: 8289682 +1738/20000 train_loss: 2.8127 train_time: 2.7m tok/s: 8289704 +1739/20000 train_loss: 2.7428 train_time: 2.7m tok/s: 8289740 +1740/20000 train_loss: 2.3853 train_time: 2.8m tok/s: 8289748 +1741/20000 train_loss: 2.7948 train_time: 2.8m tok/s: 8289766 +1742/20000 train_loss: 2.6065 train_time: 2.8m tok/s: 8289801 +1743/20000 train_loss: 2.5185 train_time: 2.8m tok/s: 8289860 +1744/20000 train_loss: 2.5818 train_time: 2.8m tok/s: 8289802 +1745/20000 train_loss: 2.6255 train_time: 2.8m tok/s: 8289832 +1746/20000 train_loss: 2.6077 train_time: 2.8m tok/s: 8289897 +1747/20000 train_loss: 2.6369 train_time: 2.8m tok/s: 8289856 +1748/20000 train_loss: 2.5655 train_time: 2.8m tok/s: 8289877 +1749/20000 train_loss: 2.6263 train_time: 2.8m tok/s: 8289844 +1750/20000 train_loss: 2.6700 
train_time: 2.8m tok/s: 8289830 +1751/20000 train_loss: 2.6696 train_time: 2.8m tok/s: 8289853 +1752/20000 train_loss: 2.6229 train_time: 2.8m tok/s: 8289850 +1753/20000 train_loss: 2.6093 train_time: 2.8m tok/s: 8289918 +1754/20000 train_loss: 2.6758 train_time: 2.8m tok/s: 8289914 +1755/20000 train_loss: 2.5880 train_time: 2.8m tok/s: 8289934 +1756/20000 train_loss: 2.6143 train_time: 2.8m tok/s: 8289922 +1757/20000 train_loss: 2.5828 train_time: 2.8m tok/s: 8289884 +1758/20000 train_loss: 2.8590 train_time: 2.8m tok/s: 8289897 +1759/20000 train_loss: 2.6340 train_time: 2.8m tok/s: 8289878 +1760/20000 train_loss: 2.5281 train_time: 2.8m tok/s: 8289876 +1761/20000 train_loss: 2.6349 train_time: 2.8m tok/s: 8289891 +1762/20000 train_loss: 2.6996 train_time: 2.8m tok/s: 8289926 +1763/20000 train_loss: 2.7069 train_time: 2.8m tok/s: 8289976 +1764/20000 train_loss: 2.6542 train_time: 2.8m tok/s: 8290035 +1765/20000 train_loss: 2.5818 train_time: 2.8m tok/s: 8290059 +1766/20000 train_loss: 2.7126 train_time: 2.8m tok/s: 8290073 +1767/20000 train_loss: 2.5701 train_time: 2.8m tok/s: 8290101 +1768/20000 train_loss: 2.6498 train_time: 2.8m tok/s: 8290131 +1769/20000 train_loss: 2.6444 train_time: 2.8m tok/s: 8290094 +1770/20000 train_loss: 2.6616 train_time: 2.8m tok/s: 8290094 +1771/20000 train_loss: 2.5271 train_time: 2.8m tok/s: 8290094 +1772/20000 train_loss: 2.5448 train_time: 2.8m tok/s: 8290104 +1773/20000 train_loss: 2.8507 train_time: 2.8m tok/s: 8290106 +1774/20000 train_loss: 2.7066 train_time: 2.8m tok/s: 8290153 +1775/20000 train_loss: 2.7488 train_time: 2.8m tok/s: 8290170 +1776/20000 train_loss: 2.5782 train_time: 2.8m tok/s: 8290165 +1777/20000 train_loss: 2.6936 train_time: 2.8m tok/s: 8290138 +1778/20000 train_loss: 2.6524 train_time: 2.8m tok/s: 8290179 +1779/20000 train_loss: 2.6546 train_time: 2.8m tok/s: 8290229 +1780/20000 train_loss: 2.6533 train_time: 2.8m tok/s: 8290202 +1781/20000 train_loss: 2.5253 train_time: 2.8m tok/s: 8290173 +1782/20000 
train_loss: 2.4177 train_time: 2.8m tok/s: 8290170 +1783/20000 train_loss: 2.6245 train_time: 2.8m tok/s: 8290145 +1784/20000 train_loss: 2.6234 train_time: 2.8m tok/s: 8290178 +1785/20000 train_loss: 2.6367 train_time: 2.8m tok/s: 8290211 +1786/20000 train_loss: 2.7920 train_time: 2.8m tok/s: 8290071 +1787/20000 train_loss: 2.7117 train_time: 2.8m tok/s: 8290239 +1788/20000 train_loss: 2.6257 train_time: 2.8m tok/s: 8290308 +1789/20000 train_loss: 2.7159 train_time: 2.8m tok/s: 8290313 +1790/20000 train_loss: 2.5190 train_time: 2.8m tok/s: 8290362 +1791/20000 train_loss: 2.3548 train_time: 2.8m tok/s: 8290293 +1792/20000 train_loss: 2.6315 train_time: 2.8m tok/s: 8290241 +1793/20000 train_loss: 2.4954 train_time: 2.8m tok/s: 8290249 +1794/20000 train_loss: 2.4335 train_time: 2.8m tok/s: 8290219 +1795/20000 train_loss: 2.6275 train_time: 2.8m tok/s: 8290275 +1796/20000 train_loss: 2.6406 train_time: 2.8m tok/s: 8290250 +1797/20000 train_loss: 2.8086 train_time: 2.8m tok/s: 8290228 +1798/20000 train_loss: 2.5702 train_time: 2.8m tok/s: 8290239 +1799/20000 train_loss: 2.6548 train_time: 2.8m tok/s: 8290247 +1800/20000 train_loss: 2.5546 train_time: 2.8m tok/s: 8290279 +1801/20000 train_loss: 2.6855 train_time: 2.8m tok/s: 8290331 +1802/20000 train_loss: 2.5868 train_time: 2.8m tok/s: 8290329 +1803/20000 train_loss: 2.6414 train_time: 2.9m tok/s: 8290315 +1804/20000 train_loss: 2.6016 train_time: 2.9m tok/s: 8290300 +1805/20000 train_loss: 2.5874 train_time: 2.9m tok/s: 8290304 +1806/20000 train_loss: 2.8101 train_time: 2.9m tok/s: 8290279 +1807/20000 train_loss: 2.7069 train_time: 2.9m tok/s: 8290270 +1808/20000 train_loss: 2.7242 train_time: 2.9m tok/s: 8290267 +1809/20000 train_loss: 2.6120 train_time: 2.9m tok/s: 8290217 +1810/20000 train_loss: 2.7155 train_time: 2.9m tok/s: 8290130 +1811/20000 train_loss: 2.5655 train_time: 2.9m tok/s: 8290127 +1812/20000 train_loss: 2.6125 train_time: 2.9m tok/s: 8290131 +1813/20000 train_loss: 2.7006 train_time: 2.9m tok/s: 
8290160 +1814/20000 train_loss: 2.7466 train_time: 2.9m tok/s: 8290179 +1815/20000 train_loss: 2.5505 train_time: 2.9m tok/s: 8290188 +1816/20000 train_loss: 2.4879 train_time: 2.9m tok/s: 8290132 +1817/20000 train_loss: 2.8407 train_time: 2.9m tok/s: 8290085 +1818/20000 train_loss: 2.6704 train_time: 2.9m tok/s: 8290115 +1819/20000 train_loss: 2.6908 train_time: 2.9m tok/s: 8289993 +1820/20000 train_loss: 2.5759 train_time: 2.9m tok/s: 8290091 +1821/20000 train_loss: 2.6439 train_time: 2.9m tok/s: 8290102 +1822/20000 train_loss: 2.7076 train_time: 2.9m tok/s: 8290131 +1823/20000 train_loss: 2.3949 train_time: 2.9m tok/s: 8290111 +1824/20000 train_loss: 2.6029 train_time: 2.9m tok/s: 8290081 +1825/20000 train_loss: 2.6236 train_time: 2.9m tok/s: 8290047 +1826/20000 train_loss: 2.4582 train_time: 2.9m tok/s: 8290018 +1827/20000 train_loss: 2.5753 train_time: 2.9m tok/s: 8289965 +1828/20000 train_loss: 2.5220 train_time: 2.9m tok/s: 8289942 +1829/20000 train_loss: 2.4708 train_time: 2.9m tok/s: 8289899 +1830/20000 train_loss: 2.6467 train_time: 2.9m tok/s: 8289913 +1831/20000 train_loss: 2.6506 train_time: 2.9m tok/s: 8289909 +1832/20000 train_loss: 2.6405 train_time: 2.9m tok/s: 8289933 +1833/20000 train_loss: 2.7046 train_time: 2.9m tok/s: 8289924 +1834/20000 train_loss: 2.6721 train_time: 2.9m tok/s: 8289991 +1835/20000 train_loss: 2.5727 train_time: 2.9m tok/s: 8289994 +1836/20000 train_loss: 2.5778 train_time: 2.9m tok/s: 8289953 +1837/20000 train_loss: 2.4866 train_time: 2.9m tok/s: 8289995 +1838/20000 train_loss: 2.6090 train_time: 2.9m tok/s: 8290006 +1839/20000 train_loss: 2.7081 train_time: 2.9m tok/s: 8289879 +1840/20000 train_loss: 2.6524 train_time: 2.9m tok/s: 8290023 +1841/20000 train_loss: 2.6946 train_time: 2.9m tok/s: 8290019 +1842/20000 train_loss: 2.6282 train_time: 2.9m tok/s: 8290039 +1843/20000 train_loss: 2.5589 train_time: 2.9m tok/s: 8290097 +1844/20000 train_loss: 2.6547 train_time: 2.9m tok/s: 8290070 +1845/20000 train_loss: 2.8158 
train_time: 2.9m tok/s: 8290039 +1846/20000 train_loss: 2.5909 train_time: 2.9m tok/s: 8290040 +1847/20000 train_loss: 2.4679 train_time: 2.9m tok/s: 8290042 +1848/20000 train_loss: 2.5278 train_time: 2.9m tok/s: 8289958 +1849/20000 train_loss: 2.5266 train_time: 2.9m tok/s: 8289991 +1850/20000 train_loss: 2.6525 train_time: 2.9m tok/s: 8290014 +1851/20000 train_loss: 2.6582 train_time: 2.9m tok/s: 8290048 +1852/20000 train_loss: 2.6734 train_time: 2.9m tok/s: 8290022 +1853/20000 train_loss: 2.5092 train_time: 2.9m tok/s: 8290014 +1854/20000 train_loss: 2.5508 train_time: 2.9m tok/s: 8290031 +1855/20000 train_loss: 2.6357 train_time: 2.9m tok/s: 8290049 +1856/20000 train_loss: 2.6715 train_time: 2.9m tok/s: 8290061 +1857/20000 train_loss: 2.7893 train_time: 2.9m tok/s: 8289925 +1858/20000 train_loss: 2.7324 train_time: 2.9m tok/s: 8290105 +1859/20000 train_loss: 2.6434 train_time: 2.9m tok/s: 8290126 +1860/20000 train_loss: 2.5791 train_time: 2.9m tok/s: 8290164 +1861/20000 train_loss: 2.5691 train_time: 2.9m tok/s: 8290146 +1862/20000 train_loss: 2.5410 train_time: 2.9m tok/s: 8290180 +1863/20000 train_loss: 2.6204 train_time: 2.9m tok/s: 8290194 +1864/20000 train_loss: 2.5799 train_time: 2.9m tok/s: 8290177 +1865/20000 train_loss: 2.7351 train_time: 2.9m tok/s: 8290141 +1866/20000 train_loss: 2.6326 train_time: 3.0m tok/s: 8290059 +1867/20000 train_loss: 2.5472 train_time: 3.0m tok/s: 8290038 +1868/20000 train_loss: 2.5482 train_time: 3.0m tok/s: 8290009 +1869/20000 train_loss: 2.6837 train_time: 3.0m tok/s: 8290003 +1870/20000 train_loss: 2.6340 train_time: 3.0m tok/s: 8290018 +1871/20000 train_loss: 2.5469 train_time: 3.0m tok/s: 8290039 +1872/20000 train_loss: 2.5854 train_time: 3.0m tok/s: 8290039 +1873/20000 train_loss: 2.7294 train_time: 3.0m tok/s: 8290054 +1874/20000 train_loss: 2.6603 train_time: 3.0m tok/s: 8290090 +1875/20000 train_loss: 2.7443 train_time: 3.0m tok/s: 8290039 +1876/20000 train_loss: 2.7810 train_time: 3.0m tok/s: 8290106 +1877/20000 
train_loss: 2.9580 train_time: 3.0m tok/s: 8290050 +1878/20000 train_loss: 2.6546 train_time: 3.0m tok/s: 8290027 +1879/20000 train_loss: 2.6077 train_time: 3.0m tok/s: 8290016 +1880/20000 train_loss: 2.7505 train_time: 3.0m tok/s: 8289996 +1881/20000 train_loss: 2.5947 train_time: 3.0m tok/s: 8290011 +1882/20000 train_loss: 2.7270 train_time: 3.0m tok/s: 8290034 +1883/20000 train_loss: 2.5909 train_time: 3.0m tok/s: 8290001 +1884/20000 train_loss: 2.5610 train_time: 3.0m tok/s: 8289963 +1885/20000 train_loss: 2.6374 train_time: 3.0m tok/s: 8289942 +1886/20000 train_loss: 2.5631 train_time: 3.0m tok/s: 8289966 +1887/20000 train_loss: 2.6277 train_time: 3.0m tok/s: 8289930 +1888/20000 train_loss: 2.4888 train_time: 3.0m tok/s: 8289975 +1889/20000 train_loss: 2.5460 train_time: 3.0m tok/s: 8289990 +1890/20000 train_loss: 2.6743 train_time: 3.0m tok/s: 8289994 +1891/20000 train_loss: 2.5088 train_time: 3.0m tok/s: 8290029 +1892/20000 train_loss: 2.6814 train_time: 3.0m tok/s: 8290054 +1893/20000 train_loss: 2.6826 train_time: 3.0m tok/s: 8290045 +1894/20000 train_loss: 2.5974 train_time: 3.0m tok/s: 8290054 +1895/20000 train_loss: 2.6483 train_time: 3.0m tok/s: 8290074 +1896/20000 train_loss: 2.6221 train_time: 3.0m tok/s: 8290031 +1897/20000 train_loss: 2.5623 train_time: 3.0m tok/s: 8290076 +1898/20000 train_loss: 2.7007 train_time: 3.0m tok/s: 8290062 +1899/20000 train_loss: 2.6297 train_time: 3.0m tok/s: 8290074 +1900/20000 train_loss: 2.6156 train_time: 3.0m tok/s: 8290102 +1901/20000 train_loss: 2.6859 train_time: 3.0m tok/s: 8290113 +1902/20000 train_loss: 2.6006 train_time: 3.0m tok/s: 8290116 +1903/20000 train_loss: 2.7628 train_time: 3.0m tok/s: 8290115 +1904/20000 train_loss: 3.1416 train_time: 3.0m tok/s: 8290079 +1905/20000 train_loss: 2.4826 train_time: 3.0m tok/s: 8290028 +1906/20000 train_loss: 2.6308 train_time: 3.0m tok/s: 8290002 +1907/20000 train_loss: 2.5151 train_time: 3.0m tok/s: 8289994 +1908/20000 train_loss: 2.5362 train_time: 3.0m tok/s: 
8290017 +1909/20000 train_loss: 2.5855 train_time: 3.0m tok/s: 8290040 +1910/20000 train_loss: 2.5332 train_time: 3.0m tok/s: 8290060 +1911/20000 train_loss: 2.4814 train_time: 3.0m tok/s: 8290060 +1912/20000 train_loss: 2.7027 train_time: 3.0m tok/s: 8290086 +1913/20000 train_loss: 2.7120 train_time: 3.0m tok/s: 8290094 +1914/20000 train_loss: 2.6948 train_time: 3.0m tok/s: 8290112 +1915/20000 train_loss: 2.7060 train_time: 3.0m tok/s: 8290169 +1916/20000 train_loss: 2.5766 train_time: 3.0m tok/s: 8290180 +1917/20000 train_loss: 2.7139 train_time: 3.0m tok/s: 8290203 +1918/20000 train_loss: 2.5770 train_time: 3.0m tok/s: 8290208 +1919/20000 train_loss: 2.5572 train_time: 3.0m tok/s: 8290231 +1920/20000 train_loss: 2.4943 train_time: 3.0m tok/s: 8290232 +1921/20000 train_loss: 2.7059 train_time: 3.0m tok/s: 8290218 +1922/20000 train_loss: 2.5984 train_time: 3.0m tok/s: 8290238 +1923/20000 train_loss: 2.5236 train_time: 3.0m tok/s: 8290266 +1924/20000 train_loss: 2.5957 train_time: 3.0m tok/s: 8290309 +1925/20000 train_loss: 2.5285 train_time: 3.0m tok/s: 8290254 +1926/20000 train_loss: 2.7436 train_time: 3.0m tok/s: 8290270 +1927/20000 train_loss: 2.5713 train_time: 3.0m tok/s: 8290290 +1928/20000 train_loss: 2.6347 train_time: 3.0m tok/s: 8290295 +1929/20000 train_loss: 2.6408 train_time: 3.0m tok/s: 8290294 +1930/20000 train_loss: 2.7059 train_time: 3.1m tok/s: 8290282 +1931/20000 train_loss: 2.6401 train_time: 3.1m tok/s: 8290299 +1932/20000 train_loss: 2.7560 train_time: 3.1m tok/s: 8290264 +1933/20000 train_loss: 2.6515 train_time: 3.1m tok/s: 8290281 +1934/20000 train_loss: 2.6652 train_time: 3.1m tok/s: 8290322 +1935/20000 train_loss: 2.5591 train_time: 3.1m tok/s: 8290294 +1936/20000 train_loss: 2.6781 train_time: 3.1m tok/s: 8290249 +1937/20000 train_loss: 2.6718 train_time: 3.1m tok/s: 8290254 +1938/20000 train_loss: 2.6975 train_time: 3.1m tok/s: 8290295 +1939/20000 train_loss: 2.6180 train_time: 3.1m tok/s: 8290333 +1940/20000 train_loss: 2.8139 
train_time: 3.1m tok/s: 8290359 +1941/20000 train_loss: 2.4738 train_time: 3.1m tok/s: 8290387 +1942/20000 train_loss: 2.4976 train_time: 3.1m tok/s: 8290435 +1943/20000 train_loss: 2.4980 train_time: 3.1m tok/s: 8290452 +1944/20000 train_loss: 2.5234 train_time: 3.1m tok/s: 8290440 +1945/20000 train_loss: 2.5715 train_time: 3.1m tok/s: 8290477 +1946/20000 train_loss: 2.6210 train_time: 3.1m tok/s: 8290342 +1947/20000 train_loss: 2.6885 train_time: 3.1m tok/s: 8290517 +1948/20000 train_loss: 2.6837 train_time: 3.1m tok/s: 8290561 +1949/20000 train_loss: 2.7277 train_time: 3.1m tok/s: 8290529 +1950/20000 train_loss: 2.5849 train_time: 3.1m tok/s: 8290571 +1951/20000 train_loss: 2.8011 train_time: 3.1m tok/s: 8290479 +1952/20000 train_loss: 2.8197 train_time: 3.1m tok/s: 8290545 +1953/20000 train_loss: 2.6386 train_time: 3.1m tok/s: 8290575 +1954/20000 train_loss: 2.5848 train_time: 3.1m tok/s: 8290579 +1955/20000 train_loss: 2.8469 train_time: 3.1m tok/s: 8290541 +1956/20000 train_loss: 2.5796 train_time: 3.1m tok/s: 8290536 +1957/20000 train_loss: 2.6011 train_time: 3.1m tok/s: 8290553 +1958/20000 train_loss: 2.5690 train_time: 3.1m tok/s: 8290580 +1959/20000 train_loss: 2.5482 train_time: 3.1m tok/s: 8290584 +1960/20000 train_loss: 2.5001 train_time: 3.1m tok/s: 8290568 +1961/20000 train_loss: 2.5150 train_time: 3.1m tok/s: 8290550 +1962/20000 train_loss: 2.5980 train_time: 3.1m tok/s: 8290578 +1963/20000 train_loss: 2.5633 train_time: 3.1m tok/s: 8290650 +1964/20000 train_loss: 2.5851 train_time: 3.1m tok/s: 8290716 +1965/20000 train_loss: 2.5863 train_time: 3.1m tok/s: 8290684 +1966/20000 train_loss: 2.7447 train_time: 3.1m tok/s: 8290649 +1967/20000 train_loss: 2.5570 train_time: 3.1m tok/s: 8290626 +1968/20000 train_loss: 2.7037 train_time: 3.1m tok/s: 8290603 +1969/20000 train_loss: 2.7658 train_time: 3.1m tok/s: 8290631 +1970/20000 train_loss: 2.5912 train_time: 3.1m tok/s: 8290640 +1971/20000 train_loss: 2.6143 train_time: 3.1m tok/s: 8290629 +1972/20000 
train_loss: 2.6814 train_time: 3.1m tok/s: 8290647 +1973/20000 train_loss: 2.5994 train_time: 3.1m tok/s: 8290641 +1974/20000 train_loss: 2.7719 train_time: 3.1m tok/s: 8290654 +1975/20000 train_loss: 2.5244 train_time: 3.1m tok/s: 8290669 +1976/20000 train_loss: 2.7181 train_time: 3.1m tok/s: 8290700 +1977/20000 train_loss: 2.5265 train_time: 3.1m tok/s: 8290746 +1978/20000 train_loss: 2.6921 train_time: 3.1m tok/s: 8290747 +1979/20000 train_loss: 2.5237 train_time: 3.1m tok/s: 8290780 +1980/20000 train_loss: 2.5761 train_time: 3.1m tok/s: 8290764 +1981/20000 train_loss: 2.4745 train_time: 3.1m tok/s: 8290740 +1982/20000 train_loss: 2.6388 train_time: 3.1m tok/s: 8290704 +1983/20000 train_loss: 2.3818 train_time: 3.1m tok/s: 8290711 +1984/20000 train_loss: 2.6869 train_time: 3.1m tok/s: 8290731 +1985/20000 train_loss: 2.6267 train_time: 3.1m tok/s: 8290749 +1986/20000 train_loss: 2.6724 train_time: 3.1m tok/s: 8290773 +1987/20000 train_loss: 2.6883 train_time: 3.1m tok/s: 8290819 +1988/20000 train_loss: 2.6504 train_time: 3.1m tok/s: 8290865 +1989/20000 train_loss: 2.4860 train_time: 3.1m tok/s: 8290904 +1990/20000 train_loss: 2.6527 train_time: 3.1m tok/s: 8290941 +1991/20000 train_loss: 2.5696 train_time: 3.1m tok/s: 8290994 +1992/20000 train_loss: 2.7516 train_time: 3.1m tok/s: 8291020 +1993/20000 train_loss: 2.5681 train_time: 3.2m tok/s: 8291045 +1994/20000 train_loss: 2.6140 train_time: 3.2m tok/s: 8291067 +1995/20000 train_loss: 2.5215 train_time: 3.2m tok/s: 8291067 +1996/20000 train_loss: 2.6147 train_time: 3.2m tok/s: 8291087 +1997/20000 train_loss: 2.5925 train_time: 3.2m tok/s: 8291090 +1998/20000 train_loss: 2.6135 train_time: 3.2m tok/s: 8291139 +1999/20000 train_loss: 2.6771 train_time: 3.2m tok/s: 8291131 +2000/20000 train_loss: 2.4927 train_time: 3.2m tok/s: 8291148 +2001/20000 train_loss: 2.5829 train_time: 3.2m tok/s: 8291179 +2002/20000 train_loss: 2.4523 train_time: 3.2m tok/s: 8291199 +2003/20000 train_loss: 2.6492 train_time: 3.2m tok/s: 
8291202 +2004/20000 train_loss: 2.6347 train_time: 3.2m tok/s: 8291264 +2005/20000 train_loss: 2.6165 train_time: 3.2m tok/s: 8291323 +2006/20000 train_loss: 2.5791 train_time: 3.2m tok/s: 8291375 +2007/20000 train_loss: 2.6216 train_time: 3.2m tok/s: 8291320 +2008/20000 train_loss: 2.5454 train_time: 3.2m tok/s: 8291310 +2009/20000 train_loss: 2.6492 train_time: 3.2m tok/s: 8291338 +2010/20000 train_loss: 2.7025 train_time: 3.2m tok/s: 8291299 +2011/20000 train_loss: 2.5708 train_time: 3.2m tok/s: 8291302 +2012/20000 train_loss: 2.5746 train_time: 3.2m tok/s: 8291348 +2013/20000 train_loss: 2.4797 train_time: 3.2m tok/s: 8291319 +2014/20000 train_loss: 2.4742 train_time: 3.2m tok/s: 8291005 +2015/20000 train_loss: 2.6916 train_time: 3.2m tok/s: 8291316 +2016/20000 train_loss: 2.4779 train_time: 3.2m tok/s: 8291334 +2017/20000 train_loss: 2.6328 train_time: 3.2m tok/s: 8291327 +2018/20000 train_loss: 2.6270 train_time: 3.2m tok/s: 8291368 +2019/20000 train_loss: 2.7204 train_time: 3.2m tok/s: 8291425 +2020/20000 train_loss: 2.7076 train_time: 3.2m tok/s: 8291440 +2021/20000 train_loss: 2.5610 train_time: 3.2m tok/s: 8291402 +2022/20000 train_loss: 2.4910 train_time: 3.2m tok/s: 8291382 +2023/20000 train_loss: 2.6881 train_time: 3.2m tok/s: 8291389 +2024/20000 train_loss: 2.6380 train_time: 3.2m tok/s: 8291414 +2025/20000 train_loss: 2.4804 train_time: 3.2m tok/s: 8291386 +2026/20000 train_loss: 2.7093 train_time: 3.2m tok/s: 8291371 +2027/20000 train_loss: 2.5967 train_time: 3.2m tok/s: 8291375 +2028/20000 train_loss: 2.7139 train_time: 3.2m tok/s: 8291353 +2029/20000 train_loss: 2.4944 train_time: 3.2m tok/s: 8291365 +2030/20000 train_loss: 2.5111 train_time: 3.2m tok/s: 8291425 +2031/20000 train_loss: 2.5048 train_time: 3.2m tok/s: 8291456 +2032/20000 train_loss: 2.5616 train_time: 3.2m tok/s: 8291413 +2033/20000 train_loss: 2.8412 train_time: 3.2m tok/s: 8291355 +2034/20000 train_loss: 2.6811 train_time: 3.2m tok/s: 8291334 +2035/20000 train_loss: 2.6551 
train_time: 3.2m tok/s: 8291385 +2036/20000 train_loss: 2.6146 train_time: 3.2m tok/s: 8291379 +2037/20000 train_loss: 2.8215 train_time: 3.2m tok/s: 8291311 +2038/20000 train_loss: 2.5898 train_time: 3.2m tok/s: 8291246 +2039/20000 train_loss: 2.6309 train_time: 3.2m tok/s: 8291310 +2040/20000 train_loss: 2.5776 train_time: 3.2m tok/s: 8291315 +2041/20000 train_loss: 2.6383 train_time: 3.2m tok/s: 8291318 +2042/20000 train_loss: 2.5297 train_time: 3.2m tok/s: 8291281 +2043/20000 train_loss: 2.4900 train_time: 3.2m tok/s: 8291254 +2044/20000 train_loss: 2.6508 train_time: 3.2m tok/s: 8291273 +2045/20000 train_loss: 2.4366 train_time: 3.2m tok/s: 8291291 +2046/20000 train_loss: 2.4630 train_time: 3.2m tok/s: 8291344 +2047/20000 train_loss: 2.7441 train_time: 3.2m tok/s: 8291379 +2048/20000 train_loss: 2.5711 train_time: 3.2m tok/s: 8291419 +2049/20000 train_loss: 2.7087 train_time: 3.2m tok/s: 8291483 +2050/20000 train_loss: 2.6641 train_time: 3.2m tok/s: 8291453 +2051/20000 train_loss: 2.6474 train_time: 3.2m tok/s: 8291482 +2052/20000 train_loss: 2.5261 train_time: 3.2m tok/s: 8291515 +2053/20000 train_loss: 2.6339 train_time: 3.2m tok/s: 8291529 +2054/20000 train_loss: 2.6695 train_time: 3.2m tok/s: 8291551 +2055/20000 train_loss: 2.5748 train_time: 3.2m tok/s: 8291526 +2056/20000 train_loss: 2.6133 train_time: 3.3m tok/s: 8291547 +2057/20000 train_loss: 2.6575 train_time: 3.3m tok/s: 8291507 +2058/20000 train_loss: 2.5596 train_time: 3.3m tok/s: 8291530 +2059/20000 train_loss: 2.5005 train_time: 3.3m tok/s: 8291544 +2060/20000 train_loss: 2.5871 train_time: 3.3m tok/s: 8291571 +2061/20000 train_loss: 2.5861 train_time: 3.3m tok/s: 8291583 +2062/20000 train_loss: 2.6091 train_time: 3.3m tok/s: 8291609 +2063/20000 train_loss: 2.5513 train_time: 3.3m tok/s: 8291594 +2064/20000 train_loss: 2.7987 train_time: 3.3m tok/s: 8291620 +2065/20000 train_loss: 2.5345 train_time: 3.3m tok/s: 8291640 +2066/20000 train_loss: 2.6121 train_time: 3.3m tok/s: 8291576 +2067/20000 
train_loss: 2.6664 train_time: 3.3m tok/s: 8291515 +2068/20000 train_loss: 2.6048 train_time: 3.3m tok/s: 8291539 +2069/20000 train_loss: 2.4607 train_time: 3.3m tok/s: 8291538 +2070/20000 train_loss: 2.6100 train_time: 3.3m tok/s: 8291578 +2071/20000 train_loss: 2.5489 train_time: 3.3m tok/s: 8291588 +2072/20000 train_loss: 2.5948 train_time: 3.3m tok/s: 8291586 +2073/20000 train_loss: 2.5307 train_time: 3.3m tok/s: 8291561 +2074/20000 train_loss: 2.6917 train_time: 3.3m tok/s: 8291554 +2075/20000 train_loss: 2.5746 train_time: 3.3m tok/s: 8291600 +2076/20000 train_loss: 2.6688 train_time: 3.3m tok/s: 8291607 +2077/20000 train_loss: 3.5665 train_time: 3.3m tok/s: 8291530 +2078/20000 train_loss: 2.7101 train_time: 3.3m tok/s: 8291442 +2079/20000 train_loss: 2.6564 train_time: 3.3m tok/s: 8291442 +2080/20000 train_loss: 2.6023 train_time: 3.3m tok/s: 8291407 +2081/20000 train_loss: 2.6039 train_time: 3.3m tok/s: 8291359 +2082/20000 train_loss: 2.5942 train_time: 3.3m tok/s: 8291387 +2083/20000 train_loss: 2.5436 train_time: 3.3m tok/s: 8291245 +2084/20000 train_loss: 2.5772 train_time: 3.3m tok/s: 8291274 +2085/20000 train_loss: 2.5786 train_time: 3.3m tok/s: 8291131 +2086/20000 train_loss: 2.6103 train_time: 3.3m tok/s: 8291312 +2087/20000 train_loss: 2.5430 train_time: 3.3m tok/s: 8291297 +2088/20000 train_loss: 2.4582 train_time: 3.3m tok/s: 8291273 +2089/20000 train_loss: 2.6284 train_time: 3.3m tok/s: 8291313 +2090/20000 train_loss: 2.7349 train_time: 3.3m tok/s: 8291358 +2091/20000 train_loss: 2.6191 train_time: 3.3m tok/s: 8291388 +2092/20000 train_loss: 2.6517 train_time: 3.3m tok/s: 8291395 +2093/20000 train_loss: 2.6436 train_time: 3.3m tok/s: 8291433 +2094/20000 train_loss: 2.5984 train_time: 3.3m tok/s: 8291465 +2095/20000 train_loss: 2.5864 train_time: 3.3m tok/s: 8291517 +2096/20000 train_loss: 2.6904 train_time: 3.3m tok/s: 8291519 +2097/20000 train_loss: 2.5687 train_time: 3.3m tok/s: 8291527 +2098/20000 train_loss: 2.4823 train_time: 3.3m tok/s: 
8291539 +2099/20000 train_loss: 2.4886 train_time: 3.3m tok/s: 8291553 +2100/20000 train_loss: 2.5773 train_time: 3.3m tok/s: 8291564 +2101/20000 train_loss: 2.6519 train_time: 3.3m tok/s: 8291492 +2102/20000 train_loss: 2.5672 train_time: 3.3m tok/s: 8291475 +2103/20000 train_loss: 2.5789 train_time: 3.3m tok/s: 8291463 +2104/20000 train_loss: 2.7015 train_time: 3.3m tok/s: 8291520 +2105/20000 train_loss: 2.7181 train_time: 3.3m tok/s: 8291556 +2106/20000 train_loss: 2.7032 train_time: 3.3m tok/s: 8291602 +2107/20000 train_loss: 2.5995 train_time: 3.3m tok/s: 8291610 +2108/20000 train_loss: 2.5364 train_time: 3.3m tok/s: 8291600 +2109/20000 train_loss: 2.7123 train_time: 3.3m tok/s: 8291598 +2110/20000 train_loss: 2.5728 train_time: 3.3m tok/s: 8291613 +2111/20000 train_loss: 2.5540 train_time: 3.3m tok/s: 8291611 +2112/20000 train_loss: 2.5530 train_time: 3.3m tok/s: 8291626 +2113/20000 train_loss: 2.5222 train_time: 3.3m tok/s: 8291629 +2114/20000 train_loss: 2.7815 train_time: 3.3m tok/s: 8291652 +2115/20000 train_loss: 2.4908 train_time: 3.3m tok/s: 8291667 +2116/20000 train_loss: 2.6482 train_time: 3.3m tok/s: 8291725 +2117/20000 train_loss: 2.6899 train_time: 3.3m tok/s: 8291766 +2118/20000 train_loss: 2.6936 train_time: 3.3m tok/s: 8291820 +2119/20000 train_loss: 2.7927 train_time: 3.3m tok/s: 8291844 +2120/20000 train_loss: 2.6526 train_time: 3.4m tok/s: 8291865 +2121/20000 train_loss: 2.6317 train_time: 3.4m tok/s: 8291869 +2122/20000 train_loss: 2.5488 train_time: 3.4m tok/s: 8291882 +2123/20000 train_loss: 2.4025 train_time: 3.4m tok/s: 8291861 +2124/20000 train_loss: 2.6329 train_time: 3.4m tok/s: 8291883 +2125/20000 train_loss: 2.6149 train_time: 3.4m tok/s: 8291962 +2126/20000 train_loss: 2.6272 train_time: 3.4m tok/s: 8291961 +2127/20000 train_loss: 2.5549 train_time: 3.4m tok/s: 8291912 +2128/20000 train_loss: 2.5337 train_time: 3.4m tok/s: 8291935 +2129/20000 train_loss: 2.3653 train_time: 3.4m tok/s: 8291928 +2130/20000 train_loss: 2.6738 
train_time: 3.4m tok/s: 8291946 +2131/20000 train_loss: 2.5861 train_time: 3.4m tok/s: 8291967 +2132/20000 train_loss: 2.9060 train_time: 3.4m tok/s: 8291978 +2133/20000 train_loss: 2.6964 train_time: 3.4m tok/s: 8292010 +2134/20000 train_loss: 2.6296 train_time: 3.4m tok/s: 8291934 +2135/20000 train_loss: 2.5824 train_time: 3.4m tok/s: 8291902 +2136/20000 train_loss: 2.5968 train_time: 3.4m tok/s: 8291967 +2137/20000 train_loss: 2.5364 train_time: 3.4m tok/s: 8291985 +2138/20000 train_loss: 2.5873 train_time: 3.4m tok/s: 8291997 +2139/20000 train_loss: 2.6778 train_time: 3.4m tok/s: 8291982 +2140/20000 train_loss: 2.5487 train_time: 3.4m tok/s: 8291992 +2141/20000 train_loss: 2.5237 train_time: 3.4m tok/s: 8292019 +2142/20000 train_loss: 2.6039 train_time: 3.4m tok/s: 8292026 +2143/20000 train_loss: 2.4848 train_time: 3.4m tok/s: 8292002 +2144/20000 train_loss: 2.5850 train_time: 3.4m tok/s: 8292042 +2145/20000 train_loss: 2.6658 train_time: 3.4m tok/s: 8292044 +2146/20000 train_loss: 2.5916 train_time: 3.4m tok/s: 8292004 +2147/20000 train_loss: 2.5544 train_time: 3.4m tok/s: 8291992 +2148/20000 train_loss: 2.7034 train_time: 3.4m tok/s: 8291986 +2149/20000 train_loss: 2.5341 train_time: 3.4m tok/s: 8291992 +2150/20000 train_loss: 2.7333 train_time: 3.4m tok/s: 8291989 +2151/20000 train_loss: 2.6847 train_time: 3.4m tok/s: 8292001 +2152/20000 train_loss: 2.5944 train_time: 3.4m tok/s: 8291998 +2153/20000 train_loss: 2.4621 train_time: 3.4m tok/s: 8291988 +2154/20000 train_loss: 2.6287 train_time: 3.4m tok/s: 8291979 +2155/20000 train_loss: 2.5160 train_time: 3.4m tok/s: 8291977 +2156/20000 train_loss: 2.6089 train_time: 3.4m tok/s: 8291971 +2157/20000 train_loss: 2.5869 train_time: 3.4m tok/s: 8292034 +2158/20000 train_loss: 2.5861 train_time: 3.4m tok/s: 8292027 +2159/20000 train_loss: 2.5334 train_time: 3.4m tok/s: 8292033 +2160/20000 train_loss: 2.4307 train_time: 3.4m tok/s: 8292067 +2161/20000 train_loss: 2.6130 train_time: 3.4m tok/s: 8292087 +2162/20000 
train_loss: 2.6328 train_time: 3.4m tok/s: 8292131 +2163/20000 train_loss: 2.5553 train_time: 3.4m tok/s: 8292134 +2164/20000 train_loss: 2.6036 train_time: 3.4m tok/s: 8292154 +2165/20000 train_loss: 2.6352 train_time: 3.4m tok/s: 8292156 +2166/20000 train_loss: 2.5209 train_time: 3.4m tok/s: 8292145 +2167/20000 train_loss: 2.5253 train_time: 3.4m tok/s: 8292092 +2168/20000 train_loss: 2.5819 train_time: 3.4m tok/s: 8292073 +2169/20000 train_loss: 2.6276 train_time: 3.4m tok/s: 8292028 +2170/20000 train_loss: 2.4920 train_time: 3.4m tok/s: 8292015 +2171/20000 train_loss: 2.6018 train_time: 3.4m tok/s: 8292012 +2172/20000 train_loss: 2.5086 train_time: 3.4m tok/s: 8292024 +2173/20000 train_loss: 2.7131 train_time: 3.4m tok/s: 8292053 +2174/20000 train_loss: 2.5495 train_time: 3.4m tok/s: 8292066 +2175/20000 train_loss: 2.4521 train_time: 3.4m tok/s: 8292027 +2176/20000 train_loss: 2.6667 train_time: 3.4m tok/s: 8292053 +2177/20000 train_loss: 2.6090 train_time: 3.4m tok/s: 8292053 +2178/20000 train_loss: 2.5260 train_time: 3.4m tok/s: 8292101 +2179/20000 train_loss: 2.7057 train_time: 3.4m tok/s: 8292117 +2180/20000 train_loss: 2.5616 train_time: 3.4m tok/s: 8292131 +2181/20000 train_loss: 2.5550 train_time: 3.4m tok/s: 8292114 +2182/20000 train_loss: 2.4372 train_time: 3.4m tok/s: 8292150 +2183/20000 train_loss: 2.5303 train_time: 3.5m tok/s: 8292185 +2184/20000 train_loss: 2.5372 train_time: 3.5m tok/s: 8292184 +2185/20000 train_loss: 2.4353 train_time: 3.5m tok/s: 8292186 +2186/20000 train_loss: 2.7413 train_time: 3.5m tok/s: 8292203 +2187/20000 train_loss: 2.5540 train_time: 3.5m tok/s: 8292166 +2188/20000 train_loss: 2.5506 train_time: 3.5m tok/s: 8292175 +2189/20000 train_loss: 2.6329 train_time: 3.5m tok/s: 8292239 +2190/20000 train_loss: 2.6745 train_time: 3.5m tok/s: 8292276 +2191/20000 train_loss: 2.6067 train_time: 3.5m tok/s: 8292282 +2192/20000 train_loss: 2.6464 train_time: 3.5m tok/s: 8292303 +2193/20000 train_loss: 2.5787 train_time: 3.5m tok/s: 
8292317 +2194/20000 train_loss: 2.5835 train_time: 3.5m tok/s: 8292352 +2195/20000 train_loss: 2.6095 train_time: 3.5m tok/s: 8292307 +2196/20000 train_loss: 2.6028 train_time: 3.5m tok/s: 8292291 +2197/20000 train_loss: 2.5812 train_time: 3.5m tok/s: 8292287 +2198/20000 train_loss: 2.5352 train_time: 3.5m tok/s: 8292293 +2199/20000 train_loss: 2.5373 train_time: 3.5m tok/s: 8292317 +2200/20000 train_loss: 2.6137 train_time: 3.5m tok/s: 8292384 +2201/20000 train_loss: 2.6671 train_time: 3.5m tok/s: 8292395 +2202/20000 train_loss: 2.5884 train_time: 3.5m tok/s: 8292340 +2203/20000 train_loss: 2.5674 train_time: 3.5m tok/s: 8292247 +2204/20000 train_loss: 2.4201 train_time: 3.5m tok/s: 8292253 +2205/20000 train_loss: 2.6445 train_time: 3.5m tok/s: 8292220 +2206/20000 train_loss: 2.5663 train_time: 3.5m tok/s: 8292240 +2207/20000 train_loss: 2.5223 train_time: 3.5m tok/s: 8292248 +2208/20000 train_loss: 2.6521 train_time: 3.5m tok/s: 8292267 +2209/20000 train_loss: 2.7697 train_time: 3.5m tok/s: 8292313 +2210/20000 train_loss: 2.7221 train_time: 3.5m tok/s: 8292332 +2211/20000 train_loss: 2.6573 train_time: 3.5m tok/s: 8292316 +2212/20000 train_loss: 2.4505 train_time: 3.5m tok/s: 8292341 +layer_loop:enabled step:2212 frac:0.350 encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10] +2213/20000 train_loss: 2.8799 train_time: 3.5m tok/s: 8290354 +2214/20000 train_loss: 2.6519 train_time: 3.5m tok/s: 8288573 +2215/20000 train_loss: 2.6916 train_time: 3.5m tok/s: 8286861 +2216/20000 train_loss: 2.6041 train_time: 3.5m tok/s: 8285088 +2217/20000 train_loss: 2.7447 train_time: 3.5m tok/s: 8283239 +2218/20000 train_loss: 2.7245 train_time: 3.5m tok/s: 8281466 +2219/20000 train_loss: 2.5453 train_time: 3.5m tok/s: 8279704 +2220/20000 train_loss: 2.5754 train_time: 3.5m tok/s: 8277888 +2221/20000 train_loss: 2.7018 train_time: 3.5m tok/s: 8276155 +2222/20000 train_loss: 2.5240 train_time: 3.5m tok/s: 8274390 +2223/20000 train_loss: 2.6029 train_time: 3.5m 
tok/s: 8272593 +2224/20000 train_loss: 2.4153 train_time: 3.5m tok/s: 8270836 +2225/20000 train_loss: 2.5763 train_time: 3.5m tok/s: 8269044 +2226/20000 train_loss: 2.5582 train_time: 3.5m tok/s: 8267335 +2227/20000 train_loss: 2.5307 train_time: 3.5m tok/s: 8265609 +2228/20000 train_loss: 2.5708 train_time: 3.5m tok/s: 8263854 +2229/20000 train_loss: 2.5299 train_time: 3.5m tok/s: 8262086 +2230/20000 train_loss: 2.5482 train_time: 3.5m tok/s: 8260364 +2231/20000 train_loss: 2.3497 train_time: 3.5m tok/s: 8258610 +2232/20000 train_loss: 2.5423 train_time: 3.5m tok/s: 8256806 +2233/20000 train_loss: 2.6722 train_time: 3.5m tok/s: 8255141 +2234/20000 train_loss: 2.7006 train_time: 3.5m tok/s: 8253433 +2235/20000 train_loss: 2.6533 train_time: 3.6m tok/s: 8251728 +2236/20000 train_loss: 2.6055 train_time: 3.6m tok/s: 8250008 +2237/20000 train_loss: 2.6368 train_time: 3.6m tok/s: 8248263 +2238/20000 train_loss: 2.6565 train_time: 3.6m tok/s: 8246520 +2239/20000 train_loss: 2.7573 train_time: 3.6m tok/s: 8244744 +2240/20000 train_loss: 2.7669 train_time: 3.6m tok/s: 8243047 +2241/20000 train_loss: 2.4029 train_time: 3.6m tok/s: 8241343 +2242/20000 train_loss: 2.5853 train_time: 3.6m tok/s: 8239606 +2243/20000 train_loss: 2.5704 train_time: 3.6m tok/s: 8237926 +2244/20000 train_loss: 2.5933 train_time: 3.6m tok/s: 8236212 +2245/20000 train_loss: 2.6316 train_time: 3.6m tok/s: 8234470 +2246/20000 train_loss: 2.5407 train_time: 3.6m tok/s: 8232796 +2247/20000 train_loss: 2.6831 train_time: 3.6m tok/s: 8231117 +2248/20000 train_loss: 2.6684 train_time: 3.6m tok/s: 8229408 +2249/20000 train_loss: 2.5449 train_time: 3.6m tok/s: 8227727 +2250/20000 train_loss: 2.5389 train_time: 3.6m tok/s: 8226015 +2251/20000 train_loss: 2.5833 train_time: 3.6m tok/s: 8224343 +2252/20000 train_loss: 2.7330 train_time: 3.6m tok/s: 8222654 +2253/20000 train_loss: 2.5716 train_time: 3.6m tok/s: 8221000 +2254/20000 train_loss: 2.5642 train_time: 3.6m tok/s: 8219232 +2255/20000 train_loss: 2.4265 
train_time: 3.6m tok/s: 8217526 +2256/20000 train_loss: 2.4719 train_time: 3.6m tok/s: 8215859 +2257/20000 train_loss: 2.5176 train_time: 3.6m tok/s: 8214104 +2258/20000 train_loss: 2.5385 train_time: 3.6m tok/s: 8212333 +2259/20000 train_loss: 2.5620 train_time: 3.6m tok/s: 8210689 +2260/20000 train_loss: 2.5165 train_time: 3.6m tok/s: 8208987 +2261/20000 train_loss: 2.6054 train_time: 3.6m tok/s: 8207354 +2262/20000 train_loss: 2.7081 train_time: 3.6m tok/s: 8205642 +2263/20000 train_loss: 2.6994 train_time: 3.6m tok/s: 8203990 +2264/20000 train_loss: 2.4714 train_time: 3.6m tok/s: 8202280 +2265/20000 train_loss: 2.5564 train_time: 3.6m tok/s: 8200693 +2266/20000 train_loss: 2.5837 train_time: 3.6m tok/s: 8198999 +2267/20000 train_loss: 2.6004 train_time: 3.6m tok/s: 8197349 +2268/20000 train_loss: 2.6907 train_time: 3.6m tok/s: 8195715 +2269/20000 train_loss: 2.7710 train_time: 3.6m tok/s: 8194023 +2270/20000 train_loss: 2.5252 train_time: 3.6m tok/s: 8192395 +2271/20000 train_loss: 2.5757 train_time: 3.6m tok/s: 8190703 +2272/20000 train_loss: 2.5117 train_time: 3.6m tok/s: 8189072 +2273/20000 train_loss: 2.5748 train_time: 3.6m tok/s: 8187456 +2274/20000 train_loss: 3.2742 train_time: 3.6m tok/s: 8185755 +2275/20000 train_loss: 2.4158 train_time: 3.6m tok/s: 8184062 +2276/20000 train_loss: 2.5209 train_time: 3.6m tok/s: 8182390 +2277/20000 train_loss: 2.7794 train_time: 3.6m tok/s: 8180752 +2278/20000 train_loss: 2.7189 train_time: 3.7m tok/s: 8179155 +2279/20000 train_loss: 2.6406 train_time: 3.7m tok/s: 8177514 +2280/20000 train_loss: 2.7333 train_time: 3.7m tok/s: 8175910 +2281/20000 train_loss: 2.5420 train_time: 3.7m tok/s: 8174351 +2282/20000 train_loss: 2.7305 train_time: 3.7m tok/s: 8172751 +2283/20000 train_loss: 2.9771 train_time: 3.7m tok/s: 8171110 +2284/20000 train_loss: 2.4886 train_time: 3.7m tok/s: 8169473 +2285/20000 train_loss: 2.5270 train_time: 3.7m tok/s: 8167872 +2286/20000 train_loss: 2.4520 train_time: 3.7m tok/s: 8166147 +2287/20000 
train_loss: 2.4572 train_time: 3.7m tok/s: 8164521 +2288/20000 train_loss: 2.6488 train_time: 3.7m tok/s: 8162881 +2289/20000 train_loss: 2.6620 train_time: 3.7m tok/s: 8161279 +2290/20000 train_loss: 2.6427 train_time: 3.7m tok/s: 8159681 +2291/20000 train_loss: 2.5350 train_time: 3.7m tok/s: 8158101 +2292/20000 train_loss: 2.4754 train_time: 3.7m tok/s: 8156495 +2293/20000 train_loss: 2.5558 train_time: 3.7m tok/s: 8154897 +2294/20000 train_loss: 2.4628 train_time: 3.7m tok/s: 8153329 +2295/20000 train_loss: 2.6636 train_time: 3.7m tok/s: 8151746 +2296/20000 train_loss: 2.5104 train_time: 3.7m tok/s: 8150172 +2297/20000 train_loss: 2.6258 train_time: 3.7m tok/s: 8148532 +2298/20000 train_loss: 2.4971 train_time: 3.7m tok/s: 8146874 +2299/20000 train_loss: 2.5963 train_time: 3.7m tok/s: 8145301 +2300/20000 train_loss: 2.6222 train_time: 3.7m tok/s: 8143686 +2301/20000 train_loss: 2.3976 train_time: 3.7m tok/s: 8142088 +2302/20000 train_loss: 2.5739 train_time: 3.7m tok/s: 8140549 +2303/20000 train_loss: 2.4994 train_time: 3.7m tok/s: 8138982 +2304/20000 train_loss: 2.5911 train_time: 3.7m tok/s: 8137362 +2305/20000 train_loss: 2.4470 train_time: 3.7m tok/s: 8135676 +2306/20000 train_loss: 2.6719 train_time: 3.7m tok/s: 8134117 +2307/20000 train_loss: 2.6450 train_time: 3.7m tok/s: 8132587 +2308/20000 train_loss: 2.5860 train_time: 3.7m tok/s: 8131064 +2309/20000 train_loss: 2.5421 train_time: 3.7m tok/s: 8129504 +2310/20000 train_loss: 2.6229 train_time: 3.7m tok/s: 8127919 +2311/20000 train_loss: 2.6152 train_time: 3.7m tok/s: 8126410 +2312/20000 train_loss: 2.6310 train_time: 3.7m tok/s: 8124792 +2313/20000 train_loss: 2.6181 train_time: 3.7m tok/s: 8123245 +2314/20000 train_loss: 2.4363 train_time: 3.7m tok/s: 8121661 +2315/20000 train_loss: 2.4126 train_time: 3.7m tok/s: 8120063 +2316/20000 train_loss: 2.3652 train_time: 3.7m tok/s: 8118503 +2317/20000 train_loss: 2.7504 train_time: 3.7m tok/s: 8116844 +2318/20000 train_loss: 2.6389 train_time: 3.7m tok/s: 
8115300 +2319/20000 train_loss: 2.4504 train_time: 3.7m tok/s: 8113742 +2320/20000 train_loss: 2.6196 train_time: 3.7m tok/s: 8112199 +2321/20000 train_loss: 2.5968 train_time: 3.8m tok/s: 8110654 +2322/20000 train_loss: 2.4604 train_time: 3.8m tok/s: 8109112 +2323/20000 train_loss: 2.6340 train_time: 3.8m tok/s: 8107570 +2324/20000 train_loss: 2.5737 train_time: 3.8m tok/s: 8105969 +2325/20000 train_loss: 2.5943 train_time: 3.8m tok/s: 8104406 +2326/20000 train_loss: 2.5995 train_time: 3.8m tok/s: 8102870 +2327/20000 train_loss: 2.5770 train_time: 3.8m tok/s: 8101346 +2328/20000 train_loss: 2.5304 train_time: 3.8m tok/s: 8099845 +2329/20000 train_loss: 2.4132 train_time: 3.8m tok/s: 8098350 +2330/20000 train_loss: 2.6775 train_time: 3.8m tok/s: 8096797 +2331/20000 train_loss: 2.5801 train_time: 3.8m tok/s: 8095266 +2332/20000 train_loss: 2.3998 train_time: 3.8m tok/s: 8093731 +2333/20000 train_loss: 2.6232 train_time: 3.8m tok/s: 8092213 +2334/20000 train_loss: 2.2830 train_time: 3.8m tok/s: 8090597 +2335/20000 train_loss: 2.5444 train_time: 3.8m tok/s: 8089121 +2336/20000 train_loss: 2.6277 train_time: 3.8m tok/s: 8087702 +2337/20000 train_loss: 2.6572 train_time: 3.8m tok/s: 8086194 +2338/20000 train_loss: 2.5334 train_time: 3.8m tok/s: 8084636 +2339/20000 train_loss: 2.6192 train_time: 3.8m tok/s: 8083138 +2340/20000 train_loss: 2.5719 train_time: 3.8m tok/s: 8081630 +2341/20000 train_loss: 2.5547 train_time: 3.8m tok/s: 8080180 +2342/20000 train_loss: 2.5023 train_time: 3.8m tok/s: 8078631 +2343/20000 train_loss: 2.4343 train_time: 3.8m tok/s: 8077146 +2344/20000 train_loss: 2.7054 train_time: 3.8m tok/s: 8075619 +2345/20000 train_loss: 3.0418 train_time: 3.8m tok/s: 8074088 +2346/20000 train_loss: 2.5283 train_time: 3.8m tok/s: 8072573 +2347/20000 train_loss: 2.5334 train_time: 3.8m tok/s: 8071027 +2348/20000 train_loss: 2.7228 train_time: 3.8m tok/s: 8069535 +2349/20000 train_loss: 2.5956 train_time: 3.8m tok/s: 8068062 +2350/20000 train_loss: 2.5778 
train_time: 3.8m tok/s: 8066566 +2351/20000 train_loss: 2.5684 train_time: 3.8m tok/s: 8065122 +2352/20000 train_loss: 2.5947 train_time: 3.8m tok/s: 8063623 +2353/20000 train_loss: 2.4990 train_time: 3.8m tok/s: 8062053 +2354/20000 train_loss: 2.5453 train_time: 3.8m tok/s: 8060607 +2355/20000 train_loss: 2.5102 train_time: 3.8m tok/s: 8059138 +2356/20000 train_loss: 2.5687 train_time: 3.8m tok/s: 8057655 +2357/20000 train_loss: 2.5197 train_time: 3.8m tok/s: 8056171 +2358/20000 train_loss: 2.5226 train_time: 3.8m tok/s: 8054705 +2359/20000 train_loss: 2.5087 train_time: 3.8m tok/s: 8053218 +2360/20000 train_loss: 2.5832 train_time: 3.8m tok/s: 8051739 +2361/20000 train_loss: 2.5939 train_time: 3.8m tok/s: 8050274 +2362/20000 train_loss: 2.5251 train_time: 3.8m tok/s: 8048713 +2363/20000 train_loss: 2.5409 train_time: 3.8m tok/s: 8047267 +2364/20000 train_loss: 2.6554 train_time: 3.9m tok/s: 8045741 +2365/20000 train_loss: 2.5557 train_time: 3.9m tok/s: 8044303 +2366/20000 train_loss: 2.6198 train_time: 3.9m tok/s: 8042844 +2367/20000 train_loss: 2.5366 train_time: 3.9m tok/s: 8041390 +2368/20000 train_loss: 2.6843 train_time: 3.9m tok/s: 8039953 +2369/20000 train_loss: 2.5187 train_time: 3.9m tok/s: 8038493 +2370/20000 train_loss: 2.6036 train_time: 3.9m tok/s: 8037038 +2371/20000 train_loss: 2.5908 train_time: 3.9m tok/s: 8035597 +2372/20000 train_loss: 2.6324 train_time: 3.9m tok/s: 8034165 +2373/20000 train_loss: 2.4956 train_time: 3.9m tok/s: 8032675 +2374/20000 train_loss: 2.5590 train_time: 3.9m tok/s: 8031204 +2375/20000 train_loss: 2.5488 train_time: 3.9m tok/s: 8029757 +2376/20000 train_loss: 2.4979 train_time: 3.9m tok/s: 8028313 +2377/20000 train_loss: 2.4139 train_time: 3.9m tok/s: 8026837 +2378/20000 train_loss: 2.5139 train_time: 3.9m tok/s: 8025375 +2379/20000 train_loss: 2.8960 train_time: 3.9m tok/s: 8023909 +2380/20000 train_loss: 2.5065 train_time: 3.9m tok/s: 8022477 +2381/20000 train_loss: 2.6692 train_time: 3.9m tok/s: 8021092 +2382/20000 
train_loss: 2.4629 train_time: 3.9m tok/s: 8019665 +2383/20000 train_loss: 2.6464 train_time: 3.9m tok/s: 8018186 +2384/20000 train_loss: 2.6560 train_time: 3.9m tok/s: 8016766 +2385/20000 train_loss: 2.6713 train_time: 3.9m tok/s: 8015286 +2386/20000 train_loss: 2.6318 train_time: 3.9m tok/s: 8013851 +2387/20000 train_loss: 2.4930 train_time: 3.9m tok/s: 8012418 +2388/20000 train_loss: 2.9492 train_time: 3.9m tok/s: 8010841 +2389/20000 train_loss: 2.3594 train_time: 3.9m tok/s: 8009289 +2390/20000 train_loss: 2.5891 train_time: 3.9m tok/s: 8007885 +2391/20000 train_loss: 2.5180 train_time: 3.9m tok/s: 8006443 +2392/20000 train_loss: 2.6426 train_time: 3.9m tok/s: 8005056 +2393/20000 train_loss: 2.5431 train_time: 3.9m tok/s: 8003686 +2394/20000 train_loss: 2.6311 train_time: 3.9m tok/s: 8002287 +2395/20000 train_loss: 2.6373 train_time: 3.9m tok/s: 8000897 +2396/20000 train_loss: 2.6038 train_time: 3.9m tok/s: 7999543 +2397/20000 train_loss: 2.6886 train_time: 3.9m tok/s: 7998135 +2398/20000 train_loss: 2.5166 train_time: 3.9m tok/s: 7996718 +2399/20000 train_loss: 2.5219 train_time: 3.9m tok/s: 7995316 +2400/20000 train_loss: 2.5231 train_time: 3.9m tok/s: 7993898 +2401/20000 train_loss: 2.6050 train_time: 3.9m tok/s: 7992528 +2402/20000 train_loss: 2.5077 train_time: 3.9m tok/s: 7991141 +2403/20000 train_loss: 2.8782 train_time: 3.9m tok/s: 7989662 +2404/20000 train_loss: 2.5499 train_time: 3.9m tok/s: 7988286 +2405/20000 train_loss: 2.4814 train_time: 3.9m tok/s: 7986928 +2406/20000 train_loss: 2.5731 train_time: 3.9m tok/s: 7985535 +2407/20000 train_loss: 2.5799 train_time: 4.0m tok/s: 7984206 +2408/20000 train_loss: 2.6559 train_time: 4.0m tok/s: 7982802 +2409/20000 train_loss: 2.5912 train_time: 4.0m tok/s: 7981413 +2410/20000 train_loss: 2.5747 train_time: 4.0m tok/s: 7979998 +2411/20000 train_loss: 2.5325 train_time: 4.0m tok/s: 7978635 +2412/20000 train_loss: 2.6527 train_time: 4.0m tok/s: 7977222 +2413/20000 train_loss: 2.4942 train_time: 4.0m tok/s: 
7975789 +2414/20000 train_loss: 2.5594 train_time: 4.0m tok/s: 7974456 +2415/20000 train_loss: 2.5489 train_time: 4.0m tok/s: 7973052 +2416/20000 train_loss: 2.5811 train_time: 4.0m tok/s: 7971662 +2417/20000 train_loss: 2.5505 train_time: 4.0m tok/s: 7970258 +2418/20000 train_loss: 2.5037 train_time: 4.0m tok/s: 7968863 +2419/20000 train_loss: 2.5496 train_time: 4.0m tok/s: 7967501 +2420/20000 train_loss: 2.5951 train_time: 4.0m tok/s: 7966122 +2421/20000 train_loss: 2.6249 train_time: 4.0m tok/s: 7964789 +2422/20000 train_loss: 2.6069 train_time: 4.0m tok/s: 7963436 +2423/20000 train_loss: 2.5030 train_time: 4.0m tok/s: 7962059 +2424/20000 train_loss: 2.6222 train_time: 4.0m tok/s: 7960703 +2425/20000 train_loss: 2.6090 train_time: 4.0m tok/s: 7959290 +2426/20000 train_loss: 2.5448 train_time: 4.0m tok/s: 7957937 +2427/20000 train_loss: 2.4147 train_time: 4.0m tok/s: 7956549 +2428/20000 train_loss: 2.5232 train_time: 4.0m tok/s: 7955142 +2429/20000 train_loss: 2.4949 train_time: 4.0m tok/s: 7953808 +2430/20000 train_loss: 2.4655 train_time: 4.0m tok/s: 7952460 +2431/20000 train_loss: 2.5862 train_time: 4.0m tok/s: 7951016 +2432/20000 train_loss: 2.5450 train_time: 4.0m tok/s: 7949686 +2433/20000 train_loss: 2.6749 train_time: 4.0m tok/s: 7948370 +2434/20000 train_loss: 2.4874 train_time: 4.0m tok/s: 7947004 +2435/20000 train_loss: 2.6716 train_time: 4.0m tok/s: 7945646 +2436/20000 train_loss: 2.5349 train_time: 4.0m tok/s: 7944317 +2437/20000 train_loss: 2.5780 train_time: 4.0m tok/s: 7942985 +2438/20000 train_loss: 2.5266 train_time: 4.0m tok/s: 7941623 +2439/20000 train_loss: 2.5257 train_time: 4.0m tok/s: 7940262 +2440/20000 train_loss: 2.5050 train_time: 4.0m tok/s: 7938922 +2441/20000 train_loss: 2.5389 train_time: 4.0m tok/s: 7937599 +2442/20000 train_loss: 2.5173 train_time: 4.0m tok/s: 7936280 +2443/20000 train_loss: 2.6372 train_time: 4.0m tok/s: 7934920 +2444/20000 train_loss: 2.5840 train_time: 4.0m tok/s: 7933613 +2445/20000 train_loss: 2.5051 
train_time: 4.0m tok/s: 7932265 +2446/20000 train_loss: 2.7138 train_time: 4.0m tok/s: 7930910 +2447/20000 train_loss: 2.7230 train_time: 4.0m tok/s: 7929570 +2448/20000 train_loss: 2.5842 train_time: 4.0m tok/s: 7928250 +2449/20000 train_loss: 2.4981 train_time: 4.0m tok/s: 7926894 +2450/20000 train_loss: 2.5508 train_time: 4.1m tok/s: 7925554 +2451/20000 train_loss: 2.5417 train_time: 4.1m tok/s: 7924270 +2452/20000 train_loss: 2.5665 train_time: 4.1m tok/s: 7922896 +2453/20000 train_loss: 2.4379 train_time: 4.1m tok/s: 7921562 +2454/20000 train_loss: 2.4481 train_time: 4.1m tok/s: 7920249 +2455/20000 train_loss: 2.5500 train_time: 4.1m tok/s: 7918857 +2456/20000 train_loss: 2.5582 train_time: 4.1m tok/s: 7917555 +2457/20000 train_loss: 2.6213 train_time: 4.1m tok/s: 7916274 +2458/20000 train_loss: 2.7157 train_time: 4.1m tok/s: 7914972 +2459/20000 train_loss: 2.5600 train_time: 4.1m tok/s: 7913714 +2460/20000 train_loss: 2.5860 train_time: 4.1m tok/s: 7912380 +2461/20000 train_loss: 2.6626 train_time: 4.1m tok/s: 7911065 +2462/20000 train_loss: 2.5418 train_time: 4.1m tok/s: 7909771 +2463/20000 train_loss: 2.6028 train_time: 4.1m tok/s: 7908450 +2464/20000 train_loss: 2.5220 train_time: 4.1m tok/s: 7907099 +2465/20000 train_loss: 2.6131 train_time: 4.1m tok/s: 7905836 +2466/20000 train_loss: 2.3269 train_time: 4.1m tok/s: 7904529 +2467/20000 train_loss: 2.5911 train_time: 4.1m tok/s: 7903200 +2468/20000 train_loss: 2.4780 train_time: 4.1m tok/s: 7901843 +2469/20000 train_loss: 2.5797 train_time: 4.1m tok/s: 7900520 +2470/20000 train_loss: 2.6088 train_time: 4.1m tok/s: 7899251 +2471/20000 train_loss: 2.5813 train_time: 4.1m tok/s: 7897978 +2472/20000 train_loss: 2.6836 train_time: 4.1m tok/s: 7896677 +2473/20000 train_loss: 2.5528 train_time: 4.1m tok/s: 7895370 +2474/20000 train_loss: 2.8426 train_time: 4.1m tok/s: 7894026 +2475/20000 train_loss: 2.6921 train_time: 4.1m tok/s: 7892764 +2476/20000 train_loss: 2.5713 train_time: 4.1m tok/s: 7891456 +2477/20000 
train_loss: 2.4964 train_time: 4.1m tok/s: 7890177 +2478/20000 train_loss: 2.5296 train_time: 4.1m tok/s: 7888883 +2479/20000 train_loss: 2.6447 train_time: 4.1m tok/s: 7887620 +2480/20000 train_loss: 2.5777 train_time: 4.1m tok/s: 7886357 +2481/20000 train_loss: 2.4659 train_time: 4.1m tok/s: 7885071 +2482/20000 train_loss: 2.6205 train_time: 4.1m tok/s: 7883780 +2483/20000 train_loss: 2.5710 train_time: 4.1m tok/s: 7882569 +2484/20000 train_loss: 2.5492 train_time: 4.1m tok/s: 7881243 +2485/20000 train_loss: 2.4896 train_time: 4.1m tok/s: 7879938 +2486/20000 train_loss: 2.5771 train_time: 4.1m tok/s: 7878630 +2487/20000 train_loss: 2.6019 train_time: 4.1m tok/s: 7877391 +2488/20000 train_loss: 2.5726 train_time: 4.1m tok/s: 7876126 +2489/20000 train_loss: 2.4657 train_time: 4.1m tok/s: 7874837 +2490/20000 train_loss: 2.6323 train_time: 4.1m tok/s: 7873601 +2491/20000 train_loss: 2.5807 train_time: 4.1m tok/s: 7872317 +2492/20000 train_loss: 2.5439 train_time: 4.1m tok/s: 7870999 +2493/20000 train_loss: 2.5732 train_time: 4.2m tok/s: 7869723 +2494/20000 train_loss: 2.4490 train_time: 4.2m tok/s: 7868460 +2495/20000 train_loss: 2.4923 train_time: 4.2m tok/s: 7867173 +2496/20000 train_loss: 2.6058 train_time: 4.2m tok/s: 7865971 +2497/20000 train_loss: 2.5675 train_time: 4.2m tok/s: 7864728 +2498/20000 train_loss: 2.5576 train_time: 4.2m tok/s: 7863478 +2499/20000 train_loss: 2.6020 train_time: 4.2m tok/s: 7862215 +2500/20000 train_loss: 2.6705 train_time: 4.2m tok/s: 7860950 +2501/20000 train_loss: 2.5620 train_time: 4.2m tok/s: 7859675 +2502/20000 train_loss: 2.4124 train_time: 4.2m tok/s: 7858456 +2503/20000 train_loss: 2.5204 train_time: 4.2m tok/s: 7857222 +2504/20000 train_loss: 2.6174 train_time: 4.2m tok/s: 7855910 +2505/20000 train_loss: 2.5201 train_time: 4.2m tok/s: 7854668 +2506/20000 train_loss: 2.5713 train_time: 4.2m tok/s: 7853378 +2507/20000 train_loss: 2.4316 train_time: 4.2m tok/s: 7852140 +2508/20000 train_loss: 2.5895 train_time: 4.2m tok/s: 
7850892 +2509/20000 train_loss: 2.6257 train_time: 4.2m tok/s: 7849669 +2510/20000 train_loss: 2.5246 train_time: 4.2m tok/s: 7848394 +2511/20000 train_loss: 2.5743 train_time: 4.2m tok/s: 7847035 +2512/20000 train_loss: 2.6195 train_time: 4.2m tok/s: 7845800 +2513/20000 train_loss: 2.4798 train_time: 4.2m tok/s: 7844607 +2514/20000 train_loss: 2.5702 train_time: 4.2m tok/s: 7843381 +2515/20000 train_loss: 2.6074 train_time: 4.2m tok/s: 7842162 +2516/20000 train_loss: 2.5984 train_time: 4.2m tok/s: 7840882 +2517/20000 train_loss: 2.4406 train_time: 4.2m tok/s: 7839677 +2518/20000 train_loss: 2.5454 train_time: 4.2m tok/s: 7838449 +2519/20000 train_loss: 2.5808 train_time: 4.2m tok/s: 7837193 +2520/20000 train_loss: 2.5346 train_time: 4.2m tok/s: 7836011 +2521/20000 train_loss: 2.6150 train_time: 4.2m tok/s: 7834838 +2522/20000 train_loss: 2.6140 train_time: 4.2m tok/s: 7833634 +2523/20000 train_loss: 2.5583 train_time: 4.2m tok/s: 7832429 +2524/20000 train_loss: 2.5373 train_time: 4.2m tok/s: 7831223 +2525/20000 train_loss: 2.4461 train_time: 4.2m tok/s: 7830025 +2526/20000 train_loss: 2.5357 train_time: 4.2m tok/s: 7828793 +2527/20000 train_loss: 2.5519 train_time: 4.2m tok/s: 7827501 +2528/20000 train_loss: 2.6009 train_time: 4.2m tok/s: 7826299 +2529/20000 train_loss: 2.5582 train_time: 4.2m tok/s: 7825089 +2530/20000 train_loss: 2.4847 train_time: 4.2m tok/s: 7823872 +2531/20000 train_loss: 2.4019 train_time: 4.2m tok/s: 7822675 +2532/20000 train_loss: 2.5244 train_time: 4.2m tok/s: 7821345 +2533/20000 train_loss: 2.5397 train_time: 4.2m tok/s: 7820132 +2534/20000 train_loss: 2.4583 train_time: 4.2m tok/s: 7818957 +2535/20000 train_loss: 2.5752 train_time: 4.3m tok/s: 7817761 +2536/20000 train_loss: 2.5691 train_time: 4.3m tok/s: 7816556 +2537/20000 train_loss: 2.4735 train_time: 4.3m tok/s: 7815344 +2538/20000 train_loss: 2.5922 train_time: 4.3m tok/s: 7814161 +2539/20000 train_loss: 2.7870 train_time: 4.3m tok/s: 7812994 +2540/20000 train_loss: 2.5413 
train_time: 4.3m tok/s: 7811775 +2541/20000 train_loss: 2.5330 train_time: 4.3m tok/s: 7810622 +2542/20000 train_loss: 2.5476 train_time: 4.3m tok/s: 7809366 +2543/20000 train_loss: 2.6343 train_time: 4.3m tok/s: 7808157 +2544/20000 train_loss: 2.6570 train_time: 4.3m tok/s: 7806946 +2545/20000 train_loss: 2.4918 train_time: 4.3m tok/s: 7805722 +2546/20000 train_loss: 2.5387 train_time: 4.3m tok/s: 7804532 +2547/20000 train_loss: 2.5105 train_time: 4.3m tok/s: 7803382 +2548/20000 train_loss: 2.7881 train_time: 4.3m tok/s: 7802182 +2549/20000 train_loss: 2.5493 train_time: 4.3m tok/s: 7800995 +2550/20000 train_loss: 2.8252 train_time: 4.3m tok/s: 7799849 +2551/20000 train_loss: 2.5305 train_time: 4.3m tok/s: 7798682 +2552/20000 train_loss: 2.7463 train_time: 4.3m tok/s: 7797479 +2553/20000 train_loss: 2.5832 train_time: 4.3m tok/s: 7796192 +2554/20000 train_loss: 2.4617 train_time: 4.3m tok/s: 7795069 +2555/20000 train_loss: 2.5515 train_time: 4.3m tok/s: 7793932 +2556/20000 train_loss: 2.5546 train_time: 4.3m tok/s: 7792730 +2557/20000 train_loss: 2.4996 train_time: 4.3m tok/s: 7791572 +2558/20000 train_loss: 2.6084 train_time: 4.3m tok/s: 7790394 +2559/20000 train_loss: 2.4337 train_time: 4.3m tok/s: 7789195 +2560/20000 train_loss: 2.4442 train_time: 4.3m tok/s: 7787957 +2561/20000 train_loss: 2.5290 train_time: 4.3m tok/s: 7786804 +2562/20000 train_loss: 2.4698 train_time: 4.3m tok/s: 7785709 +2563/20000 train_loss: 2.4392 train_time: 4.3m tok/s: 7784496 +2564/20000 train_loss: 2.4310 train_time: 4.3m tok/s: 7783309 +2565/20000 train_loss: 2.5385 train_time: 4.3m tok/s: 7782106 +2566/20000 train_loss: 2.5647 train_time: 4.3m tok/s: 7780999 +2567/20000 train_loss: 2.5855 train_time: 4.3m tok/s: 7779852 +2568/20000 train_loss: 2.6146 train_time: 4.3m tok/s: 7778694 +2569/20000 train_loss: 2.6568 train_time: 4.3m tok/s: 7777564 +2570/20000 train_loss: 2.5494 train_time: 4.3m tok/s: 7776413 +2571/20000 train_loss: 2.6068 train_time: 4.3m tok/s: 7775282 +2572/20000 
train_loss: 2.4359 train_time: 4.3m tok/s: 7774100 +2573/20000 train_loss: 2.4941 train_time: 4.3m tok/s: 7772899 +2574/20000 train_loss: 2.6808 train_time: 4.3m tok/s: 7771728 +2575/20000 train_loss: 2.5845 train_time: 4.3m tok/s: 7770574 +2576/20000 train_loss: 2.5079 train_time: 4.3m tok/s: 7769451 +2577/20000 train_loss: 2.4747 train_time: 4.3m tok/s: 7768306 +2578/20000 train_loss: 2.4417 train_time: 4.4m tok/s: 7767157 +2579/20000 train_loss: 2.5394 train_time: 4.4m tok/s: 7765993 +2580/20000 train_loss: 2.4621 train_time: 4.4m tok/s: 7764825 +2581/20000 train_loss: 2.3444 train_time: 4.4m tok/s: 7763625 +2582/20000 train_loss: 2.5593 train_time: 4.4m tok/s: 7762412 +2583/20000 train_loss: 2.5878 train_time: 4.4m tok/s: 7761326 +2584/20000 train_loss: 2.5856 train_time: 4.4m tok/s: 7760198 +2585/20000 train_loss: 2.4826 train_time: 4.4m tok/s: 7759052 +2586/20000 train_loss: 2.5101 train_time: 4.4m tok/s: 7757931 +2587/20000 train_loss: 2.5421 train_time: 4.4m tok/s: 7756778 +2588/20000 train_loss: 2.6055 train_time: 4.4m tok/s: 7755610 +2589/20000 train_loss: 2.4782 train_time: 4.4m tok/s: 7754475 +2590/20000 train_loss: 2.5138 train_time: 4.4m tok/s: 7753352 +2591/20000 train_loss: 2.4714 train_time: 4.4m tok/s: 7752221 +2592/20000 train_loss: 2.4312 train_time: 4.4m tok/s: 7751070 +2593/20000 train_loss: 2.5056 train_time: 4.4m tok/s: 7749921 +2594/20000 train_loss: 2.4257 train_time: 4.4m tok/s: 7748781 +2595/20000 train_loss: 2.6065 train_time: 4.4m tok/s: 7747643 +2596/20000 train_loss: 3.0948 train_time: 4.4m tok/s: 7746509 +2597/20000 train_loss: 2.4059 train_time: 4.4m tok/s: 7745389 +2598/20000 train_loss: 2.5067 train_time: 4.4m tok/s: 7744301 +2599/20000 train_loss: 2.6161 train_time: 4.4m tok/s: 7743214 +2600/20000 train_loss: 2.5765 train_time: 4.4m tok/s: 7742035 +2601/20000 train_loss: 2.4978 train_time: 4.4m tok/s: 7740890 +2602/20000 train_loss: 2.7667 train_time: 4.4m tok/s: 7739771 +2603/20000 train_loss: 2.5190 train_time: 4.4m tok/s: 
7738597 +2604/20000 train_loss: 2.5216 train_time: 4.4m tok/s: 7737531 +2605/20000 train_loss: 2.6637 train_time: 4.4m tok/s: 7736324 +2606/20000 train_loss: 2.4122 train_time: 4.4m tok/s: 7735263 +2607/20000 train_loss: 2.4749 train_time: 4.4m tok/s: 7734096 +2608/20000 train_loss: 2.5258 train_time: 4.4m tok/s: 7733023 +2609/20000 train_loss: 2.4871 train_time: 4.4m tok/s: 7731883 +2610/20000 train_loss: 2.4762 train_time: 4.4m tok/s: 7730768 +2611/20000 train_loss: 2.6121 train_time: 4.4m tok/s: 7729697 +2612/20000 train_loss: 2.6791 train_time: 4.4m tok/s: 7728536 +2613/20000 train_loss: 2.5561 train_time: 4.4m tok/s: 7727442 +2614/20000 train_loss: 2.5422 train_time: 4.4m tok/s: 7726363 +2615/20000 train_loss: 2.6696 train_time: 4.4m tok/s: 7725266 +2616/20000 train_loss: 2.5790 train_time: 4.4m tok/s: 7724151 +2617/20000 train_loss: 2.5676 train_time: 4.4m tok/s: 7723025 +2618/20000 train_loss: 2.5650 train_time: 4.4m tok/s: 7721921 +2619/20000 train_loss: 2.4551 train_time: 4.4m tok/s: 7720834 +2620/20000 train_loss: 2.4235 train_time: 4.4m tok/s: 7719712 +2621/20000 train_loss: 2.5023 train_time: 4.5m tok/s: 7718649 +2622/20000 train_loss: 2.4919 train_time: 4.5m tok/s: 7717565 +2623/20000 train_loss: 2.5056 train_time: 4.5m tok/s: 7716417 +2624/20000 train_loss: 2.3422 train_time: 4.5m tok/s: 7715295 +2625/20000 train_loss: 2.6304 train_time: 4.5m tok/s: 7714227 +2626/20000 train_loss: 2.3545 train_time: 4.5m tok/s: 7713133 +2627/20000 train_loss: 2.4577 train_time: 4.5m tok/s: 7712035 +2628/20000 train_loss: 2.6580 train_time: 4.5m tok/s: 7710986 +2629/20000 train_loss: 2.5559 train_time: 4.5m tok/s: 7709905 +2630/20000 train_loss: 2.6164 train_time: 4.5m tok/s: 7708788 +2631/20000 train_loss: 2.5909 train_time: 4.5m tok/s: 7707673 +2632/20000 train_loss: 2.6305 train_time: 4.5m tok/s: 7706610 +2633/20000 train_loss: 2.4780 train_time: 4.5m tok/s: 7705496 +2634/20000 train_loss: 2.5648 train_time: 4.5m tok/s: 7704403 +2635/20000 train_loss: 2.4915 
train_time: 4.5m tok/s: 7703336 +2636/20000 train_loss: 2.5435 train_time: 4.5m tok/s: 7702264 +2637/20000 train_loss: 2.4495 train_time: 4.5m tok/s: 7701186 +2638/20000 train_loss: 2.5394 train_time: 4.5m tok/s: 7700077 +2639/20000 train_loss: 2.2718 train_time: 4.5m tok/s: 7698963 +2640/20000 train_loss: 2.5405 train_time: 4.5m tok/s: 7697884 +2641/20000 train_loss: 2.5931 train_time: 4.5m tok/s: 7696784 +2642/20000 train_loss: 2.6497 train_time: 4.5m tok/s: 7695764 +2643/20000 train_loss: 2.5579 train_time: 4.5m tok/s: 7694601 +2644/20000 train_loss: 2.5777 train_time: 4.5m tok/s: 7693570 +2645/20000 train_loss: 2.5347 train_time: 4.5m tok/s: 7692497 +2646/20000 train_loss: 2.5665 train_time: 4.5m tok/s: 7691382 +2647/20000 train_loss: 2.6571 train_time: 4.5m tok/s: 7690337 +2648/20000 train_loss: 2.5201 train_time: 4.5m tok/s: 7689257 +2649/20000 train_loss: 2.5411 train_time: 4.5m tok/s: 7688189 +2650/20000 train_loss: 2.4645 train_time: 4.5m tok/s: 7687164 +2651/20000 train_loss: 2.4307 train_time: 4.5m tok/s: 7686092 +2652/20000 train_loss: 2.3529 train_time: 4.5m tok/s: 7685003 +2653/20000 train_loss: 2.6536 train_time: 4.5m tok/s: 7683878 +2654/20000 train_loss: 2.2672 train_time: 4.5m tok/s: 7682727 +2655/20000 train_loss: 2.9463 train_time: 4.5m tok/s: 7681600 +2656/20000 train_loss: 2.4578 train_time: 4.5m tok/s: 7680529 +2657/20000 train_loss: 2.4378 train_time: 4.5m tok/s: 7679509 +2658/20000 train_loss: 2.6176 train_time: 4.5m tok/s: 7678398 +2659/20000 train_loss: 2.5292 train_time: 4.5m tok/s: 7677331 +2660/20000 train_loss: 2.5925 train_time: 4.5m tok/s: 7676278 +2661/20000 train_loss: 2.5428 train_time: 4.5m tok/s: 7675248 +2662/20000 train_loss: 2.3326 train_time: 4.5m tok/s: 7674189 +2663/20000 train_loss: 2.7200 train_time: 4.5m tok/s: 7673141 +2664/20000 train_loss: 2.5368 train_time: 4.6m tok/s: 7672049 +2665/20000 train_loss: 2.4919 train_time: 4.6m tok/s: 7671003 +2666/20000 train_loss: 2.4038 train_time: 4.6m tok/s: 7669979 +2667/20000 
train_loss: 2.2918 train_time: 4.6m tok/s: 7668952 +2668/20000 train_loss: 2.5726 train_time: 4.6m tok/s: 7667901 +2669/20000 train_loss: 2.4550 train_time: 4.6m tok/s: 7666868 +2670/20000 train_loss: 2.5660 train_time: 4.6m tok/s: 7665875 +2671/20000 train_loss: 2.6702 train_time: 4.6m tok/s: 7664864 +2672/20000 train_loss: 2.6134 train_time: 4.6m tok/s: 7663849 +2673/20000 train_loss: 2.5672 train_time: 4.6m tok/s: 7662865 +2674/20000 train_loss: 2.6238 train_time: 4.6m tok/s: 7661831 +2675/20000 train_loss: 2.5295 train_time: 4.6m tok/s: 7660813 +2676/20000 train_loss: 2.5196 train_time: 4.6m tok/s: 7659750 +2677/20000 train_loss: 2.4356 train_time: 4.6m tok/s: 7658709 +2678/20000 train_loss: 2.4755 train_time: 4.6m tok/s: 7657569 +2679/20000 train_loss: 2.3187 train_time: 4.6m tok/s: 7656554 +2680/20000 train_loss: 2.4332 train_time: 4.6m tok/s: 7655565 +2681/20000 train_loss: 2.4468 train_time: 4.6m tok/s: 7654552 +2682/20000 train_loss: 2.5389 train_time: 4.6m tok/s: 7653539 +2683/20000 train_loss: 2.4657 train_time: 4.6m tok/s: 7652527 +2684/20000 train_loss: 2.4773 train_time: 4.6m tok/s: 7651421 +2685/20000 train_loss: 2.8083 train_time: 4.6m tok/s: 7650375 +2686/20000 train_loss: 2.5242 train_time: 4.6m tok/s: 7649385 +2687/20000 train_loss: 2.5792 train_time: 4.6m tok/s: 7648413 +2688/20000 train_loss: 2.4538 train_time: 4.6m tok/s: 7647390 +2689/20000 train_loss: 2.5512 train_time: 4.6m tok/s: 7646358 +2690/20000 train_loss: 2.5487 train_time: 4.6m tok/s: 7645300 +2691/20000 train_loss: 2.5274 train_time: 4.6m tok/s: 7644297 +2692/20000 train_loss: 2.4691 train_time: 4.6m tok/s: 7643256 +2693/20000 train_loss: 2.4481 train_time: 4.6m tok/s: 7642177 +2694/20000 train_loss: 2.4732 train_time: 4.6m tok/s: 7641163 +2695/20000 train_loss: 2.5771 train_time: 4.6m tok/s: 7640130 +2696/20000 train_loss: 2.4740 train_time: 4.6m tok/s: 7639143 +2697/20000 train_loss: 2.5574 train_time: 4.6m tok/s: 7638126 +2698/20000 train_loss: 2.5726 train_time: 4.6m tok/s: 
7637099 +2699/20000 train_loss: 2.4777 train_time: 4.6m tok/s: 7636112 +2700/20000 train_loss: 2.4564 train_time: 4.6m tok/s: 7635099 +2701/20000 train_loss: 2.5835 train_time: 4.6m tok/s: 7634101 +2702/20000 train_loss: 2.3716 train_time: 4.6m tok/s: 7633085 +2703/20000 train_loss: 2.5560 train_time: 4.6m tok/s: 7632070 +2704/20000 train_loss: 2.4481 train_time: 4.6m tok/s: 7631014 +2705/20000 train_loss: 2.5073 train_time: 4.6m tok/s: 7630018 +2706/20000 train_loss: 2.5115 train_time: 4.6m tok/s: 7629047 +2707/20000 train_loss: 2.6176 train_time: 4.7m tok/s: 7627997 +2708/20000 train_loss: 2.6536 train_time: 4.7m tok/s: 7626937 +2709/20000 train_loss: 2.5704 train_time: 4.7m tok/s: 7625964 +2710/20000 train_loss: 2.6256 train_time: 4.7m tok/s: 7624911 +2711/20000 train_loss: 2.7039 train_time: 4.7m tok/s: 7623877 +2712/20000 train_loss: 2.4920 train_time: 4.7m tok/s: 7622924 +2713/20000 train_loss: 2.5686 train_time: 4.7m tok/s: 7621884 +2714/20000 train_loss: 2.6242 train_time: 4.7m tok/s: 7620898 +2715/20000 train_loss: 2.4536 train_time: 4.7m tok/s: 7619941 +2716/20000 train_loss: 2.4187 train_time: 4.7m tok/s: 7618960 +2717/20000 train_loss: 2.5260 train_time: 4.7m tok/s: 7617978 +2718/20000 train_loss: 2.4493 train_time: 4.7m tok/s: 7616954 +2719/20000 train_loss: 2.4409 train_time: 4.7m tok/s: 7615980 +2720/20000 train_loss: 2.5634 train_time: 4.7m tok/s: 7614913 +2721/20000 train_loss: 2.4095 train_time: 4.7m tok/s: 7613866 +2722/20000 train_loss: 2.4847 train_time: 4.7m tok/s: 7612876 +2723/20000 train_loss: 2.4610 train_time: 4.7m tok/s: 7611924 +2724/20000 train_loss: 2.5216 train_time: 4.7m tok/s: 7610874 +2725/20000 train_loss: 2.6384 train_time: 4.7m tok/s: 7609898 +2726/20000 train_loss: 2.5066 train_time: 4.7m tok/s: 7608923 +2727/20000 train_loss: 2.5617 train_time: 4.7m tok/s: 7607963 +2728/20000 train_loss: 2.8963 train_time: 4.7m tok/s: 7607002 +2729/20000 train_loss: 2.6999 train_time: 4.7m tok/s: 7606016 +2730/20000 train_loss: 2.5326 
train_time: 4.7m tok/s: 7605039 +2731/20000 train_loss: 2.6113 train_time: 4.7m tok/s: 7604040 +2732/20000 train_loss: 2.6245 train_time: 4.7m tok/s: 7603025 +2733/20000 train_loss: 2.5434 train_time: 4.7m tok/s: 7602011 +2734/20000 train_loss: 2.6167 train_time: 4.7m tok/s: 7601061 +2735/20000 train_loss: 2.4265 train_time: 4.7m tok/s: 7600103 +2736/20000 train_loss: 2.5524 train_time: 4.7m tok/s: 7599146 +2737/20000 train_loss: 2.4734 train_time: 4.7m tok/s: 7598137 +2738/20000 train_loss: 2.4146 train_time: 4.7m tok/s: 7597186 +2739/20000 train_loss: 2.5373 train_time: 4.7m tok/s: 7596203 +2740/20000 train_loss: 2.5545 train_time: 4.7m tok/s: 7595216 +2741/20000 train_loss: 2.5223 train_time: 4.7m tok/s: 7594251 +2742/20000 train_loss: 2.4775 train_time: 4.7m tok/s: 7593275 +2743/20000 train_loss: 2.5809 train_time: 4.7m tok/s: 7592264 +2744/20000 train_loss: 2.5880 train_time: 4.7m tok/s: 7591325 +2745/20000 train_loss: 2.6478 train_time: 4.7m tok/s: 7590390 +2746/20000 train_loss: 2.6168 train_time: 4.7m tok/s: 7589390 +2747/20000 train_loss: 2.4400 train_time: 4.7m tok/s: 7588418 +2748/20000 train_loss: 2.5137 train_time: 4.7m tok/s: 7587429 +2749/20000 train_loss: 2.5853 train_time: 4.7m tok/s: 7586506 +2750/20000 train_loss: 2.6463 train_time: 4.8m tok/s: 7585548 +2751/20000 train_loss: 2.6250 train_time: 4.8m tok/s: 7584574 +2752/20000 train_loss: 2.5016 train_time: 4.8m tok/s: 7583612 +2753/20000 train_loss: 2.4889 train_time: 4.8m tok/s: 7582659 +2754/20000 train_loss: 2.4544 train_time: 4.8m tok/s: 7581683 +2755/20000 train_loss: 2.4963 train_time: 4.8m tok/s: 7580748 +2756/20000 train_loss: 2.4780 train_time: 4.8m tok/s: 7579809 +2757/20000 train_loss: 2.4529 train_time: 4.8m tok/s: 7578826 +2758/20000 train_loss: 2.5935 train_time: 4.8m tok/s: 7577832 +2759/20000 train_loss: 2.4749 train_time: 4.8m tok/s: 7576927 +2760/20000 train_loss: 2.4057 train_time: 4.8m tok/s: 7575953 +2761/20000 train_loss: 2.6409 train_time: 4.8m tok/s: 7574947 +2762/20000 
train_loss: 2.5392 train_time: 4.8m tok/s: 7574018 +2763/20000 train_loss: 2.5901 train_time: 4.8m tok/s: 7573067 +2764/20000 train_loss: 2.5649 train_time: 4.8m tok/s: 7572117 +2765/20000 train_loss: 2.5045 train_time: 4.8m tok/s: 7571189 +2766/20000 train_loss: 2.4603 train_time: 4.8m tok/s: 7570272 +2767/20000 train_loss: 2.5096 train_time: 4.8m tok/s: 7569299 +2768/20000 train_loss: 2.6387 train_time: 4.8m tok/s: 7568337 +2769/20000 train_loss: 2.5583 train_time: 4.8m tok/s: 7567304 +2770/20000 train_loss: 2.7016 train_time: 4.8m tok/s: 7566377 +2771/20000 train_loss: 2.5050 train_time: 4.8m tok/s: 7565395 +2772/20000 train_loss: 2.5277 train_time: 4.8m tok/s: 7564463 +2773/20000 train_loss: 2.5092 train_time: 4.8m tok/s: 7563500 +2774/20000 train_loss: 2.4683 train_time: 4.8m tok/s: 7562586 +2775/20000 train_loss: 2.4689 train_time: 4.8m tok/s: 7561666 +2776/20000 train_loss: 2.4155 train_time: 4.8m tok/s: 7560765 +2777/20000 train_loss: 2.4803 train_time: 4.8m tok/s: 7559798 +2778/20000 train_loss: 2.5297 train_time: 4.8m tok/s: 7558868 +2779/20000 train_loss: 2.3965 train_time: 4.8m tok/s: 7557843 +2780/20000 train_loss: 2.6587 train_time: 4.8m tok/s: 7556936 +2781/20000 train_loss: 2.6379 train_time: 4.8m tok/s: 7556016 +2782/20000 train_loss: 2.4596 train_time: 4.8m tok/s: 7555106 +2783/20000 train_loss: 2.6210 train_time: 4.8m tok/s: 7554185 +2784/20000 train_loss: 2.6457 train_time: 4.8m tok/s: 7553253 +2785/20000 train_loss: 2.5179 train_time: 4.8m tok/s: 7552343 +2786/20000 train_loss: 2.5174 train_time: 4.8m tok/s: 7551402 +2787/20000 train_loss: 2.5814 train_time: 4.8m tok/s: 7550445 +2788/20000 train_loss: 2.4237 train_time: 4.8m tok/s: 7549501 +2789/20000 train_loss: 2.5436 train_time: 4.8m tok/s: 7548602 +2790/20000 train_loss: 2.6068 train_time: 4.8m tok/s: 7547650 +2791/20000 train_loss: 2.3633 train_time: 4.8m tok/s: 7546682 +2792/20000 train_loss: 2.5163 train_time: 4.8m tok/s: 7545762 +2793/20000 train_loss: 2.5418 train_time: 4.9m tok/s: 
7544808 +2794/20000 train_loss: 2.5052 train_time: 4.9m tok/s: 7543882 +2795/20000 train_loss: 2.3742 train_time: 4.9m tok/s: 7542992 +2796/20000 train_loss: 2.5125 train_time: 4.9m tok/s: 7542079 +2797/20000 train_loss: 2.5443 train_time: 4.9m tok/s: 7541129 +2798/20000 train_loss: 2.5160 train_time: 4.9m tok/s: 7540196 +2799/20000 train_loss: 2.7790 train_time: 4.9m tok/s: 7539273 +2800/20000 train_loss: 2.6263 train_time: 4.9m tok/s: 7538386 +2801/20000 train_loss: 2.5010 train_time: 4.9m tok/s: 7537493 +2802/20000 train_loss: 2.5134 train_time: 4.9m tok/s: 7536563 +2803/20000 train_loss: 2.5884 train_time: 4.9m tok/s: 7535592 +2804/20000 train_loss: 2.6324 train_time: 4.9m tok/s: 7534671 +2805/20000 train_loss: 2.4778 train_time: 4.9m tok/s: 7533771 +2806/20000 train_loss: 2.5043 train_time: 4.9m tok/s: 7532849 +2807/20000 train_loss: 2.6619 train_time: 4.9m tok/s: 7531918 +2808/20000 train_loss: 2.5852 train_time: 4.9m tok/s: 7530923 +2809/20000 train_loss: 2.4423 train_time: 4.9m tok/s: 7530031 +2810/20000 train_loss: 2.5246 train_time: 4.9m tok/s: 7529117 +2811/20000 train_loss: 2.5925 train_time: 4.9m tok/s: 7528188 +2812/20000 train_loss: 2.5944 train_time: 4.9m tok/s: 7527321 +2813/20000 train_loss: 2.3919 train_time: 4.9m tok/s: 7526394 +2814/20000 train_loss: 2.5354 train_time: 4.9m tok/s: 7525506 +2815/20000 train_loss: 2.6710 train_time: 4.9m tok/s: 7524621 +2816/20000 train_loss: 2.6328 train_time: 4.9m tok/s: 7523698 +2817/20000 train_loss: 2.5386 train_time: 4.9m tok/s: 7522786 +2818/20000 train_loss: 2.5612 train_time: 4.9m tok/s: 7521919 +2819/20000 train_loss: 2.4722 train_time: 4.9m tok/s: 7521060 +2820/20000 train_loss: 2.4956 train_time: 4.9m tok/s: 7520146 +2821/20000 train_loss: 2.4386 train_time: 4.9m tok/s: 7519205 +2822/20000 train_loss: 2.7896 train_time: 4.9m tok/s: 7518230 +2823/20000 train_loss: 2.6126 train_time: 4.9m tok/s: 7517337 +2824/20000 train_loss: 2.6712 train_time: 4.9m tok/s: 7516445 +2825/20000 train_loss: 2.4765 
train_time: 4.9m tok/s: 7515535 +2826/20000 train_loss: 2.5638 train_time: 4.9m tok/s: 7514666 +2827/20000 train_loss: 2.4441 train_time: 4.9m tok/s: 7513801 +2828/20000 train_loss: 2.6208 train_time: 4.9m tok/s: 7512878 +2829/20000 train_loss: 2.4087 train_time: 4.9m tok/s: 7512011 +2830/20000 train_loss: 2.5036 train_time: 4.9m tok/s: 7511094 +2831/20000 train_loss: 2.7149 train_time: 4.9m tok/s: 7510244 +2832/20000 train_loss: 2.5279 train_time: 4.9m tok/s: 7509363 +2833/20000 train_loss: 2.6864 train_time: 4.9m tok/s: 7508464 +2834/20000 train_loss: 2.6328 train_time: 4.9m tok/s: 7507555 +2835/20000 train_loss: 2.5888 train_time: 5.0m tok/s: 7506658 +2836/20000 train_loss: 2.4971 train_time: 5.0m tok/s: 7505784 +2837/20000 train_loss: 2.5740 train_time: 5.0m tok/s: 7504891 +2838/20000 train_loss: 2.5450 train_time: 5.0m tok/s: 7504010 +2839/20000 train_loss: 2.5233 train_time: 5.0m tok/s: 7503109 +2840/20000 train_loss: 2.6090 train_time: 5.0m tok/s: 7502207 +2841/20000 train_loss: 2.5473 train_time: 5.0m tok/s: 7501332 +2842/20000 train_loss: 2.6238 train_time: 5.0m tok/s: 7500424 +2843/20000 train_loss: 2.4416 train_time: 5.0m tok/s: 7499519 +2844/20000 train_loss: 2.4360 train_time: 5.0m tok/s: 7498656 +2845/20000 train_loss: 2.5188 train_time: 5.0m tok/s: 7497787 +2846/20000 train_loss: 2.3889 train_time: 5.0m tok/s: 7496928 +2847/20000 train_loss: 2.4257 train_time: 5.0m tok/s: 7496002 +2848/20000 train_loss: 2.5580 train_time: 5.0m tok/s: 7495112 +2849/20000 train_loss: 2.6248 train_time: 5.0m tok/s: 7494254 +2850/20000 train_loss: 2.5764 train_time: 5.0m tok/s: 7493339 +2851/20000 train_loss: 2.7756 train_time: 5.0m tok/s: 7492468 +2852/20000 train_loss: 2.4613 train_time: 5.0m tok/s: 7491586 +2853/20000 train_loss: 2.5322 train_time: 5.0m tok/s: 7490694 +2854/20000 train_loss: 2.4541 train_time: 5.0m tok/s: 7489824 +2855/20000 train_loss: 2.6349 train_time: 5.0m tok/s: 7488938 +2856/20000 train_loss: 2.4658 train_time: 5.0m tok/s: 7488084 +2857/20000 
train_loss: 2.5335 train_time: 5.0m tok/s: 7487257 +2858/20000 train_loss: 2.5089 train_time: 5.0m tok/s: 7486377 +2859/20000 train_loss: 3.1496 train_time: 5.0m tok/s: 7485454 +2860/20000 train_loss: 2.4905 train_time: 5.0m tok/s: 7484597 +2861/20000 train_loss: 2.4960 train_time: 5.0m tok/s: 7483765 +2862/20000 train_loss: 2.5102 train_time: 5.0m tok/s: 7482892 +2863/20000 train_loss: 2.3497 train_time: 5.0m tok/s: 7482028 +2864/20000 train_loss: 2.3803 train_time: 5.0m tok/s: 7481160 +2865/20000 train_loss: 2.6113 train_time: 5.0m tok/s: 7480287 +2866/20000 train_loss: 2.5235 train_time: 5.0m tok/s: 7479440 +2867/20000 train_loss: 2.3872 train_time: 5.0m tok/s: 7478578 +2868/20000 train_loss: 2.4298 train_time: 5.0m tok/s: 7477714 +2869/20000 train_loss: 2.5582 train_time: 5.0m tok/s: 7476879 +2870/20000 train_loss: 2.6493 train_time: 5.0m tok/s: 7476020 +2871/20000 train_loss: 2.4562 train_time: 5.0m tok/s: 7475191 +2872/20000 train_loss: 3.0336 train_time: 5.0m tok/s: 7474271 +2873/20000 train_loss: 2.4688 train_time: 5.0m tok/s: 7473294 +2874/20000 train_loss: 2.6097 train_time: 5.0m tok/s: 7472440 +2875/20000 train_loss: 2.5315 train_time: 5.0m tok/s: 7471623 +2876/20000 train_loss: 2.5860 train_time: 5.0m tok/s: 7470793 +2877/20000 train_loss: 2.5108 train_time: 5.0m tok/s: 7469915 +2878/20000 train_loss: 2.4971 train_time: 5.1m tok/s: 7469090 +2879/20000 train_loss: 2.5354 train_time: 5.1m tok/s: 7468257 +2880/20000 train_loss: 2.5404 train_time: 5.1m tok/s: 7467403 +2881/20000 train_loss: 2.5996 train_time: 5.1m tok/s: 7466593 +2882/20000 train_loss: 2.6848 train_time: 5.1m tok/s: 7465739 +2883/20000 train_loss: 2.6413 train_time: 5.1m tok/s: 7464864 +2884/20000 train_loss: 2.6037 train_time: 5.1m tok/s: 7464017 +2885/20000 train_loss: 2.5562 train_time: 5.1m tok/s: 7463172 +2886/20000 train_loss: 2.5530 train_time: 5.1m tok/s: 7462327 +2887/20000 train_loss: 2.5218 train_time: 5.1m tok/s: 7461478 +2888/20000 train_loss: 2.6159 train_time: 5.1m tok/s: 
7460588 +2889/20000 train_loss: 2.5675 train_time: 5.1m tok/s: 7459758 +2890/20000 train_loss: 2.5891 train_time: 5.1m tok/s: 7458890 +2891/20000 train_loss: 2.5303 train_time: 5.1m tok/s: 7458100 +2892/20000 train_loss: 2.4673 train_time: 5.1m tok/s: 7457255 +2893/20000 train_loss: 2.3151 train_time: 5.1m tok/s: 7456402 +2894/20000 train_loss: 2.5533 train_time: 5.1m tok/s: 7455544 +2895/20000 train_loss: 2.5206 train_time: 5.1m tok/s: 7454696 +2896/20000 train_loss: 2.4927 train_time: 5.1m tok/s: 7453877 +2897/20000 train_loss: 2.5690 train_time: 5.1m tok/s: 7453037 +2898/20000 train_loss: 2.5617 train_time: 5.1m tok/s: 7452237 +2899/20000 train_loss: 2.5618 train_time: 5.1m tok/s: 7451397 +2900/20000 train_loss: 2.6085 train_time: 5.1m tok/s: 7450544 +2901/20000 train_loss: 2.4171 train_time: 5.1m tok/s: 7449698 +2902/20000 train_loss: 2.5346 train_time: 5.1m tok/s: 7448801 +2903/20000 train_loss: 2.4653 train_time: 5.1m tok/s: 7447968 +2904/20000 train_loss: 2.4622 train_time: 5.1m tok/s: 7447149 +2905/20000 train_loss: 2.6071 train_time: 5.1m tok/s: 7446303 +2906/20000 train_loss: 2.4156 train_time: 5.1m tok/s: 7445493 +2907/20000 train_loss: 2.4541 train_time: 5.1m tok/s: 7444695 +2908/20000 train_loss: 2.5135 train_time: 5.1m tok/s: 7443856 +2909/20000 train_loss: 2.5144 train_time: 5.1m tok/s: 7443016 +2910/20000 train_loss: 2.3947 train_time: 5.1m tok/s: 7442185 +2911/20000 train_loss: 2.5944 train_time: 5.1m tok/s: 7441344 +2912/20000 train_loss: 2.5390 train_time: 5.1m tok/s: 7440464 +2913/20000 train_loss: 2.5701 train_time: 5.1m tok/s: 7439619 +2914/20000 train_loss: 2.6236 train_time: 5.1m tok/s: 7438824 +2915/20000 train_loss: 2.5375 train_time: 5.1m tok/s: 7438025 +2916/20000 train_loss: 2.4691 train_time: 5.1m tok/s: 7437186 +2917/20000 train_loss: 2.5182 train_time: 5.1m tok/s: 7436360 +2918/20000 train_loss: 2.8601 train_time: 5.1m tok/s: 7435531 +2919/20000 train_loss: 2.6925 train_time: 5.1m tok/s: 7434695 +2920/20000 train_loss: 2.4755 
train_time: 5.1m tok/s: 7433875 +2921/20000 train_loss: 2.4274 train_time: 5.2m tok/s: 7433059 +2922/20000 train_loss: 2.4653 train_time: 5.2m tok/s: 7432256 +2923/20000 train_loss: 2.4581 train_time: 5.2m tok/s: 7431427 +2924/20000 train_loss: 2.5733 train_time: 5.2m tok/s: 7430614 +2925/20000 train_loss: 2.6224 train_time: 5.2m tok/s: 7429785 +2926/20000 train_loss: 2.4967 train_time: 5.2m tok/s: 7428920 +2927/20000 train_loss: 2.5087 train_time: 5.2m tok/s: 7428128 +2928/20000 train_loss: 2.3518 train_time: 5.2m tok/s: 7427268 +2929/20000 train_loss: 2.5595 train_time: 5.2m tok/s: 7426468 +2930/20000 train_loss: 2.4493 train_time: 5.2m tok/s: 7425646 +2931/20000 train_loss: 2.4403 train_time: 5.2m tok/s: 7424848 +2932/20000 train_loss: 2.4287 train_time: 5.2m tok/s: 7424043 +2933/20000 train_loss: 2.5201 train_time: 5.2m tok/s: 7423249 +2934/20000 train_loss: 2.5334 train_time: 5.2m tok/s: 7422425 +2935/20000 train_loss: 2.5271 train_time: 5.2m tok/s: 7421616 +2936/20000 train_loss: 2.5160 train_time: 5.2m tok/s: 7420814 +2937/20000 train_loss: 2.6488 train_time: 5.2m tok/s: 7420030 +2938/20000 train_loss: 2.5955 train_time: 5.2m tok/s: 7419237 +2939/20000 train_loss: 2.5776 train_time: 5.2m tok/s: 7418430 +2940/20000 train_loss: 2.3921 train_time: 5.2m tok/s: 7417607 +2941/20000 train_loss: 2.6225 train_time: 5.2m tok/s: 7416795 +2942/20000 train_loss: 2.5324 train_time: 5.2m tok/s: 7416001 +2943/20000 train_loss: 2.4242 train_time: 5.2m tok/s: 7415157 +2944/20000 train_loss: 2.4112 train_time: 5.2m tok/s: 7414344 +2945/20000 train_loss: 2.5171 train_time: 5.2m tok/s: 7413542 +2946/20000 train_loss: 2.4908 train_time: 5.2m tok/s: 7412726 +2947/20000 train_loss: 2.4953 train_time: 5.2m tok/s: 7411922 +2948/20000 train_loss: 2.4739 train_time: 5.2m tok/s: 7411130 +2949/20000 train_loss: 2.5420 train_time: 5.2m tok/s: 7410333 +2950/20000 train_loss: 2.6409 train_time: 5.2m tok/s: 7409522 +2951/20000 train_loss: 2.7278 train_time: 5.2m tok/s: 7408721 +2952/20000 
train_loss: 2.5532 train_time: 5.2m tok/s: 7407918 +2953/20000 train_loss: 2.5009 train_time: 5.2m tok/s: 7407129 +2954/20000 train_loss: 2.4773 train_time: 5.2m tok/s: 7406328 +2955/20000 train_loss: 2.4615 train_time: 5.2m tok/s: 7405512 +2956/20000 train_loss: 2.4595 train_time: 5.2m tok/s: 7404722 +2957/20000 train_loss: 2.7208 train_time: 5.2m tok/s: 7403861 +2958/20000 train_loss: 2.4363 train_time: 5.2m tok/s: 7403085 +2959/20000 train_loss: 2.4288 train_time: 5.2m tok/s: 7402341 +2960/20000 train_loss: 2.4282 train_time: 5.2m tok/s: 7401571 +2961/20000 train_loss: 2.5311 train_time: 5.2m tok/s: 7400768 +2962/20000 train_loss: 2.4758 train_time: 5.2m tok/s: 7399972 +2963/20000 train_loss: 2.6897 train_time: 5.2m tok/s: 7399182 +2964/20000 train_loss: 2.5806 train_time: 5.3m tok/s: 7398396 +2965/20000 train_loss: 2.6967 train_time: 5.3m tok/s: 7397613 +2966/20000 train_loss: 2.5784 train_time: 5.3m tok/s: 7396848 +2967/20000 train_loss: 2.5514 train_time: 5.3m tok/s: 7396045 +2968/20000 train_loss: 2.4828 train_time: 5.3m tok/s: 7395286 +2969/20000 train_loss: 2.6660 train_time: 5.3m tok/s: 7394478 +2970/20000 train_loss: 2.4726 train_time: 5.3m tok/s: 7393690 +2971/20000 train_loss: 2.4887 train_time: 5.3m tok/s: 7392885 +2972/20000 train_loss: 2.5155 train_time: 5.3m tok/s: 7392105 +2973/20000 train_loss: 2.4409 train_time: 5.3m tok/s: 7391332 +2974/20000 train_loss: 2.5871 train_time: 5.3m tok/s: 7390542 +2975/20000 train_loss: 2.5883 train_time: 5.3m tok/s: 7389761 +2976/20000 train_loss: 2.5445 train_time: 5.3m tok/s: 7389008 +2977/20000 train_loss: 2.4829 train_time: 5.3m tok/s: 7388205 +2978/20000 train_loss: 2.5794 train_time: 5.3m tok/s: 7387407 +2979/20000 train_loss: 2.4749 train_time: 5.3m tok/s: 7386608 +2980/20000 train_loss: 2.5849 train_time: 5.3m tok/s: 7385833 +2981/20000 train_loss: 2.3864 train_time: 5.3m tok/s: 7385056 +2982/20000 train_loss: 2.6201 train_time: 5.3m tok/s: 7384263 +2983/20000 train_loss: 2.4102 train_time: 5.3m tok/s: 
7383482 +2984/20000 train_loss: 2.4478 train_time: 5.3m tok/s: 7382691 +2985/20000 train_loss: 2.4437 train_time: 5.3m tok/s: 7381919 +2986/20000 train_loss: 2.5728 train_time: 5.3m tok/s: 7381141 +2987/20000 train_loss: 2.4910 train_time: 5.3m tok/s: 7380304 +2988/20000 train_loss: 2.5509 train_time: 5.3m tok/s: 7379553 +2989/20000 train_loss: 2.4350 train_time: 5.3m tok/s: 7378770 +2990/20000 train_loss: 2.6988 train_time: 5.3m tok/s: 7377988 +2991/20000 train_loss: 2.4830 train_time: 5.3m tok/s: 7377233 +2992/20000 train_loss: 2.5980 train_time: 5.3m tok/s: 7376460 +2993/20000 train_loss: 2.6487 train_time: 5.3m tok/s: 7375649 +2994/20000 train_loss: 2.4309 train_time: 5.3m tok/s: 7374899 +2995/20000 train_loss: 2.6409 train_time: 5.3m tok/s: 7374142 +2996/20000 train_loss: 2.4241 train_time: 5.3m tok/s: 7373364 +2997/20000 train_loss: 2.5777 train_time: 5.3m tok/s: 7372578 +2998/20000 train_loss: 2.4826 train_time: 5.3m tok/s: 7371837 +2999/20000 train_loss: 2.4638 train_time: 5.3m tok/s: 7371016 +3000/20000 train_loss: 2.4840 train_time: 5.3m tok/s: 7370249 +3001/20000 train_loss: 2.5502 train_time: 5.3m tok/s: 7369500 +3002/20000 train_loss: 2.5050 train_time: 5.3m tok/s: 7368774 +3003/20000 train_loss: 2.5363 train_time: 5.3m tok/s: 7368032 +3004/20000 train_loss: 2.4637 train_time: 5.3m tok/s: 7367278 +3005/20000 train_loss: 2.5398 train_time: 5.3m tok/s: 7366525 +3006/20000 train_loss: 2.7526 train_time: 5.3m tok/s: 7365732 +3007/20000 train_loss: 2.5009 train_time: 5.4m tok/s: 7364942 +3008/20000 train_loss: 2.4957 train_time: 5.4m tok/s: 7364199 +3009/20000 train_loss: 2.4754 train_time: 5.4m tok/s: 7363450 +3010/20000 train_loss: 2.4135 train_time: 5.4m tok/s: 7362657 +3011/20000 train_loss: 2.5777 train_time: 5.4m tok/s: 7361864 +3012/20000 train_loss: 2.5457 train_time: 5.4m tok/s: 7361149 +3013/20000 train_loss: 2.5188 train_time: 5.4m tok/s: 7360403 +3014/20000 train_loss: 2.5573 train_time: 5.4m tok/s: 7359673 +3015/20000 train_loss: 2.6593 
train_time: 5.4m tok/s: 7358923 +3016/20000 train_loss: 2.6823 train_time: 5.4m tok/s: 7358171 +3017/20000 train_loss: 2.4668 train_time: 5.4m tok/s: 7357407 +3018/20000 train_loss: 2.6175 train_time: 5.4m tok/s: 7356643 +3019/20000 train_loss: 2.4567 train_time: 5.4m tok/s: 7355938 +3020/20000 train_loss: 3.1200 train_time: 5.4m tok/s: 7355099 +3021/20000 train_loss: 2.4679 train_time: 5.4m tok/s: 7354316 +3022/20000 train_loss: 2.4517 train_time: 5.4m tok/s: 7353578 +3023/20000 train_loss: 2.5723 train_time: 5.4m tok/s: 7352832 +3024/20000 train_loss: 3.4325 train_time: 5.4m tok/s: 7351998 +3025/20000 train_loss: 2.4386 train_time: 5.4m tok/s: 7351230 +3026/20000 train_loss: 2.5236 train_time: 5.4m tok/s: 7350421 +3027/20000 train_loss: 2.5662 train_time: 5.4m tok/s: 7349682 +3028/20000 train_loss: 2.6463 train_time: 5.4m tok/s: 7348912 +3029/20000 train_loss: 2.7220 train_time: 5.4m tok/s: 7348208 +3030/20000 train_loss: 2.5623 train_time: 5.4m tok/s: 7347509 +3031/20000 train_loss: 2.5127 train_time: 5.4m tok/s: 7346802 +3032/20000 train_loss: 2.5403 train_time: 5.4m tok/s: 7346046 +3033/20000 train_loss: 2.5684 train_time: 5.4m tok/s: 7345347 +3034/20000 train_loss: 2.4828 train_time: 5.4m tok/s: 7344614 +3035/20000 train_loss: 2.3005 train_time: 5.4m tok/s: 7343860 +3036/20000 train_loss: 2.5710 train_time: 5.4m tok/s: 7343119 +3037/20000 train_loss: 2.4842 train_time: 5.4m tok/s: 7342387 +3038/20000 train_loss: 2.5276 train_time: 5.4m tok/s: 7341673 +3039/20000 train_loss: 2.5187 train_time: 5.4m tok/s: 7340924 +3040/20000 train_loss: 2.4073 train_time: 5.4m tok/s: 7340180 +3041/20000 train_loss: 2.6290 train_time: 5.4m tok/s: 7339449 +3042/20000 train_loss: 2.5788 train_time: 5.4m tok/s: 7338680 +3043/20000 train_loss: 2.6437 train_time: 5.4m tok/s: 7337893 +3044/20000 train_loss: 2.5855 train_time: 5.4m tok/s: 7337231 +3045/20000 train_loss: 2.5752 train_time: 5.4m tok/s: 7336508 +3046/20000 train_loss: 2.5592 train_time: 5.4m tok/s: 7335783 +3047/20000 
train_loss: 2.3877 train_time: 5.4m tok/s: 7335014 +3048/20000 train_loss: 2.3539 train_time: 5.4m tok/s: 7334283 +3049/20000 train_loss: 2.5074 train_time: 5.4m tok/s: 7333546 +3050/20000 train_loss: 2.6039 train_time: 5.5m tok/s: 7332813 +3051/20000 train_loss: 2.3611 train_time: 5.5m tok/s: 7332120 +3052/20000 train_loss: 2.4186 train_time: 5.5m tok/s: 7331401 +3053/20000 train_loss: 2.6086 train_time: 5.5m tok/s: 7330619 +3054/20000 train_loss: 2.4201 train_time: 5.5m tok/s: 7329873 +3055/20000 train_loss: 2.5346 train_time: 5.5m tok/s: 7329145 +3056/20000 train_loss: 2.5436 train_time: 5.5m tok/s: 7328429 +3057/20000 train_loss: 2.5029 train_time: 5.5m tok/s: 7327711 +3058/20000 train_loss: 2.5473 train_time: 5.5m tok/s: 7327005 +3059/20000 train_loss: 2.4617 train_time: 5.5m tok/s: 7326289 +3060/20000 train_loss: 2.5684 train_time: 5.5m tok/s: 7325577 +3061/20000 train_loss: 2.4382 train_time: 5.5m tok/s: 7324862 +3062/20000 train_loss: 2.5901 train_time: 5.5m tok/s: 7324138 +3063/20000 train_loss: 2.5429 train_time: 5.5m tok/s: 7323402 +3064/20000 train_loss: 2.4390 train_time: 5.5m tok/s: 7322634 +3065/20000 train_loss: 2.4084 train_time: 5.5m tok/s: 7321901 +3066/20000 train_loss: 2.6699 train_time: 5.5m tok/s: 7321142 +3067/20000 train_loss: 2.4535 train_time: 5.5m tok/s: 7320403 +3068/20000 train_loss: 2.6014 train_time: 5.5m tok/s: 7319711 +3069/20000 train_loss: 2.3904 train_time: 5.5m tok/s: 7318962 +3070/20000 train_loss: 2.5560 train_time: 5.5m tok/s: 7318222 +3071/20000 train_loss: 2.5093 train_time: 5.5m tok/s: 7317495 +3072/20000 train_loss: 2.5410 train_time: 5.5m tok/s: 7316803 +3073/20000 train_loss: 2.6042 train_time: 5.5m tok/s: 7316094 +3074/20000 train_loss: 2.4253 train_time: 5.5m tok/s: 7315374 +3075/20000 train_loss: 2.4115 train_time: 5.5m tok/s: 7314642 +3076/20000 train_loss: 2.4698 train_time: 5.5m tok/s: 7313930 +3077/20000 train_loss: 2.5285 train_time: 5.5m tok/s: 7313229 +3078/20000 train_loss: 2.3914 train_time: 5.5m tok/s: 
7312505 +3079/20000 train_loss: 2.4310 train_time: 5.5m tok/s: 7311771 +3080/20000 train_loss: 3.2295 train_time: 5.5m tok/s: 7311012 +3081/20000 train_loss: 2.3817 train_time: 5.5m tok/s: 7310317 +3082/20000 train_loss: 2.4439 train_time: 5.5m tok/s: 7309626 +3083/20000 train_loss: 2.4634 train_time: 5.5m tok/s: 7308935 +3084/20000 train_loss: 2.4662 train_time: 5.5m tok/s: 7308232 +3085/20000 train_loss: 2.5325 train_time: 5.5m tok/s: 7307512 +3086/20000 train_loss: 2.5705 train_time: 5.5m tok/s: 7306846 +3087/20000 train_loss: 2.6097 train_time: 5.5m tok/s: 7306108 +3088/20000 train_loss: 2.5716 train_time: 5.5m tok/s: 7305392 +3089/20000 train_loss: 2.4474 train_time: 5.5m tok/s: 7304688 +3090/20000 train_loss: 2.6913 train_time: 5.5m tok/s: 7303990 +3091/20000 train_loss: 2.4444 train_time: 5.5m tok/s: 7303301 +3092/20000 train_loss: 2.4806 train_time: 5.5m tok/s: 7302578 +3093/20000 train_loss: 2.5252 train_time: 5.6m tok/s: 7301826 +3094/20000 train_loss: 2.4747 train_time: 5.6m tok/s: 7301076 +3095/20000 train_loss: 2.3304 train_time: 5.6m tok/s: 7300375 +3096/20000 train_loss: 2.5068 train_time: 5.6m tok/s: 7299691 +3097/20000 train_loss: 2.5500 train_time: 5.6m tok/s: 7298999 +3098/20000 train_loss: 2.4669 train_time: 5.6m tok/s: 7298296 +3099/20000 train_loss: 2.3244 train_time: 5.6m tok/s: 7297584 +3100/20000 train_loss: 2.4076 train_time: 5.6m tok/s: 7296875 +3101/20000 train_loss: 2.6640 train_time: 5.6m tok/s: 7296210 +3102/20000 train_loss: 2.6487 train_time: 5.6m tok/s: 7295485 +3103/20000 train_loss: 2.4843 train_time: 5.6m tok/s: 7294776 +3104/20000 train_loss: 2.5856 train_time: 5.6m tok/s: 7294074 +3105/20000 train_loss: 2.3862 train_time: 5.6m tok/s: 7293383 +3106/20000 train_loss: 2.5853 train_time: 5.6m tok/s: 7292715 +3107/20000 train_loss: 2.3384 train_time: 5.6m tok/s: 7291974 +3108/20000 train_loss: 2.4687 train_time: 5.6m tok/s: 7291273 +3109/20000 train_loss: 2.5589 train_time: 5.6m tok/s: 7290600 +3110/20000 train_loss: 2.4217 
train_time: 5.6m tok/s: 7289907 +3111/20000 train_loss: 2.4315 train_time: 5.6m tok/s: 7289212 +3112/20000 train_loss: 2.3854 train_time: 5.6m tok/s: 7288517 +3113/20000 train_loss: 2.5220 train_time: 5.6m tok/s: 7287752 +3114/20000 train_loss: 2.5429 train_time: 5.6m tok/s: 7287076 +3115/20000 train_loss: 2.5620 train_time: 5.6m tok/s: 7286374 +3116/20000 train_loss: 2.5754 train_time: 5.6m tok/s: 7285668 +3117/20000 train_loss: 2.6006 train_time: 5.6m tok/s: 7284976 +3118/20000 train_loss: 2.5344 train_time: 5.6m tok/s: 7284264 +3119/20000 train_loss: 2.5685 train_time: 5.6m tok/s: 7283581 +3120/20000 train_loss: 2.5532 train_time: 5.6m tok/s: 7282904 +3121/20000 train_loss: 2.5326 train_time: 5.6m tok/s: 7282212 +3122/20000 train_loss: 2.5538 train_time: 5.6m tok/s: 7281509 +3123/20000 train_loss: 2.4266 train_time: 5.6m tok/s: 7280831 +3124/20000 train_loss: 2.5336 train_time: 5.6m tok/s: 7280159 +3125/20000 train_loss: 2.4329 train_time: 5.6m tok/s: 7279454 +3126/20000 train_loss: 2.1199 train_time: 5.6m tok/s: 7278732 +3127/20000 train_loss: 2.6003 train_time: 5.6m tok/s: 7278072 +3128/20000 train_loss: 2.5038 train_time: 5.6m tok/s: 7277360 +3129/20000 train_loss: 2.5325 train_time: 5.6m tok/s: 7276675 +3130/20000 train_loss: 2.4751 train_time: 5.6m tok/s: 7275994 +3131/20000 train_loss: 2.4399 train_time: 5.6m tok/s: 7275311 +3132/20000 train_loss: 2.5638 train_time: 5.6m tok/s: 7274620 +3133/20000 train_loss: 2.5145 train_time: 5.6m tok/s: 7273934 +3134/20000 train_loss: 2.5469 train_time: 5.6m tok/s: 7273273 +3135/20000 train_loss: 2.5025 train_time: 5.7m tok/s: 7272596 +3136/20000 train_loss: 2.5680 train_time: 5.7m tok/s: 7271919 +3137/20000 train_loss: 2.6027 train_time: 5.7m tok/s: 7271247 +3138/20000 train_loss: 2.4463 train_time: 5.7m tok/s: 7270535 +3139/20000 train_loss: 2.4497 train_time: 5.7m tok/s: 7269879 +3140/20000 train_loss: 2.4256 train_time: 5.7m tok/s: 7269190 +3141/20000 train_loss: 2.5734 train_time: 5.7m tok/s: 7268511 +3142/20000 
train_loss: 2.5418 train_time: 5.7m tok/s: 7267794 +3143/20000 train_loss: 2.5479 train_time: 5.7m tok/s: 7267111 +3144/20000 train_loss: 2.5473 train_time: 5.7m tok/s: 7266466 +3145/20000 train_loss: 2.0690 train_time: 5.7m tok/s: 7265739 +3146/20000 train_loss: 2.4465 train_time: 5.7m tok/s: 7265069 +3147/20000 train_loss: 2.5490 train_time: 5.7m tok/s: 7264410 +3148/20000 train_loss: 2.4830 train_time: 5.7m tok/s: 7263738 +3149/20000 train_loss: 2.5965 train_time: 5.7m tok/s: 7263061 +3150/20000 train_loss: 2.3880 train_time: 5.7m tok/s: 7262355 +3151/20000 train_loss: 2.4058 train_time: 5.7m tok/s: 7261727 +3152/20000 train_loss: 2.4725 train_time: 5.7m tok/s: 7261059 +3153/20000 train_loss: 2.5212 train_time: 5.7m tok/s: 7260405 +3154/20000 train_loss: 2.3718 train_time: 5.7m tok/s: 7259713 +3155/20000 train_loss: 2.4728 train_time: 5.7m tok/s: 7259047 +3156/20000 train_loss: 2.5897 train_time: 5.7m tok/s: 7258358 +3157/20000 train_loss: 2.5929 train_time: 5.7m tok/s: 7257695 +3158/20000 train_loss: 2.6330 train_time: 5.7m tok/s: 7257018 +3159/20000 train_loss: 2.5218 train_time: 5.7m tok/s: 7256357 +3160/20000 train_loss: 2.3692 train_time: 5.7m tok/s: 7255661 +3161/20000 train_loss: 2.5348 train_time: 5.7m tok/s: 7255007 +3162/20000 train_loss: 2.6202 train_time: 5.7m tok/s: 7254372 +3163/20000 train_loss: 2.5470 train_time: 5.7m tok/s: 7253690 +3164/20000 train_loss: 2.3717 train_time: 5.7m tok/s: 7253005 +3165/20000 train_loss: 2.5047 train_time: 5.7m tok/s: 7252337 +3166/20000 train_loss: 2.4214 train_time: 5.7m tok/s: 7251686 +3167/20000 train_loss: 2.5250 train_time: 5.7m tok/s: 7251024 +3168/20000 train_loss: 2.5111 train_time: 5.7m tok/s: 7250354 +3169/20000 train_loss: 2.4891 train_time: 5.7m tok/s: 7249703 +3170/20000 train_loss: 2.6197 train_time: 5.7m tok/s: 7248979 +3171/20000 train_loss: 2.6526 train_time: 5.7m tok/s: 7248349 +3172/20000 train_loss: 2.6582 train_time: 5.7m tok/s: 7247690 +3173/20000 train_loss: 2.7111 train_time: 5.7m tok/s: 
7246989 +3174/20000 train_loss: 2.4650 train_time: 5.7m tok/s: 7246336 +3175/20000 train_loss: 2.6110 train_time: 5.7m tok/s: 7245684 +3176/20000 train_loss: 2.4433 train_time: 5.7m tok/s: 7245050 +3177/20000 train_loss: 2.4470 train_time: 5.7m tok/s: 7244374 +3178/20000 train_loss: 2.5914 train_time: 5.8m tok/s: 7243650 +3179/20000 train_loss: 2.4297 train_time: 5.8m tok/s: 7242931 +3180/20000 train_loss: 2.5291 train_time: 5.8m tok/s: 7242289 +3181/20000 train_loss: 2.3763 train_time: 5.8m tok/s: 7241655 +3182/20000 train_loss: 2.8157 train_time: 5.8m tok/s: 7240984 +3183/20000 train_loss: 2.6566 train_time: 5.8m tok/s: 7240328 +3184/20000 train_loss: 2.4968 train_time: 5.8m tok/s: 7239639 +3185/20000 train_loss: 2.5749 train_time: 5.8m tok/s: 7238971 +3186/20000 train_loss: 2.5264 train_time: 5.8m tok/s: 7238355 +3187/20000 train_loss: 2.5703 train_time: 5.8m tok/s: 7237702 +3188/20000 train_loss: 2.3911 train_time: 5.8m tok/s: 7237046 +3189/20000 train_loss: 2.5913 train_time: 5.8m tok/s: 7236446 +3190/20000 train_loss: 2.5740 train_time: 5.8m tok/s: 7235794 +3191/20000 train_loss: 2.4781 train_time: 5.8m tok/s: 7235130 +3192/20000 train_loss: 2.3756 train_time: 5.8m tok/s: 7234506 +3193/20000 train_loss: 2.4587 train_time: 5.8m tok/s: 7233868 +3194/20000 train_loss: 2.4452 train_time: 5.8m tok/s: 7233244 +3195/20000 train_loss: 2.5014 train_time: 5.8m tok/s: 7232590 +3196/20000 train_loss: 2.3887 train_time: 5.8m tok/s: 7231947 +3197/20000 train_loss: 2.4670 train_time: 5.8m tok/s: 7231297 +3198/20000 train_loss: 2.4922 train_time: 5.8m tok/s: 7230663 +3199/20000 train_loss: 2.5067 train_time: 5.8m tok/s: 7230001 +3200/20000 train_loss: 2.6316 train_time: 5.8m tok/s: 7229359 +3201/20000 train_loss: 2.5681 train_time: 5.8m tok/s: 7228720 +3202/20000 train_loss: 2.2660 train_time: 5.8m tok/s: 7228023 +3203/20000 train_loss: 2.5436 train_time: 5.8m tok/s: 7227364 +3204/20000 train_loss: 2.4430 train_time: 5.8m tok/s: 7226720 +3205/20000 train_loss: 2.5027 
train_time: 5.8m tok/s: 7226083 +3206/20000 train_loss: 2.4455 train_time: 5.8m tok/s: 7225461 +3207/20000 train_loss: 2.5527 train_time: 5.8m tok/s: 7224817 +3208/20000 train_loss: 2.5809 train_time: 5.8m tok/s: 7224176 +3209/20000 train_loss: 2.4532 train_time: 5.8m tok/s: 7223534 +3210/20000 train_loss: 2.7146 train_time: 5.8m tok/s: 7222868 +3211/20000 train_loss: 2.5104 train_time: 5.8m tok/s: 7222248 +3212/20000 train_loss: 2.4045 train_time: 5.8m tok/s: 7221617 +3213/20000 train_loss: 2.5550 train_time: 5.8m tok/s: 7220942 +3214/20000 train_loss: 2.4174 train_time: 5.8m tok/s: 7220273 +3215/20000 train_loss: 2.4334 train_time: 5.8m tok/s: 7219657 +3216/20000 train_loss: 2.3250 train_time: 5.8m tok/s: 7219011 +3217/20000 train_loss: 2.4573 train_time: 5.8m tok/s: 7218367 +3218/20000 train_loss: 2.4992 train_time: 5.8m tok/s: 7217746 +3219/20000 train_loss: 2.5620 train_time: 5.8m tok/s: 7217107 +3220/20000 train_loss: 2.6253 train_time: 5.8m tok/s: 7216498 +3221/20000 train_loss: 2.4274 train_time: 5.9m tok/s: 7215888 +3222/20000 train_loss: 2.5267 train_time: 5.9m tok/s: 7215253 +3223/20000 train_loss: 2.6323 train_time: 5.9m tok/s: 7214591 +3224/20000 train_loss: 2.9704 train_time: 5.9m tok/s: 7213903 +3225/20000 train_loss: 2.5757 train_time: 5.9m tok/s: 7213276 +3226/20000 train_loss: 2.4038 train_time: 5.9m tok/s: 7212627 +3227/20000 train_loss: 2.4830 train_time: 5.9m tok/s: 7212004 +3228/20000 train_loss: 2.8148 train_time: 5.9m tok/s: 7211356 +3229/20000 train_loss: 2.4096 train_time: 5.9m tok/s: 7210751 +3230/20000 train_loss: 2.4425 train_time: 5.9m tok/s: 7210108 +3231/20000 train_loss: 2.5574 train_time: 5.9m tok/s: 7209509 +3232/20000 train_loss: 2.5294 train_time: 5.9m tok/s: 7208890 +3233/20000 train_loss: 2.5237 train_time: 5.9m tok/s: 7208263 +3234/20000 train_loss: 2.5934 train_time: 5.9m tok/s: 7207608 +3235/20000 train_loss: 2.5289 train_time: 5.9m tok/s: 7206955 +3236/20000 train_loss: 2.2800 train_time: 5.9m tok/s: 7206336 +3237/20000 
train_loss: 2.4675 train_time: 5.9m tok/s: 7205725 +3238/20000 train_loss: 2.3632 train_time: 5.9m tok/s: 7205075 +3239/20000 train_loss: 2.3968 train_time: 5.9m tok/s: 7204418 +3240/20000 train_loss: 2.4947 train_time: 5.9m tok/s: 7203797 +3241/20000 train_loss: 2.4977 train_time: 5.9m tok/s: 7203199 +3242/20000 train_loss: 2.5252 train_time: 5.9m tok/s: 7202593 +3243/20000 train_loss: 2.4453 train_time: 5.9m tok/s: 7201971 +3244/20000 train_loss: 2.5583 train_time: 5.9m tok/s: 7201333 +3245/20000 train_loss: 2.6331 train_time: 5.9m tok/s: 7200710 +3246/20000 train_loss: 2.5135 train_time: 5.9m tok/s: 7200100 +3247/20000 train_loss: 2.4101 train_time: 5.9m tok/s: 7199471 +3248/20000 train_loss: 2.4641 train_time: 5.9m tok/s: 7198825 +3249/20000 train_loss: 2.5948 train_time: 5.9m tok/s: 7198204 +3250/20000 train_loss: 2.5486 train_time: 5.9m tok/s: 7197573 +3251/20000 train_loss: 2.5854 train_time: 5.9m tok/s: 7196933 +3252/20000 train_loss: 2.4455 train_time: 5.9m tok/s: 7196304 +3253/20000 train_loss: 2.4404 train_time: 5.9m tok/s: 7195694 +3254/20000 train_loss: 2.9084 train_time: 5.9m tok/s: 7195007 +3255/20000 train_loss: 2.4589 train_time: 5.9m tok/s: 7194370 +3256/20000 train_loss: 2.4984 train_time: 5.9m tok/s: 7193805 +3257/20000 train_loss: 2.5737 train_time: 5.9m tok/s: 7193129 +3258/20000 train_loss: 2.5417 train_time: 5.9m tok/s: 7192523 +3259/20000 train_loss: 2.4893 train_time: 5.9m tok/s: 7191915 +3260/20000 train_loss: 2.4826 train_time: 5.9m tok/s: 7191321 +3261/20000 train_loss: 2.5216 train_time: 5.9m tok/s: 7190728 +3262/20000 train_loss: 2.4027 train_time: 5.9m tok/s: 7190091 +3263/20000 train_loss: 2.4644 train_time: 5.9m tok/s: 7189482 +3264/20000 train_loss: 2.5121 train_time: 6.0m tok/s: 7188864 +3265/20000 train_loss: 2.5335 train_time: 6.0m tok/s: 7188248 +3266/20000 train_loss: 2.5419 train_time: 6.0m tok/s: 7187674 +3267/20000 train_loss: 2.4892 train_time: 6.0m tok/s: 7187056 +3268/20000 train_loss: 2.5159 train_time: 6.0m tok/s: 
7186424 +3269/20000 train_loss: 2.6167 train_time: 6.0m tok/s: 7185794 +3270/20000 train_loss: 2.4660 train_time: 6.0m tok/s: 7185171 +3271/20000 train_loss: 2.5525 train_time: 6.0m tok/s: 7184526 +3272/20000 train_loss: 2.5093 train_time: 6.0m tok/s: 7183936 +3273/20000 train_loss: 2.6726 train_time: 6.0m tok/s: 7183326 +3274/20000 train_loss: 2.4739 train_time: 6.0m tok/s: 7182728 +3275/20000 train_loss: 2.5227 train_time: 6.0m tok/s: 7182144 +3276/20000 train_loss: 2.4958 train_time: 6.0m tok/s: 7181545 +3277/20000 train_loss: 2.5035 train_time: 6.0m tok/s: 7180858 +3278/20000 train_loss: 2.4002 train_time: 6.0m tok/s: 7180291 +3279/20000 train_loss: 2.4710 train_time: 6.0m tok/s: 7179678 +3280/20000 train_loss: 2.3869 train_time: 6.0m tok/s: 7179092 +3281/20000 train_loss: 2.4256 train_time: 6.0m tok/s: 7178429 +3282/20000 train_loss: 2.4917 train_time: 6.0m tok/s: 7177787 +3283/20000 train_loss: 2.4325 train_time: 6.0m tok/s: 7177198 +3284/20000 train_loss: 2.6316 train_time: 6.0m tok/s: 7176613 +3285/20000 train_loss: 2.4892 train_time: 6.0m tok/s: 7175992 +3286/20000 train_loss: 2.4447 train_time: 6.0m tok/s: 7175391 +3287/20000 train_loss: 2.5094 train_time: 6.0m tok/s: 7174823 +3288/20000 train_loss: 2.5037 train_time: 6.0m tok/s: 7174229 +3289/20000 train_loss: 2.4839 train_time: 6.0m tok/s: 7173620 +3290/20000 train_loss: 2.4590 train_time: 6.0m tok/s: 7173036 +3291/20000 train_loss: 2.4432 train_time: 6.0m tok/s: 7172454 +3292/20000 train_loss: 2.4648 train_time: 6.0m tok/s: 7171860 +3293/20000 train_loss: 2.4589 train_time: 6.0m tok/s: 7171230 +3294/20000 train_loss: 2.4295 train_time: 6.0m tok/s: 7170630 +3295/20000 train_loss: 2.5179 train_time: 6.0m tok/s: 7170041 +3296/20000 train_loss: 2.3923 train_time: 6.0m tok/s: 7169416 +3297/20000 train_loss: 2.4029 train_time: 6.0m tok/s: 7168810 +3298/20000 train_loss: 2.4810 train_time: 6.0m tok/s: 7168196 +3299/20000 train_loss: 2.3018 train_time: 6.0m tok/s: 7167548 +3300/20000 train_loss: 2.4603 
train_time: 6.0m tok/s: 7166960 +3301/20000 train_loss: 2.4298 train_time: 6.0m tok/s: 7166367 +3302/20000 train_loss: 2.2260 train_time: 6.0m tok/s: 7165696 +3303/20000 train_loss: 2.6549 train_time: 6.0m tok/s: 7165109 +3304/20000 train_loss: 2.4978 train_time: 6.0m tok/s: 7164535 +3305/20000 train_loss: 2.5661 train_time: 6.0m tok/s: 7163947 +3306/20000 train_loss: 2.6147 train_time: 6.0m tok/s: 7163386 +3307/20000 train_loss: 2.5220 train_time: 6.1m tok/s: 7162767 +3308/20000 train_loss: 2.2471 train_time: 6.1m tok/s: 7162169 +3309/20000 train_loss: 2.4734 train_time: 6.1m tok/s: 7161575 +3310/20000 train_loss: 2.5375 train_time: 6.1m tok/s: 7161013 +3311/20000 train_loss: 2.6588 train_time: 6.1m tok/s: 7160392 +3312/20000 train_loss: 2.5314 train_time: 6.1m tok/s: 7159828 +3313/20000 train_loss: 2.3741 train_time: 6.1m tok/s: 7159227 +3314/20000 train_loss: 2.4932 train_time: 6.1m tok/s: 7158673 +3315/20000 train_loss: 2.5193 train_time: 6.1m tok/s: 7158077 +3316/20000 train_loss: 2.5116 train_time: 6.1m tok/s: 7157515 +3317/20000 train_loss: 2.4245 train_time: 6.1m tok/s: 7156898 +3318/20000 train_loss: 2.3951 train_time: 6.1m tok/s: 7156298 +3319/20000 train_loss: 2.3987 train_time: 6.1m tok/s: 7155696 +3320/20000 train_loss: 2.4221 train_time: 6.1m tok/s: 7155103 +3321/20000 train_loss: 2.4437 train_time: 6.1m tok/s: 7154516 +3322/20000 train_loss: 2.3967 train_time: 6.1m tok/s: 7153916 +3323/20000 train_loss: 2.4075 train_time: 6.1m tok/s: 7153311 +3324/20000 train_loss: 2.4545 train_time: 6.1m tok/s: 7152717 +3325/20000 train_loss: 2.5554 train_time: 6.1m tok/s: 7152169 +3326/20000 train_loss: 2.5091 train_time: 6.1m tok/s: 7151592 +3327/20000 train_loss: 2.4195 train_time: 6.1m tok/s: 7151021 +3328/20000 train_loss: 2.5676 train_time: 6.1m tok/s: 7150382 +3329/20000 train_loss: 2.4589 train_time: 6.1m tok/s: 7149820 +3330/20000 train_loss: 2.8022 train_time: 6.1m tok/s: 7149216 +3331/20000 train_loss: 2.6140 train_time: 6.1m tok/s: 7148634 +3332/20000 
train_loss: 2.5293 train_time: 6.1m tok/s: 7148042 +3333/20000 train_loss: 2.3851 train_time: 6.1m tok/s: 7147498 +3334/20000 train_loss: 2.4664 train_time: 6.1m tok/s: 7146907 +3335/20000 train_loss: 2.5640 train_time: 6.1m tok/s: 7146345 +3336/20000 train_loss: 2.5167 train_time: 6.1m tok/s: 7145785 +3337/20000 train_loss: 2.2930 train_time: 6.1m tok/s: 7145230 +3338/20000 train_loss: 2.4640 train_time: 6.1m tok/s: 7144634 +3339/20000 train_loss: 2.4096 train_time: 6.1m tok/s: 7144032 +3340/20000 train_loss: 2.4313 train_time: 6.1m tok/s: 7143455 +3341/20000 train_loss: 2.4451 train_time: 6.1m tok/s: 7142874 +3342/20000 train_loss: 2.3835 train_time: 6.1m tok/s: 7142273 +3343/20000 train_loss: 2.3845 train_time: 6.1m tok/s: 7141699 +3344/20000 train_loss: 2.5397 train_time: 6.1m tok/s: 7141119 +3345/20000 train_loss: 2.4693 train_time: 6.1m tok/s: 7140553 +3346/20000 train_loss: 2.4839 train_time: 6.1m tok/s: 7139959 +3347/20000 train_loss: 2.5239 train_time: 6.1m tok/s: 7139382 +3348/20000 train_loss: 2.5812 train_time: 6.1m tok/s: 7138785 +3349/20000 train_loss: 2.4968 train_time: 6.1m tok/s: 7138208 +3350/20000 train_loss: 2.4481 train_time: 6.2m tok/s: 7137642 +3351/20000 train_loss: 2.4829 train_time: 6.2m tok/s: 7137099 +3352/20000 train_loss: 2.4912 train_time: 6.2m tok/s: 7136527 +3353/20000 train_loss: 2.4329 train_time: 6.2m tok/s: 7135911 +3354/20000 train_loss: 2.5235 train_time: 6.2m tok/s: 7135325 +3355/20000 train_loss: 2.5622 train_time: 6.2m tok/s: 7134734 +3356/20000 train_loss: 2.3523 train_time: 6.2m tok/s: 7134122 +3357/20000 train_loss: 2.3535 train_time: 6.2m tok/s: 7133544 +3358/20000 train_loss: 2.5081 train_time: 6.2m tok/s: 7132949 +3359/20000 train_loss: 2.4437 train_time: 6.2m tok/s: 7132382 +3360/20000 train_loss: 2.4519 train_time: 6.2m tok/s: 7131830 +3361/20000 train_loss: 2.5160 train_time: 6.2m tok/s: 7131287 +3362/20000 train_loss: 2.4883 train_time: 6.2m tok/s: 7130744 +3363/20000 train_loss: 2.4211 train_time: 6.2m tok/s: 
7130130 +3364/20000 train_loss: 2.5065 train_time: 6.2m tok/s: 7129582 +3365/20000 train_loss: 2.5105 train_time: 6.2m tok/s: 7129025 +3366/20000 train_loss: 2.3797 train_time: 6.2m tok/s: 7128450 +3367/20000 train_loss: 2.4190 train_time: 6.2m tok/s: 7127864 +3368/20000 train_loss: 2.3751 train_time: 6.2m tok/s: 7127321 +3369/20000 train_loss: 2.6137 train_time: 6.2m tok/s: 7126709 +3370/20000 train_loss: 2.5867 train_time: 6.2m tok/s: 7126119 +3371/20000 train_loss: 2.5008 train_time: 6.2m tok/s: 7125561 +3372/20000 train_loss: 2.5788 train_time: 6.2m tok/s: 7125015 +3373/20000 train_loss: 2.4649 train_time: 6.2m tok/s: 7124450 +3374/20000 train_loss: 2.4657 train_time: 6.2m tok/s: 7123902 +3375/20000 train_loss: 2.4768 train_time: 6.2m tok/s: 7123315 +3376/20000 train_loss: 2.4241 train_time: 6.2m tok/s: 7122752 +3377/20000 train_loss: 2.5516 train_time: 6.2m tok/s: 7122199 +3378/20000 train_loss: 2.3387 train_time: 6.2m tok/s: 7121625 +3379/20000 train_loss: 2.4119 train_time: 6.2m tok/s: 7121077 +3380/20000 train_loss: 2.3512 train_time: 6.2m tok/s: 7120468 +3381/20000 train_loss: 2.3308 train_time: 6.2m tok/s: 7119902 +3382/20000 train_loss: 2.5739 train_time: 6.2m tok/s: 7119384 +3383/20000 train_loss: 2.4963 train_time: 6.2m tok/s: 7118815 +3384/20000 train_loss: 2.4819 train_time: 6.2m tok/s: 7118235 +3385/20000 train_loss: 2.4738 train_time: 6.2m tok/s: 7117710 +3386/20000 train_loss: 2.4925 train_time: 6.2m tok/s: 7117143 +3387/20000 train_loss: 2.5179 train_time: 6.2m tok/s: 7116575 +3388/20000 train_loss: 2.2938 train_time: 6.2m tok/s: 7115932 +3389/20000 train_loss: 2.4199 train_time: 6.2m tok/s: 7115408 +3390/20000 train_loss: 2.4964 train_time: 6.2m tok/s: 7114842 +3391/20000 train_loss: 2.4946 train_time: 6.2m tok/s: 7114318 +3392/20000 train_loss: 2.4608 train_time: 6.2m tok/s: 7113756 +3393/20000 train_loss: 2.4411 train_time: 6.3m tok/s: 7113209 +3394/20000 train_loss: 2.4623 train_time: 6.3m tok/s: 7112681 +3395/20000 train_loss: 2.4558 
train_time: 6.3m tok/s: 7112116 +3396/20000 train_loss: 2.6042 train_time: 6.3m tok/s: 7111583 +3397/20000 train_loss: 2.5238 train_time: 6.3m tok/s: 7110996 +3398/20000 train_loss: 2.3634 train_time: 6.3m tok/s: 7110432 +3399/20000 train_loss: 2.4214 train_time: 6.3m tok/s: 7109852 +3400/20000 train_loss: 2.5236 train_time: 6.3m tok/s: 7109286 +3401/20000 train_loss: 2.4066 train_time: 6.3m tok/s: 7108735 +3402/20000 train_loss: 2.4460 train_time: 6.3m tok/s: 7108217 +3403/20000 train_loss: 2.4835 train_time: 6.3m tok/s: 7107675 +3404/20000 train_loss: 2.4491 train_time: 6.3m tok/s: 7107126 +3405/20000 train_loss: 2.6259 train_time: 6.3m tok/s: 7106562 +3406/20000 train_loss: 2.4740 train_time: 6.3m tok/s: 7106038 +3407/20000 train_loss: 2.5022 train_time: 6.3m tok/s: 7105486 +3408/20000 train_loss: 2.5675 train_time: 6.3m tok/s: 7104924 +3409/20000 train_loss: 2.4109 train_time: 6.3m tok/s: 7104363 +3410/20000 train_loss: 2.4201 train_time: 6.3m tok/s: 7103800 +3411/20000 train_loss: 2.3420 train_time: 6.3m tok/s: 7103251 +3412/20000 train_loss: 2.3418 train_time: 6.3m tok/s: 7102700 +3413/20000 train_loss: 2.3354 train_time: 6.3m tok/s: 7102115 +3414/20000 train_loss: 2.4392 train_time: 6.3m tok/s: 7101590 +3415/20000 train_loss: 2.5920 train_time: 6.3m tok/s: 7101048 +3416/20000 train_loss: 2.4995 train_time: 6.3m tok/s: 7100508 +3417/20000 train_loss: 2.5651 train_time: 6.3m tok/s: 7099983 +3418/20000 train_loss: 2.5026 train_time: 6.3m tok/s: 7099418 +3419/20000 train_loss: 2.5278 train_time: 6.3m tok/s: 7098852 +3420/20000 train_loss: 2.5372 train_time: 6.3m tok/s: 7098283 +3421/20000 train_loss: 2.3724 train_time: 6.3m tok/s: 7097739 +3422/20000 train_loss: 2.6820 train_time: 6.3m tok/s: 7097172 +3423/20000 train_loss: 2.4012 train_time: 6.3m tok/s: 7096621 +3424/20000 train_loss: 2.4492 train_time: 6.3m tok/s: 7096120 +3425/20000 train_loss: 2.4224 train_time: 6.3m tok/s: 7095582 +3426/20000 train_loss: 2.5048 train_time: 6.3m tok/s: 7095052 +3427/20000 
train_loss: 2.4574 train_time: 6.3m tok/s: 7094517 +3428/20000 train_loss: 2.4555 train_time: 6.3m tok/s: 7093952 +3429/20000 train_loss: 2.5958 train_time: 6.3m tok/s: 7093416 +3430/20000 train_loss: 2.4138 train_time: 6.3m tok/s: 7092868 +3431/20000 train_loss: 2.4666 train_time: 6.3m tok/s: 7092326 +3432/20000 train_loss: 2.5927 train_time: 6.3m tok/s: 7091774 +3433/20000 train_loss: 2.4548 train_time: 6.3m tok/s: 7091237 +3434/20000 train_loss: 2.5114 train_time: 6.3m tok/s: 7090698 +3435/20000 train_loss: 2.4386 train_time: 6.4m tok/s: 7090129 +3436/20000 train_loss: 2.4114 train_time: 6.4m tok/s: 7089598 +3437/20000 train_loss: 2.5472 train_time: 6.4m tok/s: 7089048 +3438/20000 train_loss: 2.4870 train_time: 6.4m tok/s: 7088511 +3439/20000 train_loss: 2.3244 train_time: 6.4m tok/s: 7087965 +3440/20000 train_loss: 2.4800 train_time: 6.4m tok/s: 7087421 +3441/20000 train_loss: 2.3781 train_time: 6.4m tok/s: 7086902 +3442/20000 train_loss: 2.4075 train_time: 6.4m tok/s: 7086366 +3443/20000 train_loss: 2.6580 train_time: 6.4m tok/s: 7085749 +3444/20000 train_loss: 2.3229 train_time: 6.4m tok/s: 7085220 +3445/20000 train_loss: 2.4615 train_time: 6.4m tok/s: 7084707 +3446/20000 train_loss: 2.5265 train_time: 6.4m tok/s: 7084210 +3447/20000 train_loss: 2.4848 train_time: 6.4m tok/s: 7083667 +3448/20000 train_loss: 2.6048 train_time: 6.4m tok/s: 7083145 +3449/20000 train_loss: 2.4741 train_time: 6.4m tok/s: 7082623 +3450/20000 train_loss: 2.4281 train_time: 6.4m tok/s: 7082091 +3451/20000 train_loss: 2.4902 train_time: 6.4m tok/s: 7081574 +3452/20000 train_loss: 2.5142 train_time: 6.4m tok/s: 7081048 +3453/20000 train_loss: 2.4023 train_time: 6.4m tok/s: 7080490 +3454/20000 train_loss: 2.5197 train_time: 6.4m tok/s: 7079952 +3455/20000 train_loss: 2.4801 train_time: 6.4m tok/s: 7079417 +3456/20000 train_loss: 2.4824 train_time: 6.4m tok/s: 7078868 +3457/20000 train_loss: 2.4390 train_time: 6.4m tok/s: 7078340 +3458/20000 train_loss: 2.3951 train_time: 6.4m tok/s: 
7077809 +3459/20000 train_loss: 2.3627 train_time: 6.4m tok/s: 7077260 +3460/20000 train_loss: 2.5176 train_time: 6.4m tok/s: 7076757 +3461/20000 train_loss: 2.6255 train_time: 6.4m tok/s: 7076238 +3462/20000 train_loss: 2.5459 train_time: 6.4m tok/s: 7075724 +3463/20000 train_loss: 2.5698 train_time: 6.4m tok/s: 7075195 +3464/20000 train_loss: 2.4813 train_time: 6.4m tok/s: 7074659 +3465/20000 train_loss: 2.5264 train_time: 6.4m tok/s: 7074112 +3466/20000 train_loss: 2.5895 train_time: 6.4m tok/s: 7073557 +3467/20000 train_loss: 2.4422 train_time: 6.4m tok/s: 7073037 +3468/20000 train_loss: 2.6316 train_time: 6.4m tok/s: 7072462 +3469/20000 train_loss: 2.3911 train_time: 6.4m tok/s: 7071924 +3470/20000 train_loss: 2.3986 train_time: 6.4m tok/s: 7071385 +3471/20000 train_loss: 2.3894 train_time: 6.4m tok/s: 7070864 +3472/20000 train_loss: 2.4341 train_time: 6.4m tok/s: 7070330 +3473/20000 train_loss: 2.3967 train_time: 6.4m tok/s: 7069795 +3474/20000 train_loss: 2.4729 train_time: 6.4m tok/s: 7069321 +3475/20000 train_loss: 2.5199 train_time: 6.4m tok/s: 7068809 +3476/20000 train_loss: 2.4534 train_time: 6.4m tok/s: 7068293 +3477/20000 train_loss: 2.5265 train_time: 6.4m tok/s: 7067796 +3478/20000 train_loss: 2.5563 train_time: 6.5m tok/s: 7067253 +3479/20000 train_loss: 2.4552 train_time: 6.5m tok/s: 7066724 +3480/20000 train_loss: 2.5218 train_time: 6.5m tok/s: 7066215 +3481/20000 train_loss: 2.4812 train_time: 6.5m tok/s: 7065689 +3482/20000 train_loss: 2.4246 train_time: 6.5m tok/s: 7065172 +3483/20000 train_loss: 2.4135 train_time: 6.5m tok/s: 7064639 +3484/20000 train_loss: 2.4664 train_time: 6.5m tok/s: 7064092 +3485/20000 train_loss: 2.3978 train_time: 6.5m tok/s: 7063548 +3486/20000 train_loss: 2.3790 train_time: 6.5m tok/s: 7063039 +3487/20000 train_loss: 2.4752 train_time: 6.5m tok/s: 7062537 +3488/20000 train_loss: 2.3655 train_time: 6.5m tok/s: 7062003 +3489/20000 train_loss: 2.3809 train_time: 6.5m tok/s: 7061508 +3490/20000 train_loss: 2.4803 
train_time: 6.5m tok/s: 7060982 +3491/20000 train_loss: 2.5808 train_time: 6.5m tok/s: 7060473 +3492/20000 train_loss: 2.4219 train_time: 6.5m tok/s: 7059941 +3493/20000 train_loss: 2.5487 train_time: 6.5m tok/s: 7059420 +3494/20000 train_loss: 2.5215 train_time: 6.5m tok/s: 7058899 +3495/20000 train_loss: 2.4859 train_time: 6.5m tok/s: 7058394 +3496/20000 train_loss: 2.5887 train_time: 6.5m tok/s: 7057857 +3497/20000 train_loss: 2.6826 train_time: 6.5m tok/s: 7057333 +3498/20000 train_loss: 2.3489 train_time: 6.5m tok/s: 7056780 +3499/20000 train_loss: 2.4571 train_time: 6.5m tok/s: 7056263 +3500/20000 train_loss: 2.3643 train_time: 6.5m tok/s: 7055765 +3501/20000 train_loss: 2.4152 train_time: 6.5m tok/s: 7055250 +3502/20000 train_loss: 3.2537 train_time: 6.5m tok/s: 7054691 +3503/20000 train_loss: 2.4561 train_time: 6.5m tok/s: 7054189 +3504/20000 train_loss: 2.5520 train_time: 6.5m tok/s: 7053684 +3505/20000 train_loss: 2.5014 train_time: 6.5m tok/s: 7053158 +3506/20000 train_loss: 2.5595 train_time: 6.5m tok/s: 7052641 +3507/20000 train_loss: 2.5266 train_time: 6.5m tok/s: 7052153 +3508/20000 train_loss: 2.6145 train_time: 6.5m tok/s: 7051633 +3509/20000 train_loss: 2.4345 train_time: 6.5m tok/s: 7051124 +3510/20000 train_loss: 2.4274 train_time: 6.5m tok/s: 7050612 +3511/20000 train_loss: 2.4147 train_time: 6.5m tok/s: 7050096 +3512/20000 train_loss: 2.4612 train_time: 6.5m tok/s: 7049615 +3513/20000 train_loss: 2.4101 train_time: 6.5m tok/s: 7049102 +3514/20000 train_loss: 2.5405 train_time: 6.5m tok/s: 7048598 +3515/20000 train_loss: 2.3628 train_time: 6.5m tok/s: 7048087 +3516/20000 train_loss: 2.2678 train_time: 6.5m tok/s: 7047556 +3517/20000 train_loss: 2.4471 train_time: 6.5m tok/s: 7047047 +3518/20000 train_loss: 2.4150 train_time: 6.5m tok/s: 7046531 +3519/20000 train_loss: 2.8152 train_time: 6.5m tok/s: 7046012 +3520/20000 train_loss: 2.6382 train_time: 6.5m tok/s: 7045504 +3521/20000 train_loss: 2.4947 train_time: 6.6m tok/s: 7045002 +3522/20000 
train_loss: 2.5101 train_time: 6.6m tok/s: 7044511 +3523/20000 train_loss: 2.3836 train_time: 6.6m tok/s: 7043984 +3524/20000 train_loss: 2.7616 train_time: 6.6m tok/s: 7043456 +3525/20000 train_loss: 2.5207 train_time: 6.6m tok/s: 7042973 +3526/20000 train_loss: 2.4885 train_time: 6.6m tok/s: 7042480 +3527/20000 train_loss: 2.4110 train_time: 6.6m tok/s: 7041958 +3528/20000 train_loss: 2.4551 train_time: 6.6m tok/s: 7041471 +3529/20000 train_loss: 2.4471 train_time: 6.6m tok/s: 7040972 +3530/20000 train_loss: 2.6740 train_time: 6.6m tok/s: 7040477 +3531/20000 train_loss: 2.3005 train_time: 6.6m tok/s: 7039962 +3532/20000 train_loss: 2.2754 train_time: 6.6m tok/s: 7039426 +3533/20000 train_loss: 2.4349 train_time: 6.6m tok/s: 7038943 +3534/20000 train_loss: 2.4674 train_time: 6.6m tok/s: 7038449 +3535/20000 train_loss: 2.4061 train_time: 6.6m tok/s: 7037905 +3536/20000 train_loss: 2.4711 train_time: 6.6m tok/s: 7037399 +3537/20000 train_loss: 2.6009 train_time: 6.6m tok/s: 7036911 +3538/20000 train_loss: 2.4461 train_time: 6.6m tok/s: 7036401 +3539/20000 train_loss: 2.4700 train_time: 6.6m tok/s: 7035894 +3540/20000 train_loss: 2.4923 train_time: 6.6m tok/s: 7035408 +3541/20000 train_loss: 2.2918 train_time: 6.6m tok/s: 7034869 +3542/20000 train_loss: 2.4060 train_time: 6.6m tok/s: 7034385 +3543/20000 train_loss: 2.5108 train_time: 6.6m tok/s: 7033919 +3544/20000 train_loss: 2.4217 train_time: 6.6m tok/s: 7033426 +3545/20000 train_loss: 2.4562 train_time: 6.6m tok/s: 7032932 +3546/20000 train_loss: 2.3594 train_time: 6.6m tok/s: 7032448 +3547/20000 train_loss: 2.3354 train_time: 6.6m tok/s: 7031939 +3548/20000 train_loss: 2.5740 train_time: 6.6m tok/s: 7031456 +3549/20000 train_loss: 2.5424 train_time: 6.6m tok/s: 7030946 +3550/20000 train_loss: 2.5435 train_time: 6.6m tok/s: 7030445 +3551/20000 train_loss: 2.5676 train_time: 6.6m tok/s: 7029952 +3552/20000 train_loss: 2.4730 train_time: 6.6m tok/s: 7029451 +3553/20000 train_loss: 2.5794 train_time: 6.6m tok/s: 
7028928 +3554/20000 train_loss: 2.4968 train_time: 6.6m tok/s: 7028440 +3555/20000 train_loss: 2.5131 train_time: 6.6m tok/s: 7027919 +3556/20000 train_loss: 2.5184 train_time: 6.6m tok/s: 7027441 +3557/20000 train_loss: 2.4359 train_time: 6.6m tok/s: 7026950 +3558/20000 train_loss: 2.6188 train_time: 6.6m tok/s: 7026446 +3559/20000 train_loss: 2.4893 train_time: 6.6m tok/s: 7025958 +3560/20000 train_loss: 2.4495 train_time: 6.6m tok/s: 7025472 +3561/20000 train_loss: 3.1508 train_time: 6.6m tok/s: 7024970 +3562/20000 train_loss: 2.3820 train_time: 6.6m tok/s: 7024474 +3563/20000 train_loss: 2.4776 train_time: 6.6m tok/s: 7023990 +3564/20000 train_loss: 2.4798 train_time: 6.7m tok/s: 7023501 +3565/20000 train_loss: 2.4718 train_time: 6.7m tok/s: 7023002 +3566/20000 train_loss: 2.4701 train_time: 6.7m tok/s: 7022501 +3567/20000 train_loss: 2.5216 train_time: 6.7m tok/s: 7022009 +3568/20000 train_loss: 2.5333 train_time: 6.7m tok/s: 7021501 +3569/20000 train_loss: 2.3183 train_time: 6.7m tok/s: 7021010 +3570/20000 train_loss: 2.3002 train_time: 6.7m tok/s: 7020506 +3571/20000 train_loss: 2.3844 train_time: 6.7m tok/s: 7020037 +3572/20000 train_loss: 2.3876 train_time: 6.7m tok/s: 7019537 +3573/20000 train_loss: 2.2646 train_time: 6.7m tok/s: 7019025 +3574/20000 train_loss: 2.4195 train_time: 6.7m tok/s: 7018508 +3575/20000 train_loss: 2.5021 train_time: 6.7m tok/s: 7018028 +3576/20000 train_loss: 2.5048 train_time: 6.7m tok/s: 7017568 +3577/20000 train_loss: 2.4834 train_time: 6.7m tok/s: 7017078 +3578/20000 train_loss: 2.5625 train_time: 6.7m tok/s: 7016595 +3579/20000 train_loss: 2.5155 train_time: 6.7m tok/s: 7016114 +3580/20000 train_loss: 2.5200 train_time: 6.7m tok/s: 7015595 +3581/20000 train_loss: 2.5040 train_time: 6.7m tok/s: 7015139 +3582/20000 train_loss: 2.3786 train_time: 6.7m tok/s: 7014647 +3583/20000 train_loss: 2.4107 train_time: 6.7m tok/s: 7014167 +3584/20000 train_loss: 2.3550 train_time: 6.7m tok/s: 7013670 +3585/20000 train_loss: 2.3413 
train_time: 6.7m tok/s: 7013149 +3586/20000 train_loss: 2.1771 train_time: 6.7m tok/s: 7012644 +3587/20000 train_loss: 2.3261 train_time: 6.7m tok/s: 7012166 +3588/20000 train_loss: 2.3762 train_time: 6.7m tok/s: 7011693 +3589/20000 train_loss: 2.6170 train_time: 6.7m tok/s: 7011165 +3590/20000 train_loss: 2.5090 train_time: 6.7m tok/s: 7010702 +3591/20000 train_loss: 2.5254 train_time: 6.7m tok/s: 7010232 +3592/20000 train_loss: 2.5125 train_time: 6.7m tok/s: 7009766 +3593/20000 train_loss: 2.5689 train_time: 6.7m tok/s: 7009283 +3594/20000 train_loss: 2.4895 train_time: 6.7m tok/s: 7008812 +3595/20000 train_loss: 2.5334 train_time: 6.7m tok/s: 7008336 +3596/20000 train_loss: 2.5142 train_time: 6.7m tok/s: 7007890 +3597/20000 train_loss: 2.5492 train_time: 6.7m tok/s: 7007376 +3598/20000 train_loss: 2.2644 train_time: 6.7m tok/s: 7006877 +3599/20000 train_loss: 2.4678 train_time: 6.7m tok/s: 7006418 +3600/20000 train_loss: 2.5597 train_time: 6.7m tok/s: 7005950 +3601/20000 train_loss: 2.4273 train_time: 6.7m tok/s: 7005494 +3602/20000 train_loss: 2.3135 train_time: 6.7m tok/s: 7004984 +3603/20000 train_loss: 2.5459 train_time: 6.7m tok/s: 7004497 +3604/20000 train_loss: 2.5742 train_time: 6.7m tok/s: 7004050 +3605/20000 train_loss: 2.4399 train_time: 6.7m tok/s: 7003573 +3606/20000 train_loss: 2.4455 train_time: 6.7m tok/s: 7003098 +3607/20000 train_loss: 2.5733 train_time: 6.8m tok/s: 7002621 +3608/20000 train_loss: 2.4212 train_time: 6.8m tok/s: 7002141 +3609/20000 train_loss: 2.4298 train_time: 6.8m tok/s: 7001646 +3610/20000 train_loss: 2.5449 train_time: 6.8m tok/s: 7001159 +3611/20000 train_loss: 2.5282 train_time: 6.8m tok/s: 7000707 +3612/20000 train_loss: 2.4062 train_time: 6.8m tok/s: 7000249 +3613/20000 train_loss: 2.4426 train_time: 6.8m tok/s: 6999777 +3614/20000 train_loss: 2.5392 train_time: 6.8m tok/s: 6999323 +3615/20000 train_loss: 2.3687 train_time: 6.8m tok/s: 6998835 +3616/20000 train_loss: 2.4741 train_time: 6.8m tok/s: 6998366 +3617/20000 
train_loss: 2.4300 train_time: 6.8m tok/s: 6997864 +3618/20000 train_loss: 2.3676 train_time: 6.8m tok/s: 6997378 +3619/20000 train_loss: 2.6401 train_time: 6.8m tok/s: 6996900 +3620/20000 train_loss: 2.3782 train_time: 6.8m tok/s: 6996421 +3621/20000 train_loss: 2.4969 train_time: 6.8m tok/s: 6995967 +3622/20000 train_loss: 2.5241 train_time: 6.8m tok/s: 6995476 +3623/20000 train_loss: 2.5351 train_time: 6.8m tok/s: 6994968 +3624/20000 train_loss: 2.5898 train_time: 6.8m tok/s: 6994490 +3625/20000 train_loss: 2.4384 train_time: 6.8m tok/s: 6994023 +3626/20000 train_loss: 2.4041 train_time: 6.8m tok/s: 6993554 +3627/20000 train_loss: 2.3779 train_time: 6.8m tok/s: 6993110 +3628/20000 train_loss: 2.3938 train_time: 6.8m tok/s: 6992636 +3629/20000 train_loss: 2.5326 train_time: 6.8m tok/s: 6992155 +3630/20000 train_loss: 2.5355 train_time: 6.8m tok/s: 6991673 +3631/20000 train_loss: 2.5370 train_time: 6.8m tok/s: 6991226 +3632/20000 train_loss: 2.5801 train_time: 6.8m tok/s: 6990764 +3633/20000 train_loss: 2.3912 train_time: 6.8m tok/s: 6990296 +3634/20000 train_loss: 2.5125 train_time: 6.8m tok/s: 6989844 +3635/20000 train_loss: 2.4983 train_time: 6.8m tok/s: 6989403 +3636/20000 train_loss: 2.4837 train_time: 6.8m tok/s: 6988946 +3637/20000 train_loss: 2.4436 train_time: 6.8m tok/s: 6988485 +3638/20000 train_loss: 2.4264 train_time: 6.8m tok/s: 6988039 +3639/20000 train_loss: 2.4529 train_time: 6.8m tok/s: 6987574 +3640/20000 train_loss: 2.3957 train_time: 6.8m tok/s: 6987111 +3641/20000 train_loss: 2.3896 train_time: 6.8m tok/s: 6986669 +3642/20000 train_loss: 2.2978 train_time: 6.8m tok/s: 6986233 +3643/20000 train_loss: 2.3839 train_time: 6.8m tok/s: 6985763 +3644/20000 train_loss: 2.0912 train_time: 6.8m tok/s: 6985257 +3645/20000 train_loss: 2.5215 train_time: 6.8m tok/s: 6984811 +3646/20000 train_loss: 2.5317 train_time: 6.8m tok/s: 6984390 +3647/20000 train_loss: 2.5385 train_time: 6.8m tok/s: 6983956 +3648/20000 train_loss: 2.4802 train_time: 6.8m tok/s: 
6983485 +3649/20000 train_loss: 2.4499 train_time: 6.8m tok/s: 6983065 +3650/20000 train_loss: 2.6527 train_time: 6.9m tok/s: 6982573 +3651/20000 train_loss: 2.5564 train_time: 6.9m tok/s: 6982131 +3652/20000 train_loss: 2.4341 train_time: 6.9m tok/s: 6981705 +3653/20000 train_loss: 2.4481 train_time: 6.9m tok/s: 6981251 +3654/20000 train_loss: 2.3805 train_time: 6.9m tok/s: 6980803 +3655/20000 train_loss: 2.4052 train_time: 6.9m tok/s: 6980297 +3656/20000 train_loss: 2.3830 train_time: 6.9m tok/s: 6979807 +3657/20000 train_loss: 2.4250 train_time: 6.9m tok/s: 6979381 +3658/20000 train_loss: 2.5595 train_time: 6.9m tok/s: 6978921 +3659/20000 train_loss: 2.4264 train_time: 6.9m tok/s: 6978479 +3660/20000 train_loss: 2.3356 train_time: 6.9m tok/s: 6978014 +3661/20000 train_loss: 2.4414 train_time: 6.9m tok/s: 6977537 +3662/20000 train_loss: 2.4615 train_time: 6.9m tok/s: 6977083 +3663/20000 train_loss: 2.0425 train_time: 6.9m tok/s: 6976618 +3664/20000 train_loss: 2.6435 train_time: 6.9m tok/s: 6976158 +3665/20000 train_loss: 2.4860 train_time: 6.9m tok/s: 6975703 +3666/20000 train_loss: 2.4433 train_time: 6.9m tok/s: 6975242 +3667/20000 train_loss: 2.3198 train_time: 6.9m tok/s: 6974778 +3668/20000 train_loss: 2.2215 train_time: 6.9m tok/s: 6974295 +3669/20000 train_loss: 2.3880 train_time: 6.9m tok/s: 6973809 +3670/20000 train_loss: 2.4223 train_time: 6.9m tok/s: 6973363 +3671/20000 train_loss: 2.4314 train_time: 6.9m tok/s: 6972903 +3672/20000 train_loss: 2.5391 train_time: 6.9m tok/s: 6972459 +3673/20000 train_loss: 2.4531 train_time: 6.9m tok/s: 6972016 +3674/20000 train_loss: 2.5365 train_time: 6.9m tok/s: 6971581 +3675/20000 train_loss: 2.5772 train_time: 6.9m tok/s: 6971130 +3676/20000 train_loss: 2.4693 train_time: 6.9m tok/s: 6970677 +3677/20000 train_loss: 2.5344 train_time: 6.9m tok/s: 6970230 +3678/20000 train_loss: 2.4734 train_time: 6.9m tok/s: 6969763 +3679/20000 train_loss: 2.5732 train_time: 6.9m tok/s: 6969270 +3680/20000 train_loss: 2.4711 
train_time: 6.9m tok/s: 6968846 +3681/20000 train_loss: 2.5347 train_time: 6.9m tok/s: 6968397 +3682/20000 train_loss: 2.3534 train_time: 6.9m tok/s: 6967915 +3683/20000 train_loss: 2.5737 train_time: 6.9m tok/s: 6967459 +3684/20000 train_loss: 2.4020 train_time: 6.9m tok/s: 6967005 +3685/20000 train_loss: 2.6487 train_time: 6.9m tok/s: 6966543 +3686/20000 train_loss: 2.4893 train_time: 6.9m tok/s: 6966087 +3687/20000 train_loss: 2.5658 train_time: 6.9m tok/s: 6965663 +3688/20000 train_loss: 2.4730 train_time: 6.9m tok/s: 6965220 +3689/20000 train_loss: 2.5402 train_time: 6.9m tok/s: 6964771 +3690/20000 train_loss: 2.5979 train_time: 6.9m tok/s: 6964305 +3691/20000 train_loss: 2.5518 train_time: 6.9m tok/s: 6963853 +3692/20000 train_loss: 2.4890 train_time: 6.9m tok/s: 6963417 +3693/20000 train_loss: 2.5058 train_time: 7.0m tok/s: 6962968 +3694/20000 train_loss: 2.5065 train_time: 7.0m tok/s: 6962532 +3695/20000 train_loss: 2.4263 train_time: 7.0m tok/s: 6962069 +3696/20000 train_loss: 2.4121 train_time: 7.0m tok/s: 6961608 +3697/20000 train_loss: 2.4253 train_time: 7.0m tok/s: 6961134 +3698/20000 train_loss: 2.4699 train_time: 7.0m tok/s: 6960683 +3699/20000 train_loss: 2.4890 train_time: 7.0m tok/s: 6960255 +3700/20000 train_loss: 2.6283 train_time: 7.0m tok/s: 6959775 +3701/20000 train_loss: 2.5250 train_time: 7.0m tok/s: 6959320 +3702/20000 train_loss: 2.6027 train_time: 7.0m tok/s: 6958873 +3703/20000 train_loss: 2.5329 train_time: 7.0m tok/s: 6958421 +3704/20000 train_loss: 2.8432 train_time: 7.0m tok/s: 6957940 +3705/20000 train_loss: 2.4398 train_time: 7.0m tok/s: 6957511 +3706/20000 train_loss: 2.4593 train_time: 7.0m tok/s: 6957084 +3707/20000 train_loss: 2.4550 train_time: 7.0m tok/s: 6956649 +3708/20000 train_loss: 2.4043 train_time: 7.0m tok/s: 6956201 +3709/20000 train_loss: 2.5321 train_time: 7.0m tok/s: 6955778 +3710/20000 train_loss: 2.4011 train_time: 7.0m tok/s: 6955336 +3711/20000 train_loss: 2.5421 train_time: 7.0m tok/s: 6954882 +3712/20000 
train_loss: 2.4515 train_time: 7.0m tok/s: 6954425 +3713/20000 train_loss: 2.4627 train_time: 7.0m tok/s: 6953986 +3714/20000 train_loss: 2.4293 train_time: 7.0m tok/s: 6953577 +3715/20000 train_loss: 2.4230 train_time: 7.0m tok/s: 6953135 +3716/20000 train_loss: 2.4205 train_time: 7.0m tok/s: 6952667 +3717/20000 train_loss: 2.3518 train_time: 7.0m tok/s: 6952208 +3718/20000 train_loss: 2.5453 train_time: 7.0m tok/s: 6951759 +3719/20000 train_loss: 2.4330 train_time: 7.0m tok/s: 6951334 +3720/20000 train_loss: 2.5002 train_time: 7.0m tok/s: 6950879 +3721/20000 train_loss: 2.5596 train_time: 7.0m tok/s: 6950430 +3722/20000 train_loss: 2.4446 train_time: 7.0m tok/s: 6949998 +3723/20000 train_loss: 2.6126 train_time: 7.0m tok/s: 6949554 +3724/20000 train_loss: 2.4246 train_time: 7.0m tok/s: 6949091 +3725/20000 train_loss: 2.4685 train_time: 7.0m tok/s: 6948649 +3726/20000 train_loss: 2.4302 train_time: 7.0m tok/s: 6948214 +3727/20000 train_loss: 2.4202 train_time: 7.0m tok/s: 6947763 +3728/20000 train_loss: 2.4435 train_time: 7.0m tok/s: 6947334 +3729/20000 train_loss: 2.5120 train_time: 7.0m tok/s: 6946901 +3730/20000 train_loss: 2.4277 train_time: 7.0m tok/s: 6946452 +3731/20000 train_loss: 2.5097 train_time: 7.0m tok/s: 6946005 +3732/20000 train_loss: 2.4297 train_time: 7.0m tok/s: 6945571 +3733/20000 train_loss: 2.5335 train_time: 7.0m tok/s: 6945145 +3734/20000 train_loss: 2.4514 train_time: 7.0m tok/s: 6944714 +3735/20000 train_loss: 2.5197 train_time: 7.0m tok/s: 6944263 +3736/20000 train_loss: 2.4304 train_time: 7.1m tok/s: 6943794 +3737/20000 train_loss: 2.4450 train_time: 7.1m tok/s: 6943358 +3738/20000 train_loss: 2.3873 train_time: 7.1m tok/s: 6942914 +3739/20000 train_loss: 2.4596 train_time: 7.1m tok/s: 6942484 +3740/20000 train_loss: 2.5310 train_time: 7.1m tok/s: 6942050 +3741/20000 train_loss: 2.3283 train_time: 7.1m tok/s: 6941585 +3742/20000 train_loss: 2.4582 train_time: 7.1m tok/s: 6941168 +3743/20000 train_loss: 2.4075 train_time: 7.1m tok/s: 
6940709 +3744/20000 train_loss: 2.4315 train_time: 7.1m tok/s: 6940287 +3745/20000 train_loss: 2.5123 train_time: 7.1m tok/s: 6939852 +3746/20000 train_loss: 2.3687 train_time: 7.1m tok/s: 6939412 +3747/20000 train_loss: 2.4307 train_time: 7.1m tok/s: 6938974 +3748/20000 train_loss: 2.5475 train_time: 7.1m tok/s: 6938562 +3749/20000 train_loss: 2.5375 train_time: 7.1m tok/s: 6938136 +3750/20000 train_loss: 2.4781 train_time: 7.1m tok/s: 6937659 +3751/20000 train_loss: 2.5772 train_time: 7.1m tok/s: 6937221 +3752/20000 train_loss: 2.4319 train_time: 7.1m tok/s: 6936813 +3753/20000 train_loss: 2.3549 train_time: 7.1m tok/s: 6936370 +3754/20000 train_loss: 2.4404 train_time: 7.1m tok/s: 6935937 +3755/20000 train_loss: 2.3914 train_time: 7.1m tok/s: 6935496 +3756/20000 train_loss: 2.4776 train_time: 7.1m tok/s: 6935070 +3757/20000 train_loss: 2.3670 train_time: 7.1m tok/s: 6934630 +3758/20000 train_loss: 2.7205 train_time: 7.1m tok/s: 6934202 +3759/20000 train_loss: 2.6036 train_time: 7.1m tok/s: 6933782 +3760/20000 train_loss: 2.5442 train_time: 7.1m tok/s: 6933368 +3761/20000 train_loss: 2.6231 train_time: 7.1m tok/s: 6932894 +3762/20000 train_loss: 2.4668 train_time: 7.1m tok/s: 6932471 +3763/20000 train_loss: 2.4906 train_time: 7.1m tok/s: 6932055 +3764/20000 train_loss: 2.4499 train_time: 7.1m tok/s: 6931635 +3765/20000 train_loss: 2.4253 train_time: 7.1m tok/s: 6931196 +3766/20000 train_loss: 2.4115 train_time: 7.1m tok/s: 6930772 +3767/20000 train_loss: 2.4738 train_time: 7.1m tok/s: 6930355 +3768/20000 train_loss: 2.4678 train_time: 7.1m tok/s: 6929918 +3769/20000 train_loss: 2.4089 train_time: 7.1m tok/s: 6929493 +3770/20000 train_loss: 2.5564 train_time: 7.1m tok/s: 6929058 +3771/20000 train_loss: 2.4493 train_time: 7.1m tok/s: 6928630 +3772/20000 train_loss: 2.5157 train_time: 7.1m tok/s: 6928213 +3773/20000 train_loss: 2.4870 train_time: 7.1m tok/s: 6927754 +3774/20000 train_loss: 2.3880 train_time: 7.1m tok/s: 6927346 +3775/20000 train_loss: 2.6263 
train_time: 7.1m tok/s: 6926933 +3776/20000 train_loss: 2.4962 train_time: 7.1m tok/s: 6926489 +3777/20000 train_loss: 2.4013 train_time: 7.1m tok/s: 6926047 +3778/20000 train_loss: 2.4621 train_time: 7.2m tok/s: 6925623 +3779/20000 train_loss: 2.4570 train_time: 7.2m tok/s: 6925203 +3780/20000 train_loss: 2.3775 train_time: 7.2m tok/s: 6924769 +3781/20000 train_loss: 2.4491 train_time: 7.2m tok/s: 6924323 +3782/20000 train_loss: 2.3837 train_time: 7.2m tok/s: 6923894 +3783/20000 train_loss: 2.4562 train_time: 7.2m tok/s: 6923473 +3784/20000 train_loss: 2.4641 train_time: 7.2m tok/s: 6923066 +3785/20000 train_loss: 2.4824 train_time: 7.2m tok/s: 6922653 +3786/20000 train_loss: 2.4988 train_time: 7.2m tok/s: 6922223 +3787/20000 train_loss: 2.5094 train_time: 7.2m tok/s: 6921795 +3788/20000 train_loss: 2.4189 train_time: 7.2m tok/s: 6921393 +3789/20000 train_loss: 2.3868 train_time: 7.2m tok/s: 6920984 +3790/20000 train_loss: 2.4480 train_time: 7.2m tok/s: 6920551 +3791/20000 train_loss: 2.3415 train_time: 7.2m tok/s: 6920095 +3792/20000 train_loss: 2.4607 train_time: 7.2m tok/s: 6919667 +3793/20000 train_loss: 2.3852 train_time: 7.2m tok/s: 6919223 +3794/20000 train_loss: 2.4770 train_time: 7.2m tok/s: 6918794 +3795/20000 train_loss: 2.4690 train_time: 7.2m tok/s: 6918385 +3796/20000 train_loss: 2.5346 train_time: 7.2m tok/s: 6917963 +3797/20000 train_loss: 2.5794 train_time: 7.2m tok/s: 6917559 +3798/20000 train_loss: 2.6839 train_time: 7.2m tok/s: 6917132 +3799/20000 train_loss: 2.4824 train_time: 7.2m tok/s: 6916707 +3800/20000 train_loss: 2.4848 train_time: 7.2m tok/s: 6916265 +3801/20000 train_loss: 2.2662 train_time: 7.2m tok/s: 6915814 +3802/20000 train_loss: 2.4853 train_time: 7.2m tok/s: 6915424 +3803/20000 train_loss: 2.3963 train_time: 7.2m tok/s: 6915029 +3804/20000 train_loss: 2.4667 train_time: 7.2m tok/s: 6914616 +3805/20000 train_loss: 2.3891 train_time: 7.2m tok/s: 6914201 +3806/20000 train_loss: 2.4756 train_time: 7.2m tok/s: 6913787 +3807/20000 
train_loss: 2.4704 train_time: 7.2m tok/s: 6913376 +3808/20000 train_loss: 2.6572 train_time: 7.2m tok/s: 6912959 +3809/20000 train_loss: 2.5652 train_time: 7.2m tok/s: 6912552 +3810/20000 train_loss: 2.4263 train_time: 7.2m tok/s: 6912116 +3811/20000 train_loss: 2.4402 train_time: 7.2m tok/s: 6911682 +3812/20000 train_loss: 2.4376 train_time: 7.2m tok/s: 6911269 +3813/20000 train_loss: 2.4542 train_time: 7.2m tok/s: 6910844 +3814/20000 train_loss: 4.3540 train_time: 7.2m tok/s: 6910380 +3815/20000 train_loss: 2.4367 train_time: 7.2m tok/s: 6909972 +3816/20000 train_loss: 2.5520 train_time: 7.2m tok/s: 6909505 +3817/20000 train_loss: 2.4941 train_time: 7.2m tok/s: 6909128 +3818/20000 train_loss: 2.5115 train_time: 7.2m tok/s: 6908722 +3819/20000 train_loss: 2.3850 train_time: 7.2m tok/s: 6908313 +3820/20000 train_loss: 2.5549 train_time: 7.2m tok/s: 6907907 +3821/20000 train_loss: 2.4198 train_time: 7.3m tok/s: 6907518 +3822/20000 train_loss: 2.4549 train_time: 7.3m tok/s: 6907112 +3823/20000 train_loss: 2.4784 train_time: 7.3m tok/s: 6906685 +3824/20000 train_loss: 2.5046 train_time: 7.3m tok/s: 6906297 +3825/20000 train_loss: 2.4807 train_time: 7.3m tok/s: 6905889 +3826/20000 train_loss: 2.5548 train_time: 7.3m tok/s: 6905458 +3827/20000 train_loss: 2.5648 train_time: 7.3m tok/s: 6905035 +3828/20000 train_loss: 2.5500 train_time: 7.3m tok/s: 6904622 +3829/20000 train_loss: 2.5085 train_time: 7.3m tok/s: 6904230 +3830/20000 train_loss: 2.5433 train_time: 7.3m tok/s: 6903822 +3831/20000 train_loss: 2.5075 train_time: 7.3m tok/s: 6903406 +3832/20000 train_loss: 2.4524 train_time: 7.3m tok/s: 6902989 +3833/20000 train_loss: 2.4855 train_time: 7.3m tok/s: 6902579 +3834/20000 train_loss: 2.4184 train_time: 7.3m tok/s: 6902161 +3835/20000 train_loss: 2.4563 train_time: 7.3m tok/s: 6901741 +3836/20000 train_loss: 2.4222 train_time: 7.3m tok/s: 6901314 +3837/20000 train_loss: 2.4424 train_time: 7.3m tok/s: 6900909 +3838/20000 train_loss: 2.4498 train_time: 7.3m tok/s: 
6900512 +3839/20000 train_loss: 2.4353 train_time: 7.3m tok/s: 6900090 +3840/20000 train_loss: 2.3913 train_time: 7.3m tok/s: 6899673 +3841/20000 train_loss: 2.5570 train_time: 7.3m tok/s: 6899256 +3842/20000 train_loss: 2.4179 train_time: 7.3m tok/s: 6898870 +3843/20000 train_loss: 2.4476 train_time: 7.3m tok/s: 6898463 +3844/20000 train_loss: 2.3891 train_time: 7.3m tok/s: 6898080 +3845/20000 train_loss: 2.5041 train_time: 7.3m tok/s: 6897690 +3846/20000 train_loss: 2.5958 train_time: 7.3m tok/s: 6897265 +3847/20000 train_loss: 2.4140 train_time: 7.3m tok/s: 6896858 +3848/20000 train_loss: 2.4390 train_time: 7.3m tok/s: 6896439 +3849/20000 train_loss: 2.5702 train_time: 7.3m tok/s: 6896044 +3850/20000 train_loss: 2.5412 train_time: 7.3m tok/s: 6895636 +3851/20000 train_loss: 2.4585 train_time: 7.3m tok/s: 6895214 +3852/20000 train_loss: 2.4434 train_time: 7.3m tok/s: 6894803 +3853/20000 train_loss: 2.3913 train_time: 7.3m tok/s: 6894411 +3854/20000 train_loss: 2.2869 train_time: 7.3m tok/s: 6894012 +3855/20000 train_loss: 2.4910 train_time: 7.3m tok/s: 6893621 +3856/20000 train_loss: 2.4100 train_time: 7.3m tok/s: 6893190 +3857/20000 train_loss: 2.2076 train_time: 7.3m tok/s: 6892755 +3858/20000 train_loss: 2.4858 train_time: 7.3m tok/s: 6892321 +3859/20000 train_loss: 2.3970 train_time: 7.3m tok/s: 6891911 +3860/20000 train_loss: 2.3303 train_time: 7.3m tok/s: 6891514 +3861/20000 train_loss: 2.4701 train_time: 7.3m tok/s: 6891104 +3862/20000 train_loss: 2.5971 train_time: 7.3m tok/s: 6890702 +3863/20000 train_loss: 2.5163 train_time: 7.3m tok/s: 6890313 +3864/20000 train_loss: 2.4629 train_time: 7.4m tok/s: 6889919 +3865/20000 train_loss: 2.4851 train_time: 7.4m tok/s: 6889534 +3866/20000 train_loss: 2.4061 train_time: 7.4m tok/s: 6889125 +3867/20000 train_loss: 2.8911 train_time: 7.4m tok/s: 6888687 +3868/20000 train_loss: 2.3163 train_time: 7.4m tok/s: 6888258 +3869/20000 train_loss: 2.4424 train_time: 7.4m tok/s: 6887867 +3870/20000 train_loss: 2.5593 
train_time: 7.4m tok/s: 6887477 +3871/20000 train_loss: 2.4042 train_time: 7.4m tok/s: 6887083 +3872/20000 train_loss: 2.4479 train_time: 7.4m tok/s: 6886673 +3873/20000 train_loss: 2.3787 train_time: 7.4m tok/s: 6886295 +3874/20000 train_loss: 2.4422 train_time: 7.4m tok/s: 6885912 +3875/20000 train_loss: 2.4448 train_time: 7.4m tok/s: 6885508 +3876/20000 train_loss: 2.3304 train_time: 7.4m tok/s: 6885105 +3877/20000 train_loss: 2.6177 train_time: 7.4m tok/s: 6884681 +3878/20000 train_loss: 2.9861 train_time: 7.4m tok/s: 6884240 +3879/20000 train_loss: 2.4195 train_time: 7.4m tok/s: 6883842 +3880/20000 train_loss: 2.4129 train_time: 7.4m tok/s: 6883446 +3881/20000 train_loss: 2.4049 train_time: 7.4m tok/s: 6883047 +3882/20000 train_loss: 2.3135 train_time: 7.4m tok/s: 6882666 +3883/20000 train_loss: 2.4637 train_time: 7.4m tok/s: 6882272 +3884/20000 train_loss: 2.4648 train_time: 7.4m tok/s: 6881894 +3885/20000 train_loss: 2.4093 train_time: 7.4m tok/s: 6881488 +3886/20000 train_loss: 2.3933 train_time: 7.4m tok/s: 6881098 +3887/20000 train_loss: 2.4179 train_time: 7.4m tok/s: 6880683 +3888/20000 train_loss: 2.4244 train_time: 7.4m tok/s: 6880296 +3889/20000 train_loss: 2.4083 train_time: 7.4m tok/s: 6879922 +3890/20000 train_loss: 2.3527 train_time: 7.4m tok/s: 6879518 +3891/20000 train_loss: 2.5876 train_time: 7.4m tok/s: 6879117 +3892/20000 train_loss: 2.3937 train_time: 7.4m tok/s: 6878709 +3893/20000 train_loss: 2.4967 train_time: 7.4m tok/s: 6878344 +3894/20000 train_loss: 2.2295 train_time: 7.4m tok/s: 6877926 +3895/20000 train_loss: 2.5577 train_time: 7.4m tok/s: 6877551 +3896/20000 train_loss: 2.4659 train_time: 7.4m tok/s: 6877160 +3897/20000 train_loss: 2.4649 train_time: 7.4m tok/s: 6876760 +3898/20000 train_loss: 2.3873 train_time: 7.4m tok/s: 6876355 +3899/20000 train_loss: 2.6146 train_time: 7.4m tok/s: 6875967 +3900/20000 train_loss: 2.5389 train_time: 7.4m tok/s: 6875533 +3901/20000 train_loss: 2.4455 train_time: 7.4m tok/s: 6875147 +3902/20000 
train_loss: 2.4826 train_time: 7.4m tok/s: 6874731 +3903/20000 train_loss: 2.4767 train_time: 7.4m tok/s: 6874320 +3904/20000 train_loss: 2.3054 train_time: 7.4m tok/s: 6873923 +3905/20000 train_loss: 2.4322 train_time: 7.4m tok/s: 6873532 +3906/20000 train_loss: 2.5454 train_time: 7.4m tok/s: 6873145 +3907/20000 train_loss: 2.4658 train_time: 7.5m tok/s: 6872766 +3908/20000 train_loss: 2.4627 train_time: 7.5m tok/s: 6872370 +3909/20000 train_loss: 2.4672 train_time: 7.5m tok/s: 6871988 +3910/20000 train_loss: 2.4461 train_time: 7.5m tok/s: 6871593 +3911/20000 train_loss: 2.5221 train_time: 7.5m tok/s: 6871202 +3912/20000 train_loss: 2.5765 train_time: 7.5m tok/s: 6870816 +3913/20000 train_loss: 2.4862 train_time: 7.5m tok/s: 6870462 +3914/20000 train_loss: 2.4576 train_time: 7.5m tok/s: 6870046 +3915/20000 train_loss: 2.3535 train_time: 7.5m tok/s: 6869653 +3916/20000 train_loss: 2.4130 train_time: 7.5m tok/s: 6869280 +3917/20000 train_loss: 2.5786 train_time: 7.5m tok/s: 6868867 +3918/20000 train_loss: 2.4445 train_time: 7.5m tok/s: 6868488 +3919/20000 train_loss: 2.4056 train_time: 7.5m tok/s: 6868121 +3920/20000 train_loss: 2.3557 train_time: 7.5m tok/s: 6867740 +3921/20000 train_loss: 2.4895 train_time: 7.5m tok/s: 6867348 +3922/20000 train_loss: 2.5776 train_time: 7.5m tok/s: 6866947 +3923/20000 train_loss: 2.4932 train_time: 7.5m tok/s: 6866576 +3924/20000 train_loss: 2.5565 train_time: 7.5m tok/s: 6866203 +3925/20000 train_loss: 2.4392 train_time: 7.5m tok/s: 6865833 +3926/20000 train_loss: 2.3595 train_time: 7.5m tok/s: 6865439 +3927/20000 train_loss: 2.3387 train_time: 7.5m tok/s: 6865064 +3928/20000 train_loss: 2.4440 train_time: 7.5m tok/s: 6864690 +3929/20000 train_loss: 2.4328 train_time: 7.5m tok/s: 6864335 +3930/20000 train_loss: 2.5181 train_time: 7.5m tok/s: 6863970 +3931/20000 train_loss: 2.4851 train_time: 7.5m tok/s: 6863581 +3932/20000 train_loss: 2.4717 train_time: 7.5m tok/s: 6863216 +3933/20000 train_loss: 2.5438 train_time: 7.5m tok/s: 
6862852 +3934/20000 train_loss: 2.4872 train_time: 7.5m tok/s: 6862476 +3935/20000 train_loss: 2.4036 train_time: 7.5m tok/s: 6862054 +3936/20000 train_loss: 2.3821 train_time: 7.5m tok/s: 6861688 +3937/20000 train_loss: 2.4037 train_time: 7.5m tok/s: 6861306 +3938/20000 train_loss: 2.3515 train_time: 7.5m tok/s: 6860950 +3939/20000 train_loss: 2.4934 train_time: 7.5m tok/s: 6860586 +3940/20000 train_loss: 2.4185 train_time: 7.5m tok/s: 6860177 +3941/20000 train_loss: 2.5808 train_time: 7.5m tok/s: 6859790 +3942/20000 train_loss: 2.3511 train_time: 7.5m tok/s: 6859418 +3943/20000 train_loss: 2.4124 train_time: 7.5m tok/s: 6859042 +3944/20000 train_loss: 2.4939 train_time: 7.5m tok/s: 6858682 +3945/20000 train_loss: 2.4070 train_time: 7.5m tok/s: 6858290 +3946/20000 train_loss: 2.4450 train_time: 7.5m tok/s: 6857896 +3947/20000 train_loss: 2.4692 train_time: 7.5m tok/s: 6857542 +3948/20000 train_loss: 2.3502 train_time: 7.5m tok/s: 6857170 +3949/20000 train_loss: 2.3697 train_time: 7.5m tok/s: 6856786 +3950/20000 train_loss: 2.3389 train_time: 7.6m tok/s: 6856409 +3951/20000 train_loss: 2.4687 train_time: 7.6m tok/s: 6855983 +3952/20000 train_loss: 2.4443 train_time: 7.6m tok/s: 6855619 +3953/20000 train_loss: 2.4699 train_time: 7.6m tok/s: 6855248 +3954/20000 train_loss: 2.4840 train_time: 7.6m tok/s: 6854859 +3955/20000 train_loss: 2.4743 train_time: 7.6m tok/s: 6854485 +3956/20000 train_loss: 2.4576 train_time: 7.6m tok/s: 6854118 +3957/20000 train_loss: 2.3400 train_time: 7.6m tok/s: 6853739 +3958/20000 train_loss: 2.5264 train_time: 7.6m tok/s: 6853345 +3959/20000 train_loss: 2.2674 train_time: 7.6m tok/s: 6852945 +3960/20000 train_loss: 2.3958 train_time: 7.6m tok/s: 6852551 +3961/20000 train_loss: 2.3811 train_time: 7.6m tok/s: 6852173 +3962/20000 train_loss: 2.3895 train_time: 7.6m tok/s: 6851802 +3963/20000 train_loss: 2.4814 train_time: 7.6m tok/s: 6851423 +3964/20000 train_loss: 2.2918 train_time: 7.6m tok/s: 6851024 +3965/20000 train_loss: 2.5963 
train_time: 7.6m tok/s: 6850667 +3966/20000 train_loss: 2.4597 train_time: 7.6m tok/s: 6850309 +3967/20000 train_loss: 2.4256 train_time: 7.6m tok/s: 6849938 +3968/20000 train_loss: 2.4689 train_time: 7.6m tok/s: 6849556 +3969/20000 train_loss: 2.4149 train_time: 7.6m tok/s: 6849187 +3970/20000 train_loss: 2.5131 train_time: 7.6m tok/s: 6848823 +3971/20000 train_loss: 2.4183 train_time: 7.6m tok/s: 6848462 +3972/20000 train_loss: 2.4326 train_time: 7.6m tok/s: 6848086 +3973/20000 train_loss: 2.3940 train_time: 7.6m tok/s: 6847718 +3974/20000 train_loss: 2.3530 train_time: 7.6m tok/s: 6847348 +3975/20000 train_loss: 2.5165 train_time: 7.6m tok/s: 6846973 +3976/20000 train_loss: 2.4940 train_time: 7.6m tok/s: 6846611 +3977/20000 train_loss: 2.7638 train_time: 7.6m tok/s: 6846258 +3978/20000 train_loss: 2.4883 train_time: 7.6m tok/s: 6845915 +3979/20000 train_loss: 3.1384 train_time: 7.6m tok/s: 6845486 +3980/20000 train_loss: 2.4497 train_time: 7.6m tok/s: 6845090 +3981/20000 train_loss: 2.3784 train_time: 7.6m tok/s: 6844725 +3982/20000 train_loss: 2.4772 train_time: 7.6m tok/s: 6844364 +3983/20000 train_loss: 2.3929 train_time: 7.6m tok/s: 6843978 +3984/20000 train_loss: 2.5175 train_time: 7.6m tok/s: 6843602 +3985/20000 train_loss: 2.4737 train_time: 7.6m tok/s: 6843233 +3986/20000 train_loss: 2.4027 train_time: 7.6m tok/s: 6842884 +3987/20000 train_loss: 2.5584 train_time: 7.6m tok/s: 6842523 +3988/20000 train_loss: 2.4898 train_time: 7.6m tok/s: 6842188 +3989/20000 train_loss: 2.4366 train_time: 7.6m tok/s: 6841802 +3990/20000 train_loss: 2.4252 train_time: 7.6m tok/s: 6841443 +3991/20000 train_loss: 2.4469 train_time: 7.6m tok/s: 6841101 +3992/20000 train_loss: 2.4494 train_time: 7.6m tok/s: 6840765 +3993/20000 train_loss: 2.3831 train_time: 7.7m tok/s: 6840426 +3994/20000 train_loss: 2.1488 train_time: 7.7m tok/s: 6840006 +3995/20000 train_loss: 2.4500 train_time: 7.7m tok/s: 6839648 +3996/20000 train_loss: 2.3635 train_time: 7.7m tok/s: 6839295 +3997/20000 
train_loss: 2.4735 train_time: 7.7m tok/s: 6838904 +3998/20000 train_loss: 2.3975 train_time: 7.7m tok/s: 6838536 +3999/20000 train_loss: 2.4116 train_time: 7.7m tok/s: 6838187 +4000/20000 train_loss: 2.5072 train_time: 7.7m tok/s: 6837848 +4001/20000 train_loss: 2.4244 train_time: 7.7m tok/s: 6837479 +4002/20000 train_loss: 2.4292 train_time: 7.7m tok/s: 6837106 +4003/20000 train_loss: 2.3695 train_time: 7.7m tok/s: 6836743 +4004/20000 train_loss: 2.4847 train_time: 7.7m tok/s: 6836371 +4005/20000 train_loss: 2.4389 train_time: 7.7m tok/s: 6836009 +4006/20000 train_loss: 2.4662 train_time: 7.7m tok/s: 6835611 +4007/20000 train_loss: 2.4723 train_time: 7.7m tok/s: 6835226 +4008/20000 train_loss: 2.3589 train_time: 7.7m tok/s: 6834858 +4009/20000 train_loss: 2.3866 train_time: 7.7m tok/s: 6834499 +4010/20000 train_loss: 2.4901 train_time: 7.7m tok/s: 6834153 +4011/20000 train_loss: 2.4547 train_time: 7.7m tok/s: 6833791 +4012/20000 train_loss: 2.4684 train_time: 7.7m tok/s: 6833405 +4013/20000 train_loss: 2.3532 train_time: 7.7m tok/s: 6833001 +4014/20000 train_loss: 2.3977 train_time: 7.7m tok/s: 6832651 +4015/20000 train_loss: 2.4257 train_time: 7.7m tok/s: 6832262 +4016/20000 train_loss: 2.4218 train_time: 7.7m tok/s: 6831887 +4017/20000 train_loss: 2.5352 train_time: 7.7m tok/s: 6831533 +4018/20000 train_loss: 2.3891 train_time: 7.7m tok/s: 6831171 +4019/20000 train_loss: 2.2971 train_time: 7.7m tok/s: 6830777 +4020/20000 train_loss: 2.3329 train_time: 7.7m tok/s: 6830425 +4021/20000 train_loss: 2.3784 train_time: 7.7m tok/s: 6830046 +4022/20000 train_loss: 2.4642 train_time: 7.7m tok/s: 6829678 +4023/20000 train_loss: 2.4438 train_time: 7.7m tok/s: 6829322 +4024/20000 train_loss: 2.5052 train_time: 7.7m tok/s: 6828967 +4025/20000 train_loss: 2.4854 train_time: 7.7m tok/s: 6828632 +4026/20000 train_loss: 2.4137 train_time: 7.7m tok/s: 6828279 +4027/20000 train_loss: 2.3435 train_time: 7.7m tok/s: 6827892 +4028/20000 train_loss: 2.2444 train_time: 7.7m tok/s: 
6827514 +4029/20000 train_loss: 2.3828 train_time: 7.7m tok/s: 6827143 +4030/20000 train_loss: 2.3988 train_time: 7.7m tok/s: 6826796 +4031/20000 train_loss: 2.4765 train_time: 7.7m tok/s: 6826444 +4032/20000 train_loss: 2.3699 train_time: 7.7m tok/s: 6826069 +4033/20000 train_loss: 2.4281 train_time: 7.7m tok/s: 6825730 +4034/20000 train_loss: 2.5976 train_time: 7.7m tok/s: 6825359 +4035/20000 train_loss: 2.5254 train_time: 7.7m tok/s: 6825010 +4036/20000 train_loss: 2.4899 train_time: 7.8m tok/s: 6824650 +4037/20000 train_loss: 2.4816 train_time: 7.8m tok/s: 6824300 +4038/20000 train_loss: 2.4912 train_time: 7.8m tok/s: 6823962 +4039/20000 train_loss: 2.4733 train_time: 7.8m tok/s: 6823607 +4040/20000 train_loss: 2.4227 train_time: 7.8m tok/s: 6823221 +4041/20000 train_loss: 2.3873 train_time: 7.8m tok/s: 6822864 +4042/20000 train_loss: 2.3595 train_time: 7.8m tok/s: 6822509 +4043/20000 train_loss: 2.3786 train_time: 7.8m tok/s: 6822134 +4044/20000 train_loss: 2.4185 train_time: 7.8m tok/s: 6821769 +4045/20000 train_loss: 2.4571 train_time: 7.8m tok/s: 6821436 +4046/20000 train_loss: 2.5233 train_time: 7.8m tok/s: 6821083 +4047/20000 train_loss: 2.4972 train_time: 7.8m tok/s: 6820739 +4048/20000 train_loss: 2.4226 train_time: 7.8m tok/s: 6820380 +4049/20000 train_loss: 2.5480 train_time: 7.8m tok/s: 6820048 +4050/20000 train_loss: 2.4407 train_time: 7.8m tok/s: 6819653 +4051/20000 train_loss: 2.4067 train_time: 7.8m tok/s: 6819285 +4052/20000 train_loss: 2.4907 train_time: 7.8m tok/s: 6818934 +4053/20000 train_loss: 2.4328 train_time: 7.8m tok/s: 6818584 +4054/20000 train_loss: 2.4165 train_time: 7.8m tok/s: 6818250 +4055/20000 train_loss: 2.4427 train_time: 7.8m tok/s: 6817912 +4056/20000 train_loss: 2.3208 train_time: 7.8m tok/s: 6817552 +4057/20000 train_loss: 2.4139 train_time: 7.8m tok/s: 6817217 +4058/20000 train_loss: 2.5559 train_time: 7.8m tok/s: 6816827 +4059/20000 train_loss: 2.2795 train_time: 7.8m tok/s: 6816495 +4060/20000 train_loss: 2.3949 
train_time: 7.8m tok/s: 6816170 +4061/20000 train_loss: 2.4970 train_time: 7.8m tok/s: 6815841 +4062/20000 train_loss: 2.4855 train_time: 7.8m tok/s: 6815507 +4063/20000 train_loss: 2.4491 train_time: 7.8m tok/s: 6815146 +4064/20000 train_loss: 2.4283 train_time: 7.8m tok/s: 6814808 +4065/20000 train_loss: 2.4946 train_time: 7.8m tok/s: 6814462 +4066/20000 train_loss: 2.4064 train_time: 7.8m tok/s: 6814097 +4067/20000 train_loss: 2.4733 train_time: 7.8m tok/s: 6813742 +4068/20000 train_loss: 2.4797 train_time: 7.8m tok/s: 6813409 +4069/20000 train_loss: 2.3650 train_time: 7.8m tok/s: 6813049 +4070/20000 train_loss: 2.3680 train_time: 7.8m tok/s: 6812690 +4071/20000 train_loss: 2.7196 train_time: 7.8m tok/s: 6812310 +4072/20000 train_loss: 2.4502 train_time: 7.8m tok/s: 6811968 +4073/20000 train_loss: 2.4491 train_time: 7.8m tok/s: 6811593 +4074/20000 train_loss: 2.4050 train_time: 7.8m tok/s: 6811237 +4075/20000 train_loss: 2.5574 train_time: 7.8m tok/s: 6810882 +4076/20000 train_loss: 2.5832 train_time: 7.8m tok/s: 6810537 +4077/20000 train_loss: 2.4930 train_time: 7.8m tok/s: 6810189 +4078/20000 train_loss: 2.3754 train_time: 7.8m tok/s: 6809833 +4079/20000 train_loss: 3.0116 train_time: 7.9m tok/s: 6809409 +4080/20000 train_loss: 2.3635 train_time: 7.9m tok/s: 6809052 +4081/20000 train_loss: 2.3903 train_time: 7.9m tok/s: 6808707 +4082/20000 train_loss: 2.4031 train_time: 7.9m tok/s: 6808370 +4083/20000 train_loss: 2.3157 train_time: 7.9m tok/s: 6808037 +4084/20000 train_loss: 2.2543 train_time: 7.9m tok/s: 6807685 +4085/20000 train_loss: 2.3530 train_time: 7.9m tok/s: 6807308 +4086/20000 train_loss: 2.4932 train_time: 7.9m tok/s: 6806960 +4087/20000 train_loss: 2.5314 train_time: 7.9m tok/s: 6806618 +4088/20000 train_loss: 2.5333 train_time: 7.9m tok/s: 6806276 +4089/20000 train_loss: 2.4444 train_time: 7.9m tok/s: 6805933 +4090/20000 train_loss: 2.3707 train_time: 7.9m tok/s: 6805586 +4091/20000 train_loss: 2.5203 train_time: 7.9m tok/s: 6805236 +4092/20000 
train_loss: 2.5277 train_time: 7.9m tok/s: 6804860 +4093/20000 train_loss: 2.4958 train_time: 7.9m tok/s: 6804511 +4094/20000 train_loss: 2.2944 train_time: 7.9m tok/s: 6804136 +4095/20000 train_loss: 2.4483 train_time: 7.9m tok/s: 6803784 +4096/20000 train_loss: 2.2595 train_time: 7.9m tok/s: 6803423 +4097/20000 train_loss: 2.3059 train_time: 7.9m tok/s: 6803083 +4098/20000 train_loss: 2.4173 train_time: 7.9m tok/s: 6802768 +4099/20000 train_loss: 2.1911 train_time: 7.9m tok/s: 6802385 +4100/20000 train_loss: 2.4199 train_time: 7.9m tok/s: 6802021 +4101/20000 train_loss: 2.5312 train_time: 7.9m tok/s: 6801705 +4102/20000 train_loss: 2.2882 train_time: 7.9m tok/s: 6801334 +4103/20000 train_loss: 2.4363 train_time: 7.9m tok/s: 6801001 +4104/20000 train_loss: 2.3843 train_time: 7.9m tok/s: 6800668 +4105/20000 train_loss: 2.4328 train_time: 7.9m tok/s: 6800262 +4106/20000 train_loss: 2.4629 train_time: 7.9m tok/s: 6799954 +4107/20000 train_loss: 2.4071 train_time: 7.9m tok/s: 6799614 +4108/20000 train_loss: 2.4177 train_time: 7.9m tok/s: 6799275 +4109/20000 train_loss: 2.3919 train_time: 7.9m tok/s: 6798943 +4110/20000 train_loss: 2.3435 train_time: 7.9m tok/s: 6798589 +4111/20000 train_loss: 2.3976 train_time: 7.9m tok/s: 6798254 +4112/20000 train_loss: 2.4437 train_time: 7.9m tok/s: 6797899 +4113/20000 train_loss: 2.4202 train_time: 7.9m tok/s: 6797550 +4114/20000 train_loss: 2.3915 train_time: 7.9m tok/s: 6797212 +4115/20000 train_loss: 2.5271 train_time: 7.9m tok/s: 6796871 +4116/20000 train_loss: 2.3672 train_time: 7.9m tok/s: 6796521 +4117/20000 train_loss: 2.3818 train_time: 7.9m tok/s: 6796170 +4118/20000 train_loss: 2.4307 train_time: 7.9m tok/s: 6795828 +4119/20000 train_loss: 2.4647 train_time: 7.9m tok/s: 6795475 +4120/20000 train_loss: 2.4346 train_time: 7.9m tok/s: 6795118 +4121/20000 train_loss: 2.4027 train_time: 7.9m tok/s: 6794746 +4122/20000 train_loss: 2.2857 train_time: 8.0m tok/s: 6794411 +4123/20000 train_loss: 2.3857 train_time: 8.0m tok/s: 
6794072 +4124/20000 train_loss: 2.2550 train_time: 8.0m tok/s: 6793720 +4125/20000 train_loss: 2.4531 train_time: 8.0m tok/s: 6793374 +4126/20000 train_loss: 2.3729 train_time: 8.0m tok/s: 6793033 +4127/20000 train_loss: 2.3760 train_time: 8.0m tok/s: 6792704 +4128/20000 train_loss: 2.4355 train_time: 8.0m tok/s: 6792367 +4129/20000 train_loss: 2.4616 train_time: 8.0m tok/s: 6792024 +4130/20000 train_loss: 2.3970 train_time: 8.0m tok/s: 6791679 +4131/20000 train_loss: 2.4507 train_time: 8.0m tok/s: 6791334 +4132/20000 train_loss: 2.4222 train_time: 8.0m tok/s: 6791002 +4133/20000 train_loss: 2.4381 train_time: 8.0m tok/s: 6790631 +4134/20000 train_loss: 2.4765 train_time: 8.0m tok/s: 6790283 +4135/20000 train_loss: 2.4691 train_time: 8.0m tok/s: 6789955 +4136/20000 train_loss: 2.2932 train_time: 8.0m tok/s: 6789612 +4137/20000 train_loss: 2.4980 train_time: 8.0m tok/s: 6789299 +4138/20000 train_loss: 2.3044 train_time: 8.0m tok/s: 6788946 +4139/20000 train_loss: 2.4532 train_time: 8.0m tok/s: 6788592 +4140/20000 train_loss: 2.5205 train_time: 8.0m tok/s: 6788232 +4141/20000 train_loss: 2.3312 train_time: 8.0m tok/s: 6787897 +4142/20000 train_loss: 2.5032 train_time: 8.0m tok/s: 6787546 +4143/20000 train_loss: 2.4041 train_time: 8.0m tok/s: 6787232 +4144/20000 train_loss: 2.5081 train_time: 8.0m tok/s: 6786894 +4145/20000 train_loss: 2.2718 train_time: 8.0m tok/s: 6786518 +4146/20000 train_loss: 2.4525 train_time: 8.0m tok/s: 6786183 +4147/20000 train_loss: 2.5416 train_time: 8.0m tok/s: 6785852 +4148/20000 train_loss: 2.4351 train_time: 8.0m tok/s: 6785531 +4149/20000 train_loss: 2.2961 train_time: 8.0m tok/s: 6785192 +4150/20000 train_loss: 2.3497 train_time: 8.0m tok/s: 6784843 +4151/20000 train_loss: 2.4356 train_time: 8.0m tok/s: 6784515 +4152/20000 train_loss: 2.4178 train_time: 8.0m tok/s: 6784177 +4153/20000 train_loss: 2.5047 train_time: 8.0m tok/s: 6783842 +4154/20000 train_loss: 2.3565 train_time: 8.0m tok/s: 6783501 +4155/20000 train_loss: 2.4932 
train_time: 8.0m tok/s: 6783183 +4156/20000 train_loss: 2.4259 train_time: 8.0m tok/s: 6782857 +4157/20000 train_loss: 2.4856 train_time: 8.0m tok/s: 6782541 +4158/20000 train_loss: 2.4438 train_time: 8.0m tok/s: 6782215 +4159/20000 train_loss: 2.3457 train_time: 8.0m tok/s: 6781869 +4160/20000 train_loss: 2.3518 train_time: 8.0m tok/s: 6781537 +4161/20000 train_loss: 2.4066 train_time: 8.0m tok/s: 6781208 +4162/20000 train_loss: 2.3186 train_time: 8.0m tok/s: 6780870 +4163/20000 train_loss: 2.5220 train_time: 8.0m tok/s: 6780526 +4164/20000 train_loss: 2.4399 train_time: 8.0m tok/s: 6780213 +4165/20000 train_loss: 2.4417 train_time: 8.1m tok/s: 6779894 +4166/20000 train_loss: 2.4449 train_time: 8.1m tok/s: 6779553 +4167/20000 train_loss: 2.5228 train_time: 8.1m tok/s: 6779223 +4168/20000 train_loss: 2.4345 train_time: 8.1m tok/s: 6778897 +4169/20000 train_loss: 2.4171 train_time: 8.1m tok/s: 6778576 +4170/20000 train_loss: 2.3374 train_time: 8.1m tok/s: 6778237 +4171/20000 train_loss: 2.4369 train_time: 8.1m tok/s: 6777911 +4172/20000 train_loss: 2.3924 train_time: 8.1m tok/s: 6777581 +4173/20000 train_loss: 2.5197 train_time: 8.1m tok/s: 6777257 +4174/20000 train_loss: 2.3053 train_time: 8.1m tok/s: 6776926 +4175/20000 train_loss: 2.4024 train_time: 8.1m tok/s: 6776598 +4176/20000 train_loss: 2.5054 train_time: 8.1m tok/s: 6776281 +4177/20000 train_loss: 2.3773 train_time: 8.1m tok/s: 6775970 +4178/20000 train_loss: 2.4486 train_time: 8.1m tok/s: 6775667 +4179/20000 train_loss: 2.3986 train_time: 8.1m tok/s: 6775333 +4180/20000 train_loss: 2.3862 train_time: 8.1m tok/s: 6774999 +4181/20000 train_loss: 2.5434 train_time: 8.1m tok/s: 6774677 +4182/20000 train_loss: 2.4704 train_time: 8.1m tok/s: 6774340 +4183/20000 train_loss: 2.4604 train_time: 8.1m tok/s: 6774014 +4184/20000 train_loss: 2.4723 train_time: 8.1m tok/s: 6773675 +4185/20000 train_loss: 2.5698 train_time: 8.1m tok/s: 6773341 +4186/20000 train_loss: 2.4666 train_time: 8.1m tok/s: 6773019 +4187/20000 
train_loss: 2.2912 train_time: 8.1m tok/s: 6772690 +4188/20000 train_loss: 2.4573 train_time: 8.1m tok/s: 6772332 +4189/20000 train_loss: 2.4432 train_time: 8.1m tok/s: 6772001 +4190/20000 train_loss: 2.5328 train_time: 8.1m tok/s: 6771689 +4191/20000 train_loss: 2.4201 train_time: 8.1m tok/s: 6771376 +4192/20000 train_loss: 2.3685 train_time: 8.1m tok/s: 6771038 +4193/20000 train_loss: 2.4028 train_time: 8.1m tok/s: 6770723 +4194/20000 train_loss: 2.4676 train_time: 8.1m tok/s: 6770374 +4195/20000 train_loss: 2.4824 train_time: 8.1m tok/s: 6770028 +4196/20000 train_loss: 2.3096 train_time: 8.1m tok/s: 6769694 +4197/20000 train_loss: 2.3249 train_time: 8.1m tok/s: 6769355 +4198/20000 train_loss: 2.4285 train_time: 8.1m tok/s: 6769022 +4199/20000 train_loss: 2.2411 train_time: 8.1m tok/s: 6768698 +4200/20000 train_loss: 2.4335 train_time: 8.1m tok/s: 6768348 +4201/20000 train_loss: 2.3659 train_time: 8.1m tok/s: 6768013 +4202/20000 train_loss: 2.4515 train_time: 8.1m tok/s: 6767683 +4203/20000 train_loss: 2.5870 train_time: 8.1m tok/s: 6767361 +4204/20000 train_loss: 2.4889 train_time: 8.1m tok/s: 6767052 +4205/20000 train_loss: 2.4659 train_time: 8.1m tok/s: 6766710 +4206/20000 train_loss: 2.3878 train_time: 8.1m tok/s: 6766388 +4207/20000 train_loss: 2.4054 train_time: 8.1m tok/s: 6766070 +4208/20000 train_loss: 2.3865 train_time: 8.2m tok/s: 6765739 +4209/20000 train_loss: 2.3794 train_time: 8.2m tok/s: 6765418 +4210/20000 train_loss: 2.3809 train_time: 8.2m tok/s: 6765099 +4211/20000 train_loss: 2.5536 train_time: 8.2m tok/s: 6764752 +4212/20000 train_loss: 2.3714 train_time: 8.2m tok/s: 6764395 +4213/20000 train_loss: 2.3961 train_time: 8.2m tok/s: 6764096 +4214/20000 train_loss: 2.4199 train_time: 8.2m tok/s: 6763766 +4215/20000 train_loss: 2.4491 train_time: 8.2m tok/s: 6763463 +4216/20000 train_loss: 2.5494 train_time: 8.2m tok/s: 6763135 +4217/20000 train_loss: 2.2983 train_time: 8.2m tok/s: 6762798 +4218/20000 train_loss: 2.4529 train_time: 8.2m tok/s: 
6762462 +4219/20000 train_loss: 2.3814 train_time: 8.2m tok/s: 6762118 +4220/20000 train_loss: 2.0826 train_time: 8.2m tok/s: 6761800 +4221/20000 train_loss: 2.4683 train_time: 8.2m tok/s: 6761480 +4222/20000 train_loss: 2.4651 train_time: 8.2m tok/s: 6761170 +4223/20000 train_loss: 2.5645 train_time: 8.2m tok/s: 6760838 +4224/20000 train_loss: 2.4495 train_time: 8.2m tok/s: 6760504 +4225/20000 train_loss: 2.4559 train_time: 8.2m tok/s: 6760171 +4226/20000 train_loss: 2.4249 train_time: 8.2m tok/s: 6759842 +4227/20000 train_loss: 2.4637 train_time: 8.2m tok/s: 6759505 +4228/20000 train_loss: 2.4434 train_time: 8.2m tok/s: 6759154 +4229/20000 train_loss: 2.4399 train_time: 8.2m tok/s: 6758676 +4230/20000 train_loss: 2.3728 train_time: 8.2m tok/s: 6758319 +4231/20000 train_loss: 2.3631 train_time: 8.2m tok/s: 6758021 +4232/20000 train_loss: 2.4057 train_time: 8.2m tok/s: 6757521 +4233/20000 train_loss: 2.5433 train_time: 8.2m tok/s: 6757213 +4234/20000 train_loss: 2.3417 train_time: 8.2m tok/s: 6756790 +4235/20000 train_loss: 2.5135 train_time: 8.2m tok/s: 6756453 +4236/20000 train_loss: 2.5582 train_time: 8.2m tok/s: 6756092 +4237/20000 train_loss: 2.4336 train_time: 8.2m tok/s: 6755764 +4238/20000 train_loss: 2.5923 train_time: 8.2m tok/s: 6755407 +4239/20000 train_loss: 2.4940 train_time: 8.2m tok/s: 6755077 +4240/20000 train_loss: 2.5675 train_time: 8.2m tok/s: 6754725 +4241/20000 train_loss: 2.3551 train_time: 8.2m tok/s: 6754353 +4242/20000 train_loss: 2.4594 train_time: 8.2m tok/s: 6754008 +4243/20000 train_loss: 2.4047 train_time: 8.2m tok/s: 6753622 +4244/20000 train_loss: 2.4563 train_time: 8.2m tok/s: 6753313 +4245/20000 train_loss: 2.4492 train_time: 8.2m tok/s: 6752832 +4246/20000 train_loss: 2.4946 train_time: 8.2m tok/s: 6752523 +4247/20000 train_loss: 2.3696 train_time: 8.2m tok/s: 6752180 +4248/20000 train_loss: 2.4087 train_time: 8.2m tok/s: 6751870 +4249/20000 train_loss: 2.4281 train_time: 8.2m tok/s: 6751538 +4250/20000 train_loss: 2.3873 
train_time: 8.3m tok/s: 6751230 +4251/20000 train_loss: 2.4059 train_time: 8.3m tok/s: 6750905 +4252/20000 train_loss: 2.4843 train_time: 8.3m tok/s: 6750615 +4253/20000 train_loss: 2.5384 train_time: 8.3m tok/s: 6750277 +4254/20000 train_loss: 2.6599 train_time: 8.3m tok/s: 6749960 +4255/20000 train_loss: 2.4473 train_time: 8.3m tok/s: 6749671 +4256/20000 train_loss: 2.5086 train_time: 8.3m tok/s: 6749347 +4257/20000 train_loss: 2.5233 train_time: 8.3m tok/s: 6749032 +4258/20000 train_loss: 2.3286 train_time: 8.3m tok/s: 6748705 +4259/20000 train_loss: 2.4181 train_time: 8.3m tok/s: 6748375 +4260/20000 train_loss: 2.3916 train_time: 8.3m tok/s: 6748070 +4261/20000 train_loss: 2.2910 train_time: 8.3m tok/s: 6747752 +4262/20000 train_loss: 2.3727 train_time: 8.3m tok/s: 6747462 +4263/20000 train_loss: 2.3348 train_time: 8.3m tok/s: 6747142 +4264/20000 train_loss: 2.3663 train_time: 8.3m tok/s: 6746808 +4265/20000 train_loss: 2.4364 train_time: 8.3m tok/s: 6746500 +4266/20000 train_loss: 2.5118 train_time: 8.3m tok/s: 6746165 +4267/20000 train_loss: 2.4037 train_time: 8.3m tok/s: 6745863 +4268/20000 train_loss: 2.4635 train_time: 8.3m tok/s: 6745536 +4269/20000 train_loss: 2.4039 train_time: 8.3m tok/s: 6745228 +4270/20000 train_loss: 2.3166 train_time: 8.3m tok/s: 6744927 +4271/20000 train_loss: 2.4268 train_time: 8.3m tok/s: 6744614 +4272/20000 train_loss: 2.3836 train_time: 8.3m tok/s: 6744289 +4273/20000 train_loss: 2.4074 train_time: 8.3m tok/s: 6743956 +4274/20000 train_loss: 2.4539 train_time: 8.3m tok/s: 6743608 +4275/20000 train_loss: 2.3017 train_time: 8.3m tok/s: 6743294 +4276/20000 train_loss: 2.3849 train_time: 8.3m tok/s: 6742988 +4277/20000 train_loss: 2.3573 train_time: 8.3m tok/s: 6742687 +4278/20000 train_loss: 2.3804 train_time: 8.3m tok/s: 6742365 +4279/20000 train_loss: 2.3683 train_time: 8.3m tok/s: 6742047 +4280/20000 train_loss: 2.5229 train_time: 8.3m tok/s: 6741748 +4281/20000 train_loss: 2.6019 train_time: 8.3m tok/s: 6741447 +4282/20000 
train_loss: 2.4264 train_time: 8.3m tok/s: 6741122 +4283/20000 train_loss: 2.5997 train_time: 8.3m tok/s: 6740808 +4284/20000 train_loss: 2.5164 train_time: 8.3m tok/s: 6740508 +4285/20000 train_loss: 2.3901 train_time: 8.3m tok/s: 6740202 +4286/20000 train_loss: 2.4813 train_time: 8.3m tok/s: 6739863 +4287/20000 train_loss: 2.3717 train_time: 8.3m tok/s: 6739535 +4288/20000 train_loss: 2.3968 train_time: 8.3m tok/s: 6739233 +4289/20000 train_loss: 2.5699 train_time: 8.3m tok/s: 6738856 +4290/20000 train_loss: 2.3607 train_time: 8.3m tok/s: 6738573 +4291/20000 train_loss: 2.3186 train_time: 8.3m tok/s: 6738280 +4292/20000 train_loss: 2.4555 train_time: 8.3m tok/s: 6737972 +4293/20000 train_loss: 2.4250 train_time: 8.4m tok/s: 6737648 +4294/20000 train_loss: 2.3751 train_time: 8.4m tok/s: 6737340 +4295/20000 train_loss: 2.4169 train_time: 8.4m tok/s: 6737034 +4296/20000 train_loss: 2.2180 train_time: 8.4m tok/s: 6736706 +4297/20000 train_loss: 2.4765 train_time: 8.4m tok/s: 6736412 +4298/20000 train_loss: 2.4062 train_time: 8.4m tok/s: 6736100 +4299/20000 train_loss: 2.3432 train_time: 8.4m tok/s: 6735778 +4300/20000 train_loss: 2.5165 train_time: 8.4m tok/s: 6735469 +4301/20000 train_loss: 2.1848 train_time: 8.4m tok/s: 6735117 +4302/20000 train_loss: 2.3400 train_time: 8.4m tok/s: 6734797 +4303/20000 train_loss: 2.3308 train_time: 8.4m tok/s: 6734503 +4304/20000 train_loss: 2.3514 train_time: 8.4m tok/s: 6734208 +4305/20000 train_loss: 2.3303 train_time: 8.4m tok/s: 6733905 +4306/20000 train_loss: 2.3861 train_time: 8.4m tok/s: 6733574 +4307/20000 train_loss: 2.3377 train_time: 8.4m tok/s: 6733275 +4308/20000 train_loss: 2.3711 train_time: 8.4m tok/s: 6732969 +4309/20000 train_loss: 2.5788 train_time: 8.4m tok/s: 6732656 +4310/20000 train_loss: 2.4527 train_time: 8.4m tok/s: 6732334 +4311/20000 train_loss: 2.5382 train_time: 8.4m tok/s: 6732054 +4312/20000 train_loss: 2.4152 train_time: 8.4m tok/s: 6731755 +4313/20000 train_loss: 2.2540 train_time: 8.4m tok/s: 
6731439 +4314/20000 train_loss: 2.4276 train_time: 8.4m tok/s: 6731120 +4315/20000 train_loss: 2.4035 train_time: 8.4m tok/s: 6730830 +4316/20000 train_loss: 2.4145 train_time: 8.4m tok/s: 6730510 +4317/20000 train_loss: 2.4880 train_time: 8.4m tok/s: 6730193 +4318/20000 train_loss: 2.4834 train_time: 8.4m tok/s: 6729871 +4319/20000 train_loss: 2.1766 train_time: 8.4m tok/s: 6729549 +4320/20000 train_loss: 2.2638 train_time: 8.4m tok/s: 6729257 +4321/20000 train_loss: 2.3716 train_time: 8.4m tok/s: 6728956 +4322/20000 train_loss: 2.4197 train_time: 8.4m tok/s: 6728648 +4323/20000 train_loss: 2.4112 train_time: 8.4m tok/s: 6728352 +4324/20000 train_loss: 2.4374 train_time: 8.4m tok/s: 6728050 +4325/20000 train_loss: 2.4597 train_time: 8.4m tok/s: 6727766 +4326/20000 train_loss: 2.3643 train_time: 8.4m tok/s: 6727465 +4327/20000 train_loss: 2.4084 train_time: 8.4m tok/s: 6727172 +4328/20000 train_loss: 2.3819 train_time: 8.4m tok/s: 6726879 +4329/20000 train_loss: 2.4348 train_time: 8.4m tok/s: 6726591 +4330/20000 train_loss: 2.3009 train_time: 8.4m tok/s: 6726272 +4331/20000 train_loss: 2.3838 train_time: 8.4m tok/s: 6725974 +4332/20000 train_loss: 2.9158 train_time: 8.4m tok/s: 6725635 +4333/20000 train_loss: 2.3827 train_time: 8.4m tok/s: 6725335 +4334/20000 train_loss: 2.4441 train_time: 8.4m tok/s: 6725023 +4335/20000 train_loss: 2.5361 train_time: 8.4m tok/s: 6724714 +4336/20000 train_loss: 2.4145 train_time: 8.5m tok/s: 6724408 +4337/20000 train_loss: 2.5032 train_time: 8.5m tok/s: 6724125 +4338/20000 train_loss: 2.5365 train_time: 8.5m tok/s: 6723824 +4339/20000 train_loss: 2.4431 train_time: 8.5m tok/s: 6723516 +4340/20000 train_loss: 2.4233 train_time: 8.5m tok/s: 6723230 +4341/20000 train_loss: 2.4107 train_time: 8.5m tok/s: 6722949 +4342/20000 train_loss: 2.5157 train_time: 8.5m tok/s: 6722668 +4343/20000 train_loss: 2.4872 train_time: 8.5m tok/s: 6722365 +4344/20000 train_loss: 2.4865 train_time: 8.5m tok/s: 6722041 +4345/20000 train_loss: 2.3206 
train_time: 8.5m tok/s: 6721725 +4346/20000 train_loss: 2.3915 train_time: 8.5m tok/s: 6721439 +4347/20000 train_loss: 2.4026 train_time: 8.5m tok/s: 6721161 +4348/20000 train_loss: 2.3976 train_time: 8.5m tok/s: 6720858 +4349/20000 train_loss: 1.8610 train_time: 8.5m tok/s: 6720515 +4350/20000 train_loss: 2.2267 train_time: 8.5m tok/s: 6720213 +4351/20000 train_loss: 2.4175 train_time: 8.5m tok/s: 6719933 +4352/20000 train_loss: 2.3706 train_time: 8.5m tok/s: 6719615 +4353/20000 train_loss: 2.4613 train_time: 8.5m tok/s: 6719340 +4354/20000 train_loss: 2.4761 train_time: 8.5m tok/s: 6719045 +4355/20000 train_loss: 2.4059 train_time: 8.5m tok/s: 6718778 +4356/20000 train_loss: 2.4682 train_time: 8.5m tok/s: 6718480 +4357/20000 train_loss: 2.5053 train_time: 8.5m tok/s: 6718173 +4358/20000 train_loss: 2.4054 train_time: 8.5m tok/s: 6717903 +4359/20000 train_loss: 2.4193 train_time: 8.5m tok/s: 6717596 +4360/20000 train_loss: 2.3197 train_time: 8.5m tok/s: 6717317 +4361/20000 train_loss: 2.2813 train_time: 8.5m tok/s: 6717027 +4362/20000 train_loss: 2.4558 train_time: 8.5m tok/s: 6716724 +4363/20000 train_loss: 2.2143 train_time: 8.5m tok/s: 6716406 +4364/20000 train_loss: 2.3503 train_time: 8.5m tok/s: 6716108 +4365/20000 train_loss: 2.5505 train_time: 8.5m tok/s: 6715827 +4366/20000 train_loss: 2.7721 train_time: 8.5m tok/s: 6715518 +4367/20000 train_loss: 2.5278 train_time: 8.5m tok/s: 6715227 +4368/20000 train_loss: 2.5053 train_time: 8.5m tok/s: 6714924 +4369/20000 train_loss: 2.3122 train_time: 8.5m tok/s: 6714618 +4370/20000 train_loss: 2.4949 train_time: 8.5m tok/s: 6714335 +4371/20000 train_loss: 2.3880 train_time: 8.5m tok/s: 6714022 +4372/20000 train_loss: 2.4783 train_time: 8.5m tok/s: 6713714 +4373/20000 train_loss: 2.2524 train_time: 8.5m tok/s: 6713408 +4374/20000 train_loss: 2.3128 train_time: 8.5m tok/s: 6713112 +4375/20000 train_loss: 2.4045 train_time: 8.5m tok/s: 6712816 +4376/20000 train_loss: 2.3834 train_time: 8.5m tok/s: 6712490 +4377/20000 
train_loss: 2.5521 train_time: 8.5m tok/s: 6712204 +4378/20000 train_loss: 2.2660 train_time: 8.5m tok/s: 6711893 +4379/20000 train_loss: 2.3272 train_time: 8.6m tok/s: 6711617 +4380/20000 train_loss: 2.4335 train_time: 8.6m tok/s: 6711340 +4381/20000 train_loss: 2.5031 train_time: 8.6m tok/s: 6711068 +4382/20000 train_loss: 2.4871 train_time: 8.6m tok/s: 6710769 +4383/20000 train_loss: 2.4339 train_time: 8.6m tok/s: 6710451 +4384/20000 train_loss: 2.4952 train_time: 8.6m tok/s: 6710157 +4385/20000 train_loss: 2.4210 train_time: 8.6m tok/s: 6709852 +4386/20000 train_loss: 2.3982 train_time: 8.6m tok/s: 6709553 +4387/20000 train_loss: 2.4682 train_time: 8.6m tok/s: 6709247 +4388/20000 train_loss: 2.3857 train_time: 8.6m tok/s: 6708956 +4389/20000 train_loss: 2.2617 train_time: 8.6m tok/s: 6708661 +4390/20000 train_loss: 2.4042 train_time: 8.6m tok/s: 6708361 +4391/20000 train_loss: 2.2865 train_time: 8.6m tok/s: 6708058 +4392/20000 train_loss: 2.3929 train_time: 8.6m tok/s: 6707767 +4393/20000 train_loss: 2.3914 train_time: 8.6m tok/s: 6707459 +4394/20000 train_loss: 2.3708 train_time: 8.6m tok/s: 6707175 +4395/20000 train_loss: 2.3400 train_time: 8.6m tok/s: 6706891 +4396/20000 train_loss: 2.5816 train_time: 8.6m tok/s: 6706585 +4397/20000 train_loss: 2.4130 train_time: 8.6m tok/s: 6706301 +4398/20000 train_loss: 2.3429 train_time: 8.6m tok/s: 6706019 +4399/20000 train_loss: 2.3590 train_time: 8.6m tok/s: 6705737 +4400/20000 train_loss: 2.5065 train_time: 8.6m tok/s: 6705454 +4401/20000 train_loss: 2.4328 train_time: 8.6m tok/s: 6705171 +4402/20000 train_loss: 2.3784 train_time: 8.6m tok/s: 6704890 +4403/20000 train_loss: 2.3663 train_time: 8.6m tok/s: 6704606 +4404/20000 train_loss: 2.3194 train_time: 8.6m tok/s: 6704327 +4405/20000 train_loss: 2.5378 train_time: 8.6m tok/s: 6704032 +4406/20000 train_loss: 2.4058 train_time: 8.6m tok/s: 6703741 +4407/20000 train_loss: 2.4497 train_time: 8.6m tok/s: 6703462 +4408/20000 train_loss: 2.4758 train_time: 8.6m tok/s: 
6703181 +4409/20000 train_loss: 2.4200 train_time: 8.6m tok/s: 6702915 +4410/20000 train_loss: 2.4457 train_time: 8.6m tok/s: 6702639 +4411/20000 train_loss: 2.4640 train_time: 8.6m tok/s: 6702387 +4412/20000 train_loss: 2.5236 train_time: 8.6m tok/s: 6702091 +4413/20000 train_loss: 2.3427 train_time: 8.6m tok/s: 6701799 +4414/20000 train_loss: 2.3517 train_time: 8.6m tok/s: 6701519 +4415/20000 train_loss: 2.5635 train_time: 8.6m tok/s: 6701197 +4416/20000 train_loss: 2.3844 train_time: 8.6m tok/s: 6700905 +4417/20000 train_loss: 2.2951 train_time: 8.6m tok/s: 6700616 +4418/20000 train_loss: 2.3700 train_time: 8.6m tok/s: 6700318 +4419/20000 train_loss: 2.3696 train_time: 8.6m tok/s: 6700043 +4420/20000 train_loss: 2.2846 train_time: 8.6m tok/s: 6699763 +4421/20000 train_loss: 2.2516 train_time: 8.6m tok/s: 6699475 +4422/20000 train_loss: 2.3867 train_time: 8.7m tok/s: 6699189 +4423/20000 train_loss: 2.5335 train_time: 8.7m tok/s: 6698885 +4424/20000 train_loss: 2.4636 train_time: 8.7m tok/s: 6698620 +4425/20000 train_loss: 2.5000 train_time: 8.7m tok/s: 6698338 +4426/20000 train_loss: 2.3037 train_time: 8.7m tok/s: 6698062 +4427/20000 train_loss: 2.3325 train_time: 8.7m tok/s: 6697773 +4428/20000 train_loss: 2.3992 train_time: 8.7m tok/s: 6697487 +4429/20000 train_loss: 2.4078 train_time: 8.7m tok/s: 6697205 +4430/20000 train_loss: 2.5593 train_time: 8.7m tok/s: 6696892 +4431/20000 train_loss: 2.4652 train_time: 8.7m tok/s: 6696552 +4432/20000 train_loss: 2.3570 train_time: 8.7m tok/s: 6696262 +4433/20000 train_loss: 2.2747 train_time: 8.7m tok/s: 6695970 +4434/20000 train_loss: 2.5701 train_time: 8.7m tok/s: 6695659 +4435/20000 train_loss: 2.4319 train_time: 8.7m tok/s: 6695376 +4436/20000 train_loss: 2.4043 train_time: 8.7m tok/s: 6695092 +4437/20000 train_loss: 2.2318 train_time: 8.7m tok/s: 6694798 +4438/20000 train_loss: 2.5273 train_time: 8.7m tok/s: 6694519 +4439/20000 train_loss: 2.4861 train_time: 8.7m tok/s: 6694225 +4440/20000 train_loss: 2.4758 
train_time: 8.7m tok/s: 6693950 +4441/20000 train_loss: 2.3512 train_time: 8.7m tok/s: 6693680 +4442/20000 train_loss: 2.4496 train_time: 8.7m tok/s: 6693385 +4443/20000 train_loss: 2.5191 train_time: 8.7m tok/s: 6693119 +4444/20000 train_loss: 2.3710 train_time: 8.7m tok/s: 6692805 +4445/20000 train_loss: 2.3881 train_time: 8.7m tok/s: 6692527 +4446/20000 train_loss: 2.3101 train_time: 8.7m tok/s: 6692244 +4447/20000 train_loss: 2.3631 train_time: 8.7m tok/s: 6691951 +4448/20000 train_loss: 2.3067 train_time: 8.7m tok/s: 6691652 +4449/20000 train_loss: 2.5094 train_time: 8.7m tok/s: 6691358 +4450/20000 train_loss: 2.3468 train_time: 8.7m tok/s: 6691084 +4451/20000 train_loss: 2.4480 train_time: 8.7m tok/s: 6690788 +4452/20000 train_loss: 2.3652 train_time: 8.7m tok/s: 6690512 +4453/20000 train_loss: 2.4747 train_time: 8.7m tok/s: 6690217 +4454/20000 train_loss: 2.3302 train_time: 8.7m tok/s: 6689932 +4455/20000 train_loss: 2.4053 train_time: 8.7m tok/s: 6689656 +4456/20000 train_loss: 2.4194 train_time: 8.7m tok/s: 6689361 +4457/20000 train_loss: 2.6127 train_time: 8.7m tok/s: 6689091 +4458/20000 train_loss: 2.3862 train_time: 8.7m tok/s: 6688819 +4459/20000 train_loss: 2.2665 train_time: 8.7m tok/s: 6688528 +4460/20000 train_loss: 2.3211 train_time: 8.7m tok/s: 6688215 +4461/20000 train_loss: 2.3537 train_time: 8.7m tok/s: 6687931 +4462/20000 train_loss: 2.4765 train_time: 8.7m tok/s: 6687653 +4463/20000 train_loss: 2.2405 train_time: 8.7m tok/s: 6687334 +4464/20000 train_loss: 2.4953 train_time: 8.7m tok/s: 6687076 +4465/20000 train_loss: 2.4583 train_time: 8.8m tok/s: 6686788 +4466/20000 train_loss: 2.5095 train_time: 8.8m tok/s: 6686506 +4467/20000 train_loss: 2.4530 train_time: 8.8m tok/s: 6686235 +4468/20000 train_loss: 2.2166 train_time: 8.8m tok/s: 6685917 +4469/20000 train_loss: 2.3555 train_time: 8.8m tok/s: 6685606 +4470/20000 train_loss: 2.3491 train_time: 8.8m tok/s: 6685343 +4471/20000 train_loss: 2.4330 train_time: 8.8m tok/s: 6685070 +4472/20000 
train_loss: 2.3990 train_time: 8.8m tok/s: 6684779 +4473/20000 train_loss: 2.5104 train_time: 8.8m tok/s: 6684495 +4474/20000 train_loss: 2.3578 train_time: 8.8m tok/s: 6684206 +4475/20000 train_loss: 2.4741 train_time: 8.8m tok/s: 6683929 +4476/20000 train_loss: 2.4173 train_time: 8.8m tok/s: 6683656 +4477/20000 train_loss: 2.3421 train_time: 8.8m tok/s: 6683360 +4478/20000 train_loss: 2.3640 train_time: 8.8m tok/s: 6683092 +4479/20000 train_loss: 2.4296 train_time: 8.8m tok/s: 6682820 +4480/20000 train_loss: 2.3867 train_time: 8.8m tok/s: 6682538 +4481/20000 train_loss: 2.4155 train_time: 8.8m tok/s: 6682280 +4482/20000 train_loss: 2.3762 train_time: 8.8m tok/s: 6681987 +4483/20000 train_loss: 2.4165 train_time: 8.8m tok/s: 6681707 +4484/20000 train_loss: 2.4974 train_time: 8.8m tok/s: 6681408 +4485/20000 train_loss: 2.3361 train_time: 8.8m tok/s: 6681121 +4486/20000 train_loss: 2.3906 train_time: 8.8m tok/s: 6680868 +4487/20000 train_loss: 2.3808 train_time: 8.8m tok/s: 6680568 +4488/20000 train_loss: 2.3757 train_time: 8.8m tok/s: 6680270 +4489/20000 train_loss: 2.2765 train_time: 8.8m tok/s: 6679985 +4490/20000 train_loss: 2.4607 train_time: 8.8m tok/s: 6679699 +4491/20000 train_loss: 2.3961 train_time: 8.8m tok/s: 6679412 +4492/20000 train_loss: 2.4482 train_time: 8.8m tok/s: 6679138 +4493/20000 train_loss: 2.4597 train_time: 8.8m tok/s: 6678865 +4494/20000 train_loss: 2.4887 train_time: 8.8m tok/s: 6678588 +4495/20000 train_loss: 2.4468 train_time: 8.8m tok/s: 6678309 +4496/20000 train_loss: 2.3280 train_time: 8.8m tok/s: 6678029 +4497/20000 train_loss: 2.4385 train_time: 8.8m tok/s: 6677742 +4498/20000 train_loss: 2.3704 train_time: 8.8m tok/s: 6677461 +4499/20000 train_loss: 2.3604 train_time: 8.8m tok/s: 6677186 +4500/20000 train_loss: 2.3922 train_time: 8.8m tok/s: 6676915 +4501/20000 train_loss: 2.0596 train_time: 8.8m tok/s: 6676590 +4502/20000 train_loss: 2.3531 train_time: 8.8m tok/s: 6676315 +4503/20000 train_loss: 2.2314 train_time: 8.8m tok/s: 
6676049 +4504/20000 train_loss: 2.3194 train_time: 8.8m tok/s: 6675759 +4505/20000 train_loss: 2.3178 train_time: 8.8m tok/s: 6675474 +4506/20000 train_loss: 2.2608 train_time: 8.8m tok/s: 6675197 +4507/20000 train_loss: 2.3522 train_time: 8.9m tok/s: 6674926 +4508/20000 train_loss: 2.4639 train_time: 8.9m tok/s: 6674660 +4509/20000 train_loss: 2.4258 train_time: 8.9m tok/s: 6674388 +4510/20000 train_loss: 2.4530 train_time: 8.9m tok/s: 6674125 +4511/20000 train_loss: 2.2494 train_time: 8.9m tok/s: 6673846 +4512/20000 train_loss: 2.3563 train_time: 8.9m tok/s: 6673565 +4513/20000 train_loss: 2.3731 train_time: 8.9m tok/s: 6673278 +4514/20000 train_loss: 2.3872 train_time: 8.9m tok/s: 6672990 +4515/20000 train_loss: 2.3132 train_time: 8.9m tok/s: 6672716 +4516/20000 train_loss: 2.5643 train_time: 8.9m tok/s: 6672423 +4517/20000 train_loss: 2.6638 train_time: 8.9m tok/s: 6672138 +4518/20000 train_loss: 2.4458 train_time: 8.9m tok/s: 6671879 +4519/20000 train_loss: 2.3627 train_time: 8.9m tok/s: 6671618 +4520/20000 train_loss: 2.3099 train_time: 8.9m tok/s: 6671337 +4521/20000 train_loss: 2.4271 train_time: 8.9m tok/s: 6671055 +4522/20000 train_loss: 2.3445 train_time: 8.9m tok/s: 6670782 +4523/20000 train_loss: 2.4164 train_time: 8.9m tok/s: 6670499 +4524/20000 train_loss: 2.3124 train_time: 8.9m tok/s: 6670231 +4525/20000 train_loss: 2.4880 train_time: 8.9m tok/s: 6669979 +4526/20000 train_loss: 2.4301 train_time: 8.9m tok/s: 6669718 +4527/20000 train_loss: 2.4127 train_time: 8.9m tok/s: 6669444 +4528/20000 train_loss: 2.3276 train_time: 8.9m tok/s: 6669168 +4529/20000 train_loss: 2.3791 train_time: 8.9m tok/s: 6668875 +4530/20000 train_loss: 2.2905 train_time: 8.9m tok/s: 6668587 +4531/20000 train_loss: 2.5287 train_time: 8.9m tok/s: 6668292 +4532/20000 train_loss: 2.3981 train_time: 8.9m tok/s: 6668023 +4533/20000 train_loss: 2.2795 train_time: 8.9m tok/s: 6667772 +4534/20000 train_loss: 2.4270 train_time: 8.9m tok/s: 6667512 +4535/20000 train_loss: 2.4913 
train_time: 8.9m tok/s: 6667230 +4536/20000 train_loss: 2.2861 train_time: 8.9m tok/s: 6666948 +4537/20000 train_loss: 2.3806 train_time: 8.9m tok/s: 6666659 +4538/20000 train_loss: 2.1558 train_time: 8.9m tok/s: 6666376 +4539/20000 train_loss: 2.4297 train_time: 8.9m tok/s: 6666110 +4540/20000 train_loss: 2.3672 train_time: 8.9m tok/s: 6665834 +4541/20000 train_loss: 2.2974 train_time: 8.9m tok/s: 6665566 +4542/20000 train_loss: 2.3860 train_time: 8.9m tok/s: 6665299 +4543/20000 train_loss: 2.2957 train_time: 8.9m tok/s: 6665019 +4544/20000 train_loss: 2.6316 train_time: 8.9m tok/s: 6664734 +4545/20000 train_loss: 2.3768 train_time: 8.9m tok/s: 6664476 +4546/20000 train_loss: 2.3791 train_time: 8.9m tok/s: 6664201 +4547/20000 train_loss: 2.3396 train_time: 8.9m tok/s: 6663912 +4548/20000 train_loss: 2.2313 train_time: 8.9m tok/s: 6663663 +4549/20000 train_loss: 2.4267 train_time: 8.9m tok/s: 6663399 +4550/20000 train_loss: 2.4522 train_time: 9.0m tok/s: 6663137 +4551/20000 train_loss: 2.3482 train_time: 9.0m tok/s: 6662893 +4552/20000 train_loss: 2.3445 train_time: 9.0m tok/s: 6662608 +4553/20000 train_loss: 2.2841 train_time: 9.0m tok/s: 6662362 +4554/20000 train_loss: 2.4559 train_time: 9.0m tok/s: 6662056 +4555/20000 train_loss: 2.4510 train_time: 9.0m tok/s: 6661751 +4556/20000 train_loss: 2.4111 train_time: 9.0m tok/s: 6661464 +4557/20000 train_loss: 2.4583 train_time: 9.0m tok/s: 6661206 +4558/20000 train_loss: 2.5731 train_time: 9.0m tok/s: 6660947 +4559/20000 train_loss: 2.4103 train_time: 9.0m tok/s: 6660674 +4560/20000 train_loss: 2.4260 train_time: 9.0m tok/s: 6660408 +4561/20000 train_loss: 2.4416 train_time: 9.0m tok/s: 6660161 +4562/20000 train_loss: 2.3871 train_time: 9.0m tok/s: 6659924 +4563/20000 train_loss: 2.4996 train_time: 9.0m tok/s: 6659641 +4564/20000 train_loss: 2.3199 train_time: 9.0m tok/s: 6659375 +4565/20000 train_loss: 2.3918 train_time: 9.0m tok/s: 6659119 +4566/20000 train_loss: 2.3614 train_time: 9.0m tok/s: 6658858 +4567/20000 
train_loss: 2.2262 train_time: 9.0m tok/s: 6658611 +4568/20000 train_loss: 2.2567 train_time: 9.0m tok/s: 6658323 +4569/20000 train_loss: 2.4755 train_time: 9.0m tok/s: 6658062 +4570/20000 train_loss: 2.3418 train_time: 9.0m tok/s: 6657820 +4571/20000 train_loss: 2.4145 train_time: 9.0m tok/s: 6657565 +4572/20000 train_loss: 2.3961 train_time: 9.0m tok/s: 6657303 +4573/20000 train_loss: 2.3895 train_time: 9.0m tok/s: 6657045 +4574/20000 train_loss: 2.4003 train_time: 9.0m tok/s: 6656783 +4575/20000 train_loss: 2.2791 train_time: 9.0m tok/s: 6656516 +4576/20000 train_loss: 2.4175 train_time: 9.0m tok/s: 6656271 +4577/20000 train_loss: 2.4102 train_time: 9.0m tok/s: 6656011 +4578/20000 train_loss: 2.3658 train_time: 9.0m tok/s: 6655757 +4579/20000 train_loss: 2.4701 train_time: 9.0m tok/s: 6655467 +4580/20000 train_loss: 2.2497 train_time: 9.0m tok/s: 6655187 +4581/20000 train_loss: 1.9333 train_time: 9.0m tok/s: 6654883 +4582/20000 train_loss: 2.3701 train_time: 9.0m tok/s: 6654618 +4583/20000 train_loss: 2.4078 train_time: 9.0m tok/s: 6654390 +4584/20000 train_loss: 2.4583 train_time: 9.0m tok/s: 6654147 +4585/20000 train_loss: 2.3146 train_time: 9.0m tok/s: 6653882 +4586/20000 train_loss: 2.3567 train_time: 9.0m tok/s: 6653634 +4587/20000 train_loss: 2.5047 train_time: 9.0m tok/s: 6653346 +4588/20000 train_loss: 2.3948 train_time: 9.0m tok/s: 6653088 +4589/20000 train_loss: 2.4592 train_time: 9.0m tok/s: 6652826 +4590/20000 train_loss: 2.3049 train_time: 9.0m tok/s: 6652564 +4591/20000 train_loss: 2.3639 train_time: 9.0m tok/s: 6652323 +4592/20000 train_loss: 2.2539 train_time: 9.0m tok/s: 6652065 +4593/20000 train_loss: 2.4555 train_time: 9.1m tok/s: 6651812 +4594/20000 train_loss: 2.3171 train_time: 9.1m tok/s: 6651546 +4595/20000 train_loss: 2.1855 train_time: 9.1m tok/s: 6651261 +4596/20000 train_loss: 2.4285 train_time: 9.1m tok/s: 6651008 +4597/20000 train_loss: 2.4130 train_time: 9.1m tok/s: 6650698 +4598/20000 train_loss: 2.4603 train_time: 9.1m tok/s: 
6650447 +4599/20000 train_loss: 2.3748 train_time: 9.1m tok/s: 6650199 +4600/20000 train_loss: 2.5591 train_time: 9.1m tok/s: 6649943 +4601/20000 train_loss: 2.4829 train_time: 9.1m tok/s: 6649695 +4602/20000 train_loss: 2.5079 train_time: 9.1m tok/s: 6649446 +4603/20000 train_loss: 2.3784 train_time: 9.1m tok/s: 6649205 +4604/20000 train_loss: 2.3440 train_time: 9.1m tok/s: 6648926 +4605/20000 train_loss: 2.3469 train_time: 9.1m tok/s: 6648671 +4606/20000 train_loss: 2.3461 train_time: 9.1m tok/s: 6648419 +4607/20000 train_loss: 2.3485 train_time: 9.1m tok/s: 6648184 +4608/20000 train_loss: 2.3692 train_time: 9.1m tok/s: 6647902 +4609/20000 train_loss: 2.3181 train_time: 9.1m tok/s: 6647644 +4610/20000 train_loss: 2.4194 train_time: 9.1m tok/s: 6647405 +4611/20000 train_loss: 2.5795 train_time: 9.1m tok/s: 6647137 +4612/20000 train_loss: 2.4497 train_time: 9.1m tok/s: 6646884 +4613/20000 train_loss: 2.5011 train_time: 9.1m tok/s: 6646622 +4614/20000 train_loss: 2.4568 train_time: 9.1m tok/s: 6646368 +4615/20000 train_loss: 2.3050 train_time: 9.1m tok/s: 6646082 +4616/20000 train_loss: 2.2722 train_time: 9.1m tok/s: 6645843 +4617/20000 train_loss: 2.3128 train_time: 9.1m tok/s: 6645587 +4618/20000 train_loss: 2.3701 train_time: 9.1m tok/s: 6645351 +4619/20000 train_loss: 2.2857 train_time: 9.1m tok/s: 6645101 +4620/20000 train_loss: 2.3530 train_time: 9.1m tok/s: 6644849 +4621/20000 train_loss: 2.3891 train_time: 9.1m tok/s: 6644565 +4622/20000 train_loss: 2.4144 train_time: 9.1m tok/s: 6644293 +4623/20000 train_loss: 2.3717 train_time: 9.1m tok/s: 6644048 +4624/20000 train_loss: 2.5495 train_time: 9.1m tok/s: 6643793 +4625/20000 train_loss: 2.5831 train_time: 9.1m tok/s: 6643528 +4626/20000 train_loss: 2.4950 train_time: 9.1m tok/s: 6643261 +4627/20000 train_loss: 2.3984 train_time: 9.1m tok/s: 6643019 +4628/20000 train_loss: 2.4493 train_time: 9.1m tok/s: 6642740 +4629/20000 train_loss: 2.3749 train_time: 9.1m tok/s: 6642495 +4630/20000 train_loss: 2.1516 
train_time: 9.1m tok/s: 6642231 +4631/20000 train_loss: 2.3874 train_time: 9.1m tok/s: 6641970 +4632/20000 train_loss: 2.3126 train_time: 9.1m tok/s: 6641711 +4633/20000 train_loss: 2.2328 train_time: 9.1m tok/s: 6641444 +4634/20000 train_loss: 2.3448 train_time: 9.1m tok/s: 6641201 +4635/20000 train_loss: 2.4744 train_time: 9.1m tok/s: 6640922 +4636/20000 train_loss: 2.3821 train_time: 9.2m tok/s: 6640661 +4637/20000 train_loss: 2.4636 train_time: 9.2m tok/s: 6640406 +4638/20000 train_loss: 2.4506 train_time: 9.2m tok/s: 6640153 +4639/20000 train_loss: 2.3899 train_time: 9.2m tok/s: 6639905 +4640/20000 train_loss: 2.4049 train_time: 9.2m tok/s: 6639643 +4641/20000 train_loss: 2.3465 train_time: 9.2m tok/s: 6639380 +4642/20000 train_loss: 2.3124 train_time: 9.2m tok/s: 6639109 +4643/20000 train_loss: 2.4205 train_time: 9.2m tok/s: 6638841 +4644/20000 train_loss: 2.2553 train_time: 9.2m tok/s: 6638579 +4645/20000 train_loss: 2.3492 train_time: 9.2m tok/s: 6638319 +4646/20000 train_loss: 2.2886 train_time: 9.2m tok/s: 6638060 +4647/20000 train_loss: 2.1952 train_time: 9.2m tok/s: 6637810 +4648/20000 train_loss: 2.3751 train_time: 9.2m tok/s: 6637537 +4649/20000 train_loss: 2.3902 train_time: 9.2m tok/s: 6637287 +4650/20000 train_loss: 2.4108 train_time: 9.2m tok/s: 6637026 +4651/20000 train_loss: 2.4822 train_time: 9.2m tok/s: 6636753 +4652/20000 train_loss: 2.4624 train_time: 9.2m tok/s: 6636479 +4653/20000 train_loss: 2.4322 train_time: 9.2m tok/s: 6636229 +4654/20000 train_loss: 2.2968 train_time: 9.2m tok/s: 6635973 +4655/20000 train_loss: 2.2708 train_time: 9.2m tok/s: 6635708 +4656/20000 train_loss: 2.4646 train_time: 9.2m tok/s: 6635454 +4657/20000 train_loss: 2.3532 train_time: 9.2m tok/s: 6635199 +4658/20000 train_loss: 2.2959 train_time: 9.2m tok/s: 6634945 +4659/20000 train_loss: 2.3288 train_time: 9.2m tok/s: 6634709 +4660/20000 train_loss: 2.3410 train_time: 9.2m tok/s: 6634454 +4661/20000 train_loss: 2.0325 train_time: 9.2m tok/s: 6634138 +4662/20000 
train_loss: 2.3542 train_time: 9.2m tok/s: 6633875 +4663/20000 train_loss: 2.3865 train_time: 9.2m tok/s: 6633654 +4664/20000 train_loss: 2.2924 train_time: 9.2m tok/s: 6633395 +4665/20000 train_loss: 2.4440 train_time: 9.2m tok/s: 6633140 +4666/20000 train_loss: 2.3333 train_time: 9.2m tok/s: 6632885 +4667/20000 train_loss: 2.4309 train_time: 9.2m tok/s: 6632617 +4668/20000 train_loss: 2.3975 train_time: 9.2m tok/s: 6632384 +4669/20000 train_loss: 2.6144 train_time: 9.2m tok/s: 6632120 +4670/20000 train_loss: 2.2799 train_time: 9.2m tok/s: 6631862 +4671/20000 train_loss: 2.4520 train_time: 9.2m tok/s: 6631600 +4672/20000 train_loss: 2.3491 train_time: 9.2m tok/s: 6631341 +4673/20000 train_loss: 2.3259 train_time: 9.2m tok/s: 6631078 +4674/20000 train_loss: 2.4326 train_time: 9.2m tok/s: 6630834 +4675/20000 train_loss: 2.3725 train_time: 9.2m tok/s: 6630570 +4676/20000 train_loss: 2.3605 train_time: 9.2m tok/s: 6630319 +4677/20000 train_loss: 2.4168 train_time: 9.2m tok/s: 6630076 +4678/20000 train_loss: 2.3839 train_time: 9.2m tok/s: 6629827 +4679/20000 train_loss: 2.2910 train_time: 9.3m tok/s: 6629586 +4680/20000 train_loss: 2.3024 train_time: 9.3m tok/s: 6629336 +4681/20000 train_loss: 2.3740 train_time: 9.3m tok/s: 6629077 +4682/20000 train_loss: 2.2895 train_time: 9.3m tok/s: 6628821 +4683/20000 train_loss: 2.3611 train_time: 9.3m tok/s: 6628567 +4684/20000 train_loss: 2.3500 train_time: 9.3m tok/s: 6628315 +4685/20000 train_loss: 2.3215 train_time: 9.3m tok/s: 6628046 +4686/20000 train_loss: 2.2539 train_time: 9.3m tok/s: 6627787 +4687/20000 train_loss: 2.3871 train_time: 9.3m tok/s: 6627517 +4688/20000 train_loss: 2.4456 train_time: 9.3m tok/s: 6627262 +4689/20000 train_loss: 2.4388 train_time: 9.3m tok/s: 6627013 +4690/20000 train_loss: 2.4983 train_time: 9.3m tok/s: 6626767 +4691/20000 train_loss: 2.4024 train_time: 9.3m tok/s: 6626509 +4692/20000 train_loss: 2.4186 train_time: 9.3m tok/s: 6626259 +4693/20000 train_loss: 2.3652 train_time: 9.3m tok/s: 
6625999 +4694/20000 train_loss: 2.4527 train_time: 9.3m tok/s: 6625750 +4695/20000 train_loss: 2.4378 train_time: 9.3m tok/s: 6625477 +4696/20000 train_loss: 2.3405 train_time: 9.3m tok/s: 6625225 +4697/20000 train_loss: 2.3377 train_time: 9.3m tok/s: 6624994 +4698/20000 train_loss: 2.3723 train_time: 9.3m tok/s: 6624750 +4699/20000 train_loss: 2.3659 train_time: 9.3m tok/s: 6624519 +4700/20000 train_loss: 2.1580 train_time: 9.3m tok/s: 6624269 +4701/20000 train_loss: 2.4155 train_time: 9.3m tok/s: 6624018 +4702/20000 train_loss: 2.5016 train_time: 9.3m tok/s: 6623718 +4703/20000 train_loss: 2.4466 train_time: 9.3m tok/s: 6623457 +4704/20000 train_loss: 2.3639 train_time: 9.3m tok/s: 6623217 +4705/20000 train_loss: 2.3338 train_time: 9.3m tok/s: 6622974 +4706/20000 train_loss: 2.3281 train_time: 9.3m tok/s: 6622719 +4707/20000 train_loss: 2.4484 train_time: 9.3m tok/s: 6622482 +4708/20000 train_loss: 2.3554 train_time: 9.3m tok/s: 6622204 +4709/20000 train_loss: 2.3504 train_time: 9.3m tok/s: 6621957 +4710/20000 train_loss: 2.3936 train_time: 9.3m tok/s: 6621698 +4711/20000 train_loss: 2.3593 train_time: 9.3m tok/s: 6621439 +4712/20000 train_loss: 2.3171 train_time: 9.3m tok/s: 6621207 +4713/20000 train_loss: 2.4409 train_time: 9.3m tok/s: 6620946 +4714/20000 train_loss: 2.3416 train_time: 9.3m tok/s: 6620677 +4715/20000 train_loss: 2.4660 train_time: 9.3m tok/s: 6620423 +4716/20000 train_loss: 2.4928 train_time: 9.3m tok/s: 6620177 +4717/20000 train_loss: 2.3211 train_time: 9.3m tok/s: 6619935 +4718/20000 train_loss: 2.3110 train_time: 9.3m tok/s: 6619691 +4719/20000 train_loss: 2.3513 train_time: 9.3m tok/s: 6619457 +4720/20000 train_loss: 2.3034 train_time: 9.3m tok/s: 6619185 +4721/20000 train_loss: 2.2790 train_time: 9.3m tok/s: 6618930 +4722/20000 train_loss: 2.4753 train_time: 9.4m tok/s: 6618690 +4723/20000 train_loss: 2.3220 train_time: 9.4m tok/s: 6618443 +4724/20000 train_loss: 2.3629 train_time: 9.4m tok/s: 6618200 +4725/20000 train_loss: 2.2543 
train_time: 9.4m tok/s: 6617928 +4726/20000 train_loss: 2.3963 train_time: 9.4m tok/s: 6617671 +4727/20000 train_loss: 2.3740 train_time: 9.4m tok/s: 6617429 +4728/20000 train_loss: 2.3554 train_time: 9.4m tok/s: 6617183 +4729/20000 train_loss: 2.3643 train_time: 9.4m tok/s: 6616932 +4730/20000 train_loss: 2.2017 train_time: 9.4m tok/s: 6616687 +4731/20000 train_loss: 2.3617 train_time: 9.4m tok/s: 6616426 +4732/20000 train_loss: 2.3129 train_time: 9.4m tok/s: 6616189 +4733/20000 train_loss: 2.4162 train_time: 9.4m tok/s: 6615943 +4734/20000 train_loss: 2.3951 train_time: 9.4m tok/s: 6615702 +4735/20000 train_loss: 2.1973 train_time: 9.4m tok/s: 6615434 +4736/20000 train_loss: 2.3372 train_time: 9.4m tok/s: 6615202 +4737/20000 train_loss: 2.4776 train_time: 9.4m tok/s: 6614947 +4738/20000 train_loss: 2.3508 train_time: 9.4m tok/s: 6614703 +4739/20000 train_loss: 2.4810 train_time: 9.4m tok/s: 6614405 +4740/20000 train_loss: 2.4548 train_time: 9.4m tok/s: 6614156 +4741/20000 train_loss: 2.3779 train_time: 9.4m tok/s: 6613886 +4742/20000 train_loss: 2.3399 train_time: 9.4m tok/s: 6613656 +4743/20000 train_loss: 2.2966 train_time: 9.4m tok/s: 6613399 +4744/20000 train_loss: 2.3667 train_time: 9.4m tok/s: 6613160 +4745/20000 train_loss: 2.2543 train_time: 9.4m tok/s: 6612915 +4746/20000 train_loss: 2.2238 train_time: 9.4m tok/s: 6612680 +4747/20000 train_loss: 2.4582 train_time: 9.4m tok/s: 6612444 +4748/20000 train_loss: 2.3972 train_time: 9.4m tok/s: 6612177 +4749/20000 train_loss: 2.3641 train_time: 9.4m tok/s: 6611914 +4750/20000 train_loss: 2.4015 train_time: 9.4m tok/s: 6611689 +4751/20000 train_loss: 2.3246 train_time: 9.4m tok/s: 6611453 +4752/20000 train_loss: 2.3775 train_time: 9.4m tok/s: 6611215 +4753/20000 train_loss: 2.2574 train_time: 9.4m tok/s: 6610970 +4754/20000 train_loss: 2.2225 train_time: 9.4m tok/s: 6610727 +4755/20000 train_loss: 2.4098 train_time: 9.4m tok/s: 6610480 +4756/20000 train_loss: 2.3780 train_time: 9.4m tok/s: 6610231 +4757/20000 
train_loss: 2.3282 train_time: 9.4m tok/s: 6609989 +4758/20000 train_loss: 2.5541 train_time: 9.4m tok/s: 6609746 +4759/20000 train_loss: 2.3841 train_time: 9.4m tok/s: 6609522 +4760/20000 train_loss: 2.2599 train_time: 9.4m tok/s: 6609280 +4761/20000 train_loss: 2.4405 train_time: 9.4m tok/s: 6609041 +4762/20000 train_loss: 2.3450 train_time: 9.4m tok/s: 6608793 +4763/20000 train_loss: 2.4769 train_time: 9.4m tok/s: 6608545 +4764/20000 train_loss: 2.4395 train_time: 9.4m tok/s: 6608300 +4765/20000 train_loss: 2.4230 train_time: 9.5m tok/s: 6608049 +4766/20000 train_loss: 2.3487 train_time: 9.5m tok/s: 6607807 +4767/20000 train_loss: 2.3050 train_time: 9.5m tok/s: 6607563 +4768/20000 train_loss: 2.3768 train_time: 9.5m tok/s: 6607323 +4769/20000 train_loss: 2.3496 train_time: 9.5m tok/s: 6607077 +4770/20000 train_loss: 2.4166 train_time: 9.5m tok/s: 6606825 +4771/20000 train_loss: 2.3464 train_time: 9.5m tok/s: 6606600 +4772/20000 train_loss: 2.4227 train_time: 9.5m tok/s: 6606339 +4773/20000 train_loss: 2.4359 train_time: 9.5m tok/s: 6606106 +4774/20000 train_loss: 2.5465 train_time: 9.5m tok/s: 6605853 +4775/20000 train_loss: 2.4233 train_time: 9.5m tok/s: 6605580 +4776/20000 train_loss: 2.5193 train_time: 9.5m tok/s: 6605356 +4777/20000 train_loss: 2.3943 train_time: 9.5m tok/s: 6605120 +4778/20000 train_loss: 2.3614 train_time: 9.5m tok/s: 6604883 +4779/20000 train_loss: 2.1982 train_time: 9.5m tok/s: 6604628 +4780/20000 train_loss: 2.3159 train_time: 9.5m tok/s: 6604394 +4781/20000 train_loss: 2.3072 train_time: 9.5m tok/s: 6604164 +4782/20000 train_loss: 2.7087 train_time: 9.5m tok/s: 6603895 +4783/20000 train_loss: 2.4208 train_time: 9.5m tok/s: 6603654 +4784/20000 train_loss: 2.3756 train_time: 9.5m tok/s: 6603414 +4785/20000 train_loss: 2.4229 train_time: 9.5m tok/s: 6603165 +4786/20000 train_loss: 2.4212 train_time: 9.5m tok/s: 6602936 +4787/20000 train_loss: 2.3573 train_time: 9.5m tok/s: 6602698 +4788/20000 train_loss: 2.3737 train_time: 9.5m tok/s: 
6602449 +4789/20000 train_loss: 2.1518 train_time: 9.5m tok/s: 6602205 +4790/20000 train_loss: 2.4080 train_time: 9.5m tok/s: 6601964 +4791/20000 train_loss: 2.3496 train_time: 9.5m tok/s: 6601716 +4792/20000 train_loss: 2.2430 train_time: 9.5m tok/s: 6601478 +4793/20000 train_loss: 2.3481 train_time: 9.5m tok/s: 6601236 +4794/20000 train_loss: 2.2226 train_time: 9.5m tok/s: 6601005 +4795/20000 train_loss: 2.3902 train_time: 9.5m tok/s: 6600768 +4796/20000 train_loss: 2.2539 train_time: 9.5m tok/s: 6600554 +4797/20000 train_loss: 2.4014 train_time: 9.5m tok/s: 6600306 +4798/20000 train_loss: 2.5482 train_time: 9.5m tok/s: 6600047 +4799/20000 train_loss: 2.4734 train_time: 9.5m tok/s: 6599818 +4800/20000 train_loss: 2.4457 train_time: 9.5m tok/s: 6599591 +4801/20000 train_loss: 2.4180 train_time: 9.5m tok/s: 6599333 +4802/20000 train_loss: 2.3239 train_time: 9.5m tok/s: 6599110 +4803/20000 train_loss: 2.5116 train_time: 9.5m tok/s: 6598875 +4804/20000 train_loss: 1.9468 train_time: 9.5m tok/s: 6598590 +4805/20000 train_loss: 2.3877 train_time: 9.5m tok/s: 6598332 +4806/20000 train_loss: 2.3722 train_time: 9.5m tok/s: 6598102 +4807/20000 train_loss: 2.3611 train_time: 9.5m tok/s: 6597888 +4808/20000 train_loss: 2.2900 train_time: 9.6m tok/s: 6597647 +4809/20000 train_loss: 2.4079 train_time: 9.6m tok/s: 6597416 +4810/20000 train_loss: 2.4536 train_time: 9.6m tok/s: 6597200 +4811/20000 train_loss: 2.4003 train_time: 9.6m tok/s: 6596954 +4812/20000 train_loss: 2.3534 train_time: 9.6m tok/s: 6596718 +4813/20000 train_loss: 2.3524 train_time: 9.6m tok/s: 6596481 +4814/20000 train_loss: 2.3808 train_time: 9.6m tok/s: 6596262 +4815/20000 train_loss: 2.4411 train_time: 9.6m tok/s: 6596026 +4816/20000 train_loss: 2.3612 train_time: 9.6m tok/s: 6595779 +4817/20000 train_loss: 2.4890 train_time: 9.6m tok/s: 6595547 +4818/20000 train_loss: 2.2813 train_time: 9.6m tok/s: 6595318 +4819/20000 train_loss: 2.4325 train_time: 9.6m tok/s: 6595092 +4820/20000 train_loss: 2.3077 
train_time: 9.6m tok/s: 6594840 +4821/20000 train_loss: 2.3921 train_time: 9.6m tok/s: 6594608 +4822/20000 train_loss: 2.4091 train_time: 9.6m tok/s: 6594381 +4823/20000 train_loss: 2.3777 train_time: 9.6m tok/s: 6594145 +4824/20000 train_loss: 2.4769 train_time: 9.6m tok/s: 6593901 +4825/20000 train_loss: 2.5617 train_time: 9.6m tok/s: 6593648 +4826/20000 train_loss: 2.2990 train_time: 9.6m tok/s: 6593399 +4827/20000 train_loss: 2.2797 train_time: 9.6m tok/s: 6593164 +4828/20000 train_loss: 2.3190 train_time: 9.6m tok/s: 6592919 +4829/20000 train_loss: 2.3610 train_time: 9.6m tok/s: 6592674 +4830/20000 train_loss: 2.3417 train_time: 9.6m tok/s: 6592434 +4831/20000 train_loss: 2.3005 train_time: 9.6m tok/s: 6592194 +4832/20000 train_loss: 2.2504 train_time: 9.6m tok/s: 6591956 +4833/20000 train_loss: 2.2702 train_time: 9.6m tok/s: 6591725 +4834/20000 train_loss: 2.3891 train_time: 9.6m tok/s: 6591485 +4835/20000 train_loss: 2.3358 train_time: 9.6m tok/s: 6591246 +4836/20000 train_loss: 2.3487 train_time: 9.6m tok/s: 6590982 +4837/20000 train_loss: 2.5204 train_time: 9.6m tok/s: 6590744 +4838/20000 train_loss: 2.2807 train_time: 9.6m tok/s: 6590513 +4839/20000 train_loss: 2.4294 train_time: 9.6m tok/s: 6590287 +4840/20000 train_loss: 2.2677 train_time: 9.6m tok/s: 6590049 +4841/20000 train_loss: 2.4232 train_time: 9.6m tok/s: 6589810 +4842/20000 train_loss: 2.6843 train_time: 9.6m tok/s: 6589568 +4843/20000 train_loss: 2.3009 train_time: 9.6m tok/s: 6589330 +4844/20000 train_loss: 2.2899 train_time: 9.6m tok/s: 6589101 +4845/20000 train_loss: 2.3239 train_time: 9.6m tok/s: 6588881 +4846/20000 train_loss: 2.3398 train_time: 9.6m tok/s: 6588622 +4847/20000 train_loss: 2.5501 train_time: 9.6m tok/s: 6588406 +4848/20000 train_loss: 2.3077 train_time: 9.6m tok/s: 6588162 +4849/20000 train_loss: 2.4476 train_time: 9.6m tok/s: 6587948 +4850/20000 train_loss: 2.2920 train_time: 9.6m tok/s: 6587695 +4851/20000 train_loss: 2.3365 train_time: 9.7m tok/s: 6587462 +4852/20000 
train_loss: 2.3513 train_time: 9.7m tok/s: 6587234 +4853/20000 train_loss: 2.2910 train_time: 9.7m tok/s: 6586981 +4854/20000 train_loss: 2.4535 train_time: 9.7m tok/s: 6586751 +4855/20000 train_loss: 2.3458 train_time: 9.7m tok/s: 6586530 +4856/20000 train_loss: 2.2774 train_time: 9.7m tok/s: 6586306 +4857/20000 train_loss: 2.2852 train_time: 9.7m tok/s: 6586070 +4858/20000 train_loss: 2.2667 train_time: 9.7m tok/s: 6585820 +4859/20000 train_loss: 2.3087 train_time: 9.7m tok/s: 6585573 +4860/20000 train_loss: 2.4699 train_time: 9.7m tok/s: 6585352 +4861/20000 train_loss: 2.4042 train_time: 9.7m tok/s: 6585123 +4862/20000 train_loss: 2.3105 train_time: 9.7m tok/s: 6584890 +4863/20000 train_loss: 2.5780 train_time: 9.7m tok/s: 6584662 +4864/20000 train_loss: 2.3987 train_time: 9.7m tok/s: 6584432 +4865/20000 train_loss: 2.5059 train_time: 9.7m tok/s: 6584176 +4866/20000 train_loss: 2.2645 train_time: 9.7m tok/s: 6583920 +4867/20000 train_loss: 2.2462 train_time: 9.7m tok/s: 6583696 +4868/20000 train_loss: 2.3088 train_time: 9.7m tok/s: 6583469 +4869/20000 train_loss: 2.4410 train_time: 9.7m tok/s: 6583215 +4870/20000 train_loss: 2.3563 train_time: 9.7m tok/s: 6582997 +4871/20000 train_loss: 2.3606 train_time: 9.7m tok/s: 6582768 +4872/20000 train_loss: 2.2728 train_time: 9.7m tok/s: 6582509 +4873/20000 train_loss: 2.5435 train_time: 9.7m tok/s: 6582277 +4874/20000 train_loss: 2.4189 train_time: 9.7m tok/s: 6582066 +4875/20000 train_loss: 2.4035 train_time: 9.7m tok/s: 6581848 +4876/20000 train_loss: 2.4026 train_time: 9.7m tok/s: 6581629 +4877/20000 train_loss: 2.3003 train_time: 9.7m tok/s: 6581403 +4878/20000 train_loss: 2.4007 train_time: 9.7m tok/s: 6581162 +4879/20000 train_loss: 2.3613 train_time: 9.7m tok/s: 6580938 +4880/20000 train_loss: 2.4316 train_time: 9.7m tok/s: 6580702 +4881/20000 train_loss: 2.3413 train_time: 9.7m tok/s: 6580473 +4882/20000 train_loss: 2.2732 train_time: 9.7m tok/s: 6580241 +4883/20000 train_loss: 2.2500 train_time: 9.7m tok/s: 
6580017 +4884/20000 train_loss: 2.6043 train_time: 9.7m tok/s: 6579799 +4885/20000 train_loss: 2.4654 train_time: 9.7m tok/s: 6579567 +4886/20000 train_loss: 2.4352 train_time: 9.7m tok/s: 6579351 +4887/20000 train_loss: 2.4040 train_time: 9.7m tok/s: 6579120 +4888/20000 train_loss: 2.4027 train_time: 9.7m tok/s: 6578891 +4889/20000 train_loss: 2.3467 train_time: 9.7m tok/s: 6578671 +4890/20000 train_loss: 2.4002 train_time: 9.7m tok/s: 6578435 +4891/20000 train_loss: 2.1710 train_time: 9.7m tok/s: 6578195 +4892/20000 train_loss: 1.9570 train_time: 9.7m tok/s: 6577905 +4893/20000 train_loss: 2.4379 train_time: 9.8m tok/s: 6577666 +4894/20000 train_loss: 2.3804 train_time: 9.8m tok/s: 6577465 +4895/20000 train_loss: 2.3734 train_time: 9.8m tok/s: 6577250 +4895/20000 val_loss: 2.3562 val_bpb: 1.0766 +stopping_early: wallclock_cap train_time: 585331ms step: 4895/20000 +peak memory allocated: 41707 MiB reserved: 47048 MiB +ema:applying EMA weights +diagnostic pre-quantization post-ema val_loss:2.33221133 val_bpb:1.06566160 eval_time:7561ms +Serialized model: 135418111 bytes +Code size (uncompressed): 182796 bytes +Code size (compressed): 45910 bytes +GPTQ:collecting Hessians from calibration data... +GPTQ:collected 67 Hessians in 4.1s +Quantized weights: + gate_int8_row: blocks.attn.attn_gate_w + gptq (int6): blocks.attn.c_k.weight, blocks.attn.c_q.weight, blocks.attn.c_v.weight, blocks.attn.proj.weight, blocks.mlp.fc.weight, blocks.mlp.proj.weight + gptq (int6)+lqer_asym: blocks.mlp.fc.weight + gptq (int7)+awqgrpint8+lqer_asym: tok_emb.weight + passthrough (float16): blocks.attn.q_gain, blocks.attn_scale, blocks.mlp_scale, blocks.resid_mix, parallel_post_lambdas, parallel_resid_lambdas, skip_gates, skip_weights, smear_gate.weight, smear_lambda, softcap_neg, softcap_pos +Serialize: per-group lrzip compression... 
+Serialize: per-group compression done in 119.5s +Serialized model quantized+pergroup: 15949305 bytes +Total submission size quantized+pergroup: 15995215 bytes +Deserialize: per-group lrzip decompression... +Deserialize: decompression done in 21.1s +diagnostic quantized val_loss:2.34933075 val_bpb:1.07348401 eval_time:10651ms +Deserialize: per-group lrzip decompression... +Deserialize: decompression done in 21.1s +ttt_lora:warming up compile (random tokens, no val data) +ttt_lora:compile warmup done (104.8s) +v5:precomputing ngram hints OUTSIDE eval timer +ngram_tilt:hints total=47851520 gated=13023303 token_gate=628130 within_gate=9866847 word_gate=2891588 agree2plus=303177 +ngram_tilt:precompute_outside_timer_done elapsed=160.59s total_targets=47851520 + +beginning TTT eval timer +ngram_tilt:using_precomputed_hints total_targets=47851520 (precompute time excluded from eval) +ttt_phased: total_docs:50000 prefix_docs:2500 suffix_docs:47500 num_phases:3 boundaries:[833, 1666, 2500] +ttp: b777/782 bl:2.3077 bb:1.0815 rl:2.3077 rb:1.0815 dl:8452-9229 gd:0 +ttp: b772/782 bl:2.3220 bb:1.0944 rl:2.3134 rb:1.0866 dl:5762-6095 gd:0 +ttp: b767/782 bl:2.2635 bb:1.0710 rl:2.3012 rb:1.0828 dl:4681-4858 gd:0 +ttpp: phase:1/3 pd:1296 gd:833 t:225.4s +tttg: c1/131 lr:0.001000 t:1.8s +tttg: c2/131 lr:0.001000 t:1.9s +tttg: c3/131 lr:0.000999 t:2.0s +tttg: c4/131 lr:0.000999 t:2.1s +tttg: c5/131 lr:0.000998 t:2.1s +tttg: c6/131 lr:0.000996 t:2.2s +tttg: c7/131 lr:0.000995 t:2.3s +tttg: c8/131 lr:0.000993 t:2.4s +tttg: c9/131 lr:0.000991 t:2.4s +tttg: c10/131 lr:0.000988 t:2.5s +tttg: c11/131 lr:0.000985 t:2.6s +tttg: c12/131 lr:0.000982 t:2.7s +tttg: c13/131 lr:0.000979 t:2.7s +tttg: c14/131 lr:0.000976 t:2.8s +tttg: c15/131 lr:0.000972 t:2.9s +tttg: c16/131 lr:0.000968 t:3.0s +tttg: c17/131 lr:0.000963 t:3.0s +tttg: c18/131 lr:0.000958 t:3.1s +tttg: c19/131 lr:0.000953 t:3.2s +tttg: c20/131 lr:0.000948 t:3.2s +tttg: c21/131 lr:0.000943 t:3.3s +tttg: c22/131 lr:0.000937 t:3.4s 
+tttg: c23/131 lr:0.000931 t:3.5s +tttg: c24/131 lr:0.000925 t:3.5s +tttg: c25/131 lr:0.000918 t:3.6s +tttg: c26/131 lr:0.000911 t:3.7s +tttg: c27/131 lr:0.000905 t:3.8s +tttg: c28/131 lr:0.000897 t:3.9s +tttg: c29/131 lr:0.000890 t:3.9s +tttg: c30/131 lr:0.000882 t:4.0s +tttg: c31/131 lr:0.000874 t:4.1s +tttg: c32/131 lr:0.000866 t:4.2s +tttg: c33/131 lr:0.000858 t:4.2s +tttg: c34/131 lr:0.000849 t:4.3s +tttg: c35/131 lr:0.000841 t:4.4s +tttg: c36/131 lr:0.000832 t:4.4s +tttg: c37/131 lr:0.000822 t:4.5s +tttg: c38/131 lr:0.000813 t:4.6s +tttg: c39/131 lr:0.000804 t:4.7s +tttg: c40/131 lr:0.000794 t:4.7s +tttg: c41/131 lr:0.000784 t:4.8s +tttg: c42/131 lr:0.000774 t:4.9s +tttg: c43/131 lr:0.000764 t:5.0s +tttg: c44/131 lr:0.000753 t:5.0s +tttg: c45/131 lr:0.000743 t:5.1s +tttg: c46/131 lr:0.000732 t:5.2s +tttg: c47/131 lr:0.000722 t:5.3s +tttg: c48/131 lr:0.000711 t:5.3s +tttg: c49/131 lr:0.000700 t:5.4s +tttg: c50/131 lr:0.000689 t:5.5s +tttg: c51/131 lr:0.000677 t:5.6s +tttg: c52/131 lr:0.000666 t:5.6s +tttg: c53/131 lr:0.000655 t:5.7s +tttg: c54/131 lr:0.000643 t:5.8s +tttg: c55/131 lr:0.000631 t:5.9s +tttg: c56/131 lr:0.000620 t:5.9s +tttg: c57/131 lr:0.000608 t:6.0s +tttg: c58/131 lr:0.000596 t:6.1s +tttg: c59/131 lr:0.000584 t:6.2s +tttg: c60/131 lr:0.000572 t:6.2s +tttg: c61/131 lr:0.000560 t:6.3s +tttg: c62/131 lr:0.000548 t:6.4s +tttg: c63/131 lr:0.000536 t:6.5s +tttg: c64/131 lr:0.000524 t:6.5s +tttg: c65/131 lr:0.000512 t:6.6s +tttg: c66/131 lr:0.000500 t:6.7s +tttg: c67/131 lr:0.000488 t:6.8s +tttg: c68/131 lr:0.000476 t:6.8s +tttg: c69/131 lr:0.000464 t:6.9s +tttg: c70/131 lr:0.000452 t:7.0s +tttg: c71/131 lr:0.000440 t:7.0s +tttg: c72/131 lr:0.000428 t:7.1s +tttg: c73/131 lr:0.000416 t:7.2s +tttg: c74/131 lr:0.000404 t:7.3s +tttg: c75/131 lr:0.000392 t:7.4s +tttg: c76/131 lr:0.000380 t:7.4s +tttg: c77/131 lr:0.000369 t:7.5s +tttg: c78/131 lr:0.000357 t:7.6s +tttg: c79/131 lr:0.000345 t:7.7s +tttg: c80/131 lr:0.000334 t:7.7s +tttg: c81/131 lr:0.000323 
t:7.8s +tttg: c82/131 lr:0.000311 t:7.9s +tttg: c83/131 lr:0.000300 t:8.0s +tttg: c84/131 lr:0.000289 t:8.0s +tttg: c85/131 lr:0.000278 t:8.1s +tttg: c86/131 lr:0.000268 t:8.2s +tttg: c87/131 lr:0.000257 t:8.3s +tttg: c88/131 lr:0.000247 t:8.3s +tttg: c89/131 lr:0.000236 t:8.4s +tttg: c90/131 lr:0.000226 t:8.5s +tttg: c91/131 lr:0.000216 t:8.5s +tttg: c92/131 lr:0.000206 t:8.6s +tttg: c93/131 lr:0.000196 t:8.7s +tttg: c94/131 lr:0.000187 t:8.8s +tttg: c95/131 lr:0.000178 t:8.9s +tttg: c96/131 lr:0.000168 t:8.9s +tttg: c97/131 lr:0.000159 t:9.0s +tttg: c98/131 lr:0.000151 t:9.1s +tttg: c99/131 lr:0.000142 t:9.1s +tttg: c100/131 lr:0.000134 t:9.2s +tttg: c101/131 lr:0.000126 t:9.3s +tttg: c102/131 lr:0.000118 t:9.4s +tttg: c103/131 lr:0.000110 t:9.5s +tttg: c104/131 lr:0.000103 t:9.5s +tttg: c105/131 lr:0.000095 t:9.6s +tttg: c106/131 lr:0.000089 t:9.7s +tttg: c107/131 lr:0.000082 t:9.7s +tttg: c108/131 lr:0.000075 t:9.8s +tttg: c109/131 lr:0.000069 t:9.9s +tttg: c110/131 lr:0.000063 t:10.0s +tttg: c111/131 lr:0.000057 t:10.0s +tttg: c112/131 lr:0.000052 t:10.1s +tttg: c113/131 lr:0.000047 t:10.2s +tttg: c114/131 lr:0.000042 t:10.3s +tttg: c115/131 lr:0.000037 t:10.3s +tttg: c116/131 lr:0.000032 t:10.4s +tttg: c117/131 lr:0.000028 t:10.5s +tttg: c118/131 lr:0.000024 t:10.5s +tttg: c119/131 lr:0.000021 t:10.6s +tttg: c120/131 lr:0.000018 t:10.7s +tttg: c121/131 lr:0.000015 t:10.8s +tttg: c122/131 lr:0.000012 t:10.8s +tttg: c123/131 lr:0.000009 t:10.9s +tttg: c124/131 lr:0.000007 t:11.0s +tttg: c125/131 lr:0.000005 t:11.1s +tttg: c126/131 lr:0.000004 t:11.1s +tttg: c127/131 lr:0.000002 t:11.2s +tttg: c128/131 lr:0.000001 t:11.3s +tttg: c129/131 lr:0.000001 t:11.4s +tttg: c130/131 lr:0.000000 t:11.4s +ttpr: phase:1/3 t:238.5s +ttp: b757/782 bl:2.2770 bb:1.0599 rl:2.2975 rb:1.0793 dl:3550-3633 gd:0 +ttp: b753/782 bl:2.2098 bb:0.9976 rl:2.2865 rb:1.0686 dl:3284-3344 gd:0 +ttpp: phase:2/3 pd:2128 gd:1666 t:410.7s +tttg: c1/219 lr:0.001000 t:0.1s +tttg: c2/219 lr:0.001000 
t:0.2s +tttg: c3/219 lr:0.001000 t:0.2s +tttg: c4/219 lr:0.001000 t:0.3s +tttg: c5/219 lr:0.000999 t:0.4s +tttg: c6/219 lr:0.000999 t:0.4s +tttg: c7/219 lr:0.000998 t:0.5s +tttg: c8/219 lr:0.000997 t:0.6s +tttg: c9/219 lr:0.000997 t:0.7s +tttg: c10/219 lr:0.000996 t:0.7s +tttg: c11/219 lr:0.000995 t:0.8s +tttg: c12/219 lr:0.000994 t:0.9s +tttg: c13/219 lr:0.000993 t:1.0s +tttg: c14/219 lr:0.000991 t:1.0s +tttg: c15/219 lr:0.000990 t:1.1s +tttg: c16/219 lr:0.000988 t:1.2s +tttg: c17/219 lr:0.000987 t:1.3s +tttg: c18/219 lr:0.000985 t:1.3s +tttg: c19/219 lr:0.000983 t:1.4s +tttg: c20/219 lr:0.000981 t:1.5s +tttg: c21/219 lr:0.000979 t:1.6s +tttg: c22/219 lr:0.000977 t:1.6s +tttg: c23/219 lr:0.000975 t:1.7s +tttg: c24/219 lr:0.000973 t:1.8s +tttg: c25/219 lr:0.000970 t:1.9s +tttg: c26/219 lr:0.000968 t:1.9s +tttg: c27/219 lr:0.000965 t:2.0s +tttg: c28/219 lr:0.000963 t:2.1s +tttg: c29/219 lr:0.000960 t:2.1s +tttg: c30/219 lr:0.000957 t:2.2s +tttg: c31/219 lr:0.000954 t:2.3s +tttg: c32/219 lr:0.000951 t:2.4s +tttg: c33/219 lr:0.000948 t:2.5s +tttg: c34/219 lr:0.000945 t:2.5s +tttg: c35/219 lr:0.000941 t:2.6s +tttg: c36/219 lr:0.000938 t:2.7s +tttg: c37/219 lr:0.000934 t:2.8s +tttg: c38/219 lr:0.000931 t:2.8s +tttg: c39/219 lr:0.000927 t:2.9s +tttg: c40/219 lr:0.000923 t:3.0s +tttg: c41/219 lr:0.000919 t:3.1s +tttg: c42/219 lr:0.000915 t:3.1s +tttg: c43/219 lr:0.000911 t:3.2s +tttg: c44/219 lr:0.000907 t:3.3s +tttg: c45/219 lr:0.000903 t:3.4s +tttg: c46/219 lr:0.000898 t:3.4s +tttg: c47/219 lr:0.000894 t:3.5s +tttg: c48/219 lr:0.000890 t:3.6s +tttg: c49/219 lr:0.000885 t:3.7s +tttg: c50/219 lr:0.000880 t:3.7s +tttg: c51/219 lr:0.000876 t:3.8s +tttg: c52/219 lr:0.000871 t:3.9s +tttg: c53/219 lr:0.000866 t:4.0s +tttg: c54/219 lr:0.000861 t:4.0s +tttg: c55/219 lr:0.000856 t:4.1s +tttg: c56/219 lr:0.000851 t:4.2s +tttg: c57/219 lr:0.000846 t:4.3s +tttg: c58/219 lr:0.000841 t:4.3s +tttg: c59/219 lr:0.000835 t:4.4s +tttg: c60/219 lr:0.000830 t:4.5s +tttg: c61/219 lr:0.000824 
t:4.6s +tttg: c62/219 lr:0.000819 t:4.6s +tttg: c63/219 lr:0.000813 t:4.7s +tttg: c64/219 lr:0.000808 t:4.8s +tttg: c65/219 lr:0.000802 t:4.9s +tttg: c66/219 lr:0.000796 t:4.9s +tttg: c67/219 lr:0.000790 t:5.0s +tttg: c68/219 lr:0.000784 t:5.1s +tttg: c69/219 lr:0.000779 t:5.2s +tttg: c70/219 lr:0.000773 t:5.2s +tttg: c71/219 lr:0.000766 t:5.3s +tttg: c72/219 lr:0.000760 t:5.4s +tttg: c73/219 lr:0.000754 t:5.5s +tttg: c74/219 lr:0.000748 t:5.5s +tttg: c75/219 lr:0.000742 t:5.6s +tttg: c76/219 lr:0.000735 t:5.7s +tttg: c77/219 lr:0.000729 t:5.8s +tttg: c78/219 lr:0.000722 t:5.8s +tttg: c79/219 lr:0.000716 t:5.9s +tttg: c80/219 lr:0.000709 t:6.0s +tttg: c81/219 lr:0.000703 t:6.1s +tttg: c82/219 lr:0.000696 t:6.1s +tttg: c83/219 lr:0.000690 t:6.2s +tttg: c84/219 lr:0.000683 t:6.3s +tttg: c85/219 lr:0.000676 t:6.4s +tttg: c86/219 lr:0.000670 t:6.4s +tttg: c87/219 lr:0.000663 t:6.5s +tttg: c88/219 lr:0.000656 t:6.6s +tttg: c89/219 lr:0.000649 t:6.6s +tttg: c90/219 lr:0.000642 t:6.7s +tttg: c91/219 lr:0.000635 t:6.8s +tttg: c92/219 lr:0.000628 t:6.9s +tttg: c93/219 lr:0.000621 t:7.0s +tttg: c94/219 lr:0.000614 t:7.0s +tttg: c95/219 lr:0.000607 t:7.1s +tttg: c96/219 lr:0.000600 t:7.2s +tttg: c97/219 lr:0.000593 t:7.2s +tttg: c98/219 lr:0.000586 t:7.3s +tttg: c99/219 lr:0.000579 t:7.4s +tttg: c100/219 lr:0.000572 t:7.5s +tttg: c101/219 lr:0.000565 t:7.5s +tttg: c102/219 lr:0.000558 t:7.6s +tttg: c103/219 lr:0.000550 t:7.7s +tttg: c104/219 lr:0.000543 t:7.8s +tttg: c105/219 lr:0.000536 t:7.8s +tttg: c106/219 lr:0.000529 t:7.9s +tttg: c107/219 lr:0.000522 t:8.0s +tttg: c108/219 lr:0.000514 t:8.0s +tttg: c109/219 lr:0.000507 t:8.1s +tttg: c110/219 lr:0.000500 t:8.2s +tttg: c111/219 lr:0.000493 t:8.3s +tttg: c112/219 lr:0.000486 t:8.4s +tttg: c113/219 lr:0.000478 t:8.4s +tttg: c114/219 lr:0.000471 t:8.5s +tttg: c115/219 lr:0.000464 t:8.6s +tttg: c116/219 lr:0.000457 t:8.6s +tttg: c117/219 lr:0.000450 t:8.7s +tttg: c118/219 lr:0.000442 t:8.8s +tttg: c119/219 lr:0.000435 t:8.9s 
+tttg: c120/219 lr:0.000428 t:8.9s +tttg: c121/219 lr:0.000421 t:9.0s +tttg: c122/219 lr:0.000414 t:9.1s +tttg: c123/219 lr:0.000407 t:9.2s +tttg: c124/219 lr:0.000400 t:9.2s +tttg: c125/219 lr:0.000393 t:9.3s +tttg: c126/219 lr:0.000386 t:9.4s +tttg: c127/219 lr:0.000379 t:9.5s +tttg: c128/219 lr:0.000372 t:9.5s +tttg: c129/219 lr:0.000365 t:9.6s +tttg: c130/219 lr:0.000358 t:9.7s +tttg: c131/219 lr:0.000351 t:9.8s +tttg: c132/219 lr:0.000344 t:9.8s +tttg: c133/219 lr:0.000337 t:9.9s +tttg: c134/219 lr:0.000330 t:10.0s +tttg: c135/219 lr:0.000324 t:10.1s +tttg: c136/219 lr:0.000317 t:10.1s +tttg: c137/219 lr:0.000310 t:10.2s +tttg: c138/219 lr:0.000304 t:10.3s +tttg: c139/219 lr:0.000297 t:10.3s +tttg: c140/219 lr:0.000291 t:10.4s +tttg: c141/219 lr:0.000284 t:10.5s +tttg: c142/219 lr:0.000278 t:10.6s +tttg: c143/219 lr:0.000271 t:10.7s +tttg: c144/219 lr:0.000265 t:10.7s +tttg: c145/219 lr:0.000258 t:10.8s +tttg: c146/219 lr:0.000252 t:10.9s +tttg: c147/219 lr:0.000246 t:11.0s +tttg: c148/219 lr:0.000240 t:11.0s +tttg: c149/219 lr:0.000234 t:11.1s +tttg: c150/219 lr:0.000227 t:11.2s +tttg: c151/219 lr:0.000221 t:11.2s +tttg: c152/219 lr:0.000216 t:11.3s +tttg: c153/219 lr:0.000210 t:11.4s +tttg: c154/219 lr:0.000204 t:11.5s +tttg: c155/219 lr:0.000198 t:11.6s +tttg: c156/219 lr:0.000192 t:11.6s +tttg: c157/219 lr:0.000187 t:11.7s +tttg: c158/219 lr:0.000181 t:11.8s +tttg: c159/219 lr:0.000176 t:11.9s +tttg: c160/219 lr:0.000170 t:12.0s +tttg: c161/219 lr:0.000165 t:12.0s +tttg: c162/219 lr:0.000159 t:12.1s +tttg: c163/219 lr:0.000154 t:12.2s +tttg: c164/219 lr:0.000149 t:12.2s +tttg: c165/219 lr:0.000144 t:12.3s +tttg: c166/219 lr:0.000139 t:12.4s +tttg: c167/219 lr:0.000134 t:12.5s +tttg: c168/219 lr:0.000129 t:12.5s +tttg: c169/219 lr:0.000124 t:12.6s +tttg: c170/219 lr:0.000120 t:12.7s +tttg: c171/219 lr:0.000115 t:12.8s +tttg: c172/219 lr:0.000110 t:12.8s +tttg: c173/219 lr:0.000106 t:12.9s +tttg: c174/219 lr:0.000102 t:13.0s +tttg: c175/219 lr:0.000097 
t:13.1s +tttg: c176/219 lr:0.000093 t:13.1s +tttg: c177/219 lr:0.000089 t:13.2s +tttg: c178/219 lr:0.000085 t:13.3s +tttg: c179/219 lr:0.000081 t:13.4s +tttg: c180/219 lr:0.000077 t:13.4s +tttg: c181/219 lr:0.000073 t:13.5s +tttg: c182/219 lr:0.000069 t:13.6s +tttg: c183/219 lr:0.000066 t:13.6s +tttg: c184/219 lr:0.000062 t:13.7s +tttg: c185/219 lr:0.000059 t:13.8s +tttg: c186/219 lr:0.000055 t:13.9s +tttg: c187/219 lr:0.000052 t:13.9s +tttg: c188/219 lr:0.000049 t:14.0s +tttg: c189/219 lr:0.000046 t:14.1s +tttg: c190/219 lr:0.000043 t:14.2s +tttg: c191/219 lr:0.000040 t:14.3s +tttg: c192/219 lr:0.000037 t:14.4s +tttg: c193/219 lr:0.000035 t:14.4s +tttg: c194/219 lr:0.000032 t:14.5s +tttg: c195/219 lr:0.000030 t:14.6s +tttg: c196/219 lr:0.000027 t:14.7s +tttg: c197/219 lr:0.000025 t:14.8s +tttg: c198/219 lr:0.000023 t:14.8s +tttg: c199/219 lr:0.000021 t:14.9s +tttg: c200/219 lr:0.000019 t:15.0s +tttg: c201/219 lr:0.000017 t:15.1s +tttg: c202/219 lr:0.000015 t:15.2s +tttg: c203/219 lr:0.000013 t:15.2s +tttg: c204/219 lr:0.000012 t:15.3s +tttg: c205/219 lr:0.000010 t:15.4s +tttg: c206/219 lr:0.000009 t:15.5s +tttg: c207/219 lr:0.000007 t:15.5s +tttg: c208/219 lr:0.000006 t:15.6s +tttg: c209/219 lr:0.000005 t:15.7s +tttg: c210/219 lr:0.000004 t:15.8s +tttg: c211/219 lr:0.000003 t:15.8s +tttg: c212/219 lr:0.000003 t:15.9s +tttg: c213/219 lr:0.000002 t:16.0s +tttg: c214/219 lr:0.000001 t:16.1s +tttg: c215/219 lr:0.000001 t:16.1s +tttg: c216/219 lr:0.000000 t:16.2s +tttg: c217/219 lr:0.000000 t:16.3s +tttg: c218/219 lr:0.000000 t:16.4s +ttpr: phase:2/3 t:428.8s +ttp: b748/782 bl:2.3170 bb:1.0813 rl:2.2896 rb:1.0699 dl:2992-3039 gd:0 +ttpp: phase:3/3 pd:2960 gd:2500 t:444.9s +tttg: c1/289 lr:0.001000 t:0.1s +tttg: c2/289 lr:0.001000 t:0.2s +tttg: c3/289 lr:0.001000 t:0.2s +tttg: c4/289 lr:0.001000 t:0.3s +tttg: c5/289 lr:0.001000 t:0.4s +tttg: c6/289 lr:0.000999 t:0.4s +tttg: c7/289 lr:0.000999 t:0.5s +tttg: c8/289 lr:0.000999 t:0.6s +tttg: c9/289 lr:0.000998 t:0.7s 
+tttg: c10/289 lr:0.000998 t:0.7s +tttg: c11/289 lr:0.000997 t:0.8s +tttg: c12/289 lr:0.000996 t:0.9s +tttg: c13/289 lr:0.000996 t:1.0s +tttg: c14/289 lr:0.000995 t:1.1s +tttg: c15/289 lr:0.000994 t:1.1s +tttg: c16/289 lr:0.000993 t:1.2s +tttg: c17/289 lr:0.000992 t:1.3s +tttg: c18/289 lr:0.000991 t:1.3s +tttg: c19/289 lr:0.000990 t:1.4s +tttg: c20/289 lr:0.000989 t:1.5s +tttg: c21/289 lr:0.000988 t:1.6s +tttg: c22/289 lr:0.000987 t:1.6s +tttg: c23/289 lr:0.000986 t:1.7s +tttg: c24/289 lr:0.000984 t:1.8s +tttg: c25/289 lr:0.000983 t:1.9s +tttg: c26/289 lr:0.000982 t:1.9s +tttg: c27/289 lr:0.000980 t:2.0s +tttg: c28/289 lr:0.000978 t:2.1s +tttg: c29/289 lr:0.000977 t:2.2s +tttg: c30/289 lr:0.000975 t:2.2s +tttg: c31/289 lr:0.000973 t:2.3s +tttg: c32/289 lr:0.000972 t:2.4s +tttg: c33/289 lr:0.000970 t:2.5s +tttg: c34/289 lr:0.000968 t:2.6s +tttg: c35/289 lr:0.000966 t:2.6s +tttg: c36/289 lr:0.000964 t:2.7s +tttg: c37/289 lr:0.000962 t:2.8s +tttg: c38/289 lr:0.000960 t:2.8s +tttg: c39/289 lr:0.000958 t:2.9s +tttg: c40/289 lr:0.000955 t:3.0s +tttg: c41/289 lr:0.000953 t:3.1s +tttg: c42/289 lr:0.000951 t:3.1s +tttg: c43/289 lr:0.000948 t:3.2s +tttg: c44/289 lr:0.000946 t:3.3s +tttg: c45/289 lr:0.000944 t:3.4s +tttg: c46/289 lr:0.000941 t:3.4s +tttg: c47/289 lr:0.000938 t:3.5s +tttg: c48/289 lr:0.000936 t:3.6s +tttg: c49/289 lr:0.000933 t:3.7s +tttg: c50/289 lr:0.000930 t:3.7s +tttg: c51/289 lr:0.000927 t:3.8s +tttg: c52/289 lr:0.000925 t:3.9s +tttg: c53/289 lr:0.000922 t:4.0s +tttg: c54/289 lr:0.000919 t:4.0s +tttg: c55/289 lr:0.000916 t:4.1s +tttg: c56/289 lr:0.000913 t:4.2s +tttg: c57/289 lr:0.000910 t:4.3s +tttg: c58/289 lr:0.000906 t:4.3s +tttg: c59/289 lr:0.000903 t:4.4s +tttg: c60/289 lr:0.000900 t:4.5s +tttg: c61/289 lr:0.000897 t:4.6s +tttg: c62/289 lr:0.000893 t:4.6s +tttg: c63/289 lr:0.000890 t:4.7s +tttg: c64/289 lr:0.000887 t:4.8s +tttg: c65/289 lr:0.000883 t:4.9s +tttg: c66/289 lr:0.000879 t:4.9s +tttg: c67/289 lr:0.000876 t:5.0s +tttg: c68/289 lr:0.000872 
t:5.1s +tttg: c69/289 lr:0.000869 t:5.2s +tttg: c70/289 lr:0.000865 t:5.2s +tttg: c71/289 lr:0.000861 t:5.3s +tttg: c72/289 lr:0.000857 t:5.4s +tttg: c73/289 lr:0.000854 t:5.5s +tttg: c74/289 lr:0.000850 t:5.5s +tttg: c75/289 lr:0.000846 t:5.6s +tttg: c76/289 lr:0.000842 t:5.7s +tttg: c77/289 lr:0.000838 t:5.8s +tttg: c78/289 lr:0.000834 t:5.8s +tttg: c79/289 lr:0.000830 t:5.9s +tttg: c80/289 lr:0.000826 t:6.0s +tttg: c81/289 lr:0.000821 t:6.1s +tttg: c82/289 lr:0.000817 t:6.1s +tttg: c83/289 lr:0.000813 t:6.2s +tttg: c84/289 lr:0.000809 t:6.3s +tttg: c85/289 lr:0.000804 t:6.3s +tttg: c86/289 lr:0.000800 t:6.4s +tttg: c87/289 lr:0.000796 t:6.5s +tttg: c88/289 lr:0.000791 t:6.6s +tttg: c89/289 lr:0.000787 t:6.7s +tttg: c90/289 lr:0.000782 t:6.7s +tttg: c91/289 lr:0.000778 t:6.8s +tttg: c92/289 lr:0.000773 t:6.9s +tttg: c93/289 lr:0.000769 t:7.0s +tttg: c94/289 lr:0.000764 t:7.0s +tttg: c95/289 lr:0.000759 t:7.1s +tttg: c96/289 lr:0.000755 t:7.2s +tttg: c97/289 lr:0.000750 t:7.2s +tttg: c98/289 lr:0.000745 t:7.3s +tttg: c99/289 lr:0.000740 t:7.4s +tttg: c100/289 lr:0.000736 t:7.5s +tttg: c101/289 lr:0.000731 t:7.5s +tttg: c102/289 lr:0.000726 t:7.6s +tttg: c103/289 lr:0.000721 t:7.7s +tttg: c104/289 lr:0.000716 t:7.8s +tttg: c105/289 lr:0.000711 t:7.8s +tttg: c106/289 lr:0.000706 t:7.9s +tttg: c107/289 lr:0.000701 t:8.0s +tttg: c108/289 lr:0.000696 t:8.1s +tttg: c109/289 lr:0.000691 t:8.1s +tttg: c110/289 lr:0.000686 t:8.2s +tttg: c111/289 lr:0.000681 t:8.3s +tttg: c112/289 lr:0.000676 t:8.4s +tttg: c113/289 lr:0.000671 t:8.4s +tttg: c114/289 lr:0.000666 t:8.5s +tttg: c115/289 lr:0.000661 t:8.6s +tttg: c116/289 lr:0.000656 t:8.7s +tttg: c117/289 lr:0.000650 t:8.7s +tttg: c118/289 lr:0.000645 t:8.8s +tttg: c119/289 lr:0.000640 t:8.9s +tttg: c120/289 lr:0.000635 t:9.0s +tttg: c121/289 lr:0.000629 t:9.0s +tttg: c122/289 lr:0.000624 t:9.1s +tttg: c123/289 lr:0.000619 t:9.2s +tttg: c124/289 lr:0.000614 t:9.3s +tttg: c125/289 lr:0.000608 t:9.3s +tttg: c126/289 lr:0.000603 
t:9.4s +tttg: c127/289 lr:0.000598 t:9.5s +tttg: c128/289 lr:0.000592 t:9.6s +tttg: c129/289 lr:0.000587 t:9.7s +tttg: c130/289 lr:0.000581 t:9.7s +tttg: c131/289 lr:0.000576 t:9.8s +tttg: c132/289 lr:0.000571 t:9.9s +tttg: c133/289 lr:0.000565 t:10.0s +tttg: c134/289 lr:0.000560 t:10.0s +tttg: c135/289 lr:0.000554 t:10.1s +tttg: c136/289 lr:0.000549 t:10.2s +tttg: c137/289 lr:0.000544 t:10.3s +tttg: c138/289 lr:0.000538 t:10.3s +tttg: c139/289 lr:0.000533 t:10.4s +tttg: c140/289 lr:0.000527 t:10.5s +tttg: c141/289 lr:0.000522 t:10.6s +tttg: c142/289 lr:0.000516 t:10.6s +tttg: c143/289 lr:0.000511 t:10.7s +tttg: c144/289 lr:0.000505 t:10.8s +tttg: c145/289 lr:0.000500 t:10.9s +tttg: c146/289 lr:0.000495 t:10.9s +tttg: c147/289 lr:0.000489 t:11.0s +tttg: c148/289 lr:0.000484 t:11.1s +tttg: c149/289 lr:0.000478 t:11.2s +tttg: c150/289 lr:0.000473 t:11.2s +tttg: c151/289 lr:0.000467 t:11.3s +tttg: c152/289 lr:0.000462 t:11.4s +tttg: c153/289 lr:0.000456 t:11.5s +tttg: c154/289 lr:0.000451 t:11.5s +tttg: c155/289 lr:0.000446 t:11.6s +tttg: c156/289 lr:0.000440 t:11.7s +tttg: c157/289 lr:0.000435 t:11.8s +tttg: c158/289 lr:0.000429 t:11.8s +tttg: c159/289 lr:0.000424 t:11.9s +tttg: c160/289 lr:0.000419 t:12.0s +tttg: c161/289 lr:0.000413 t:12.1s +tttg: c162/289 lr:0.000408 t:12.1s +tttg: c163/289 lr:0.000402 t:12.2s +tttg: c164/289 lr:0.000397 t:12.3s +tttg: c165/289 lr:0.000392 t:12.4s +tttg: c166/289 lr:0.000386 t:12.4s +tttg: c167/289 lr:0.000381 t:12.5s +tttg: c168/289 lr:0.000376 t:12.6s +tttg: c169/289 lr:0.000371 t:12.7s +tttg: c170/289 lr:0.000365 t:12.7s +tttg: c171/289 lr:0.000360 t:12.8s +tttg: c172/289 lr:0.000355 t:12.9s +tttg: c173/289 lr:0.000350 t:13.0s +tttg: c174/289 lr:0.000344 t:13.0s +tttg: c175/289 lr:0.000339 t:13.1s +tttg: c176/289 lr:0.000334 t:13.2s +tttg: c177/289 lr:0.000329 t:13.3s +tttg: c178/289 lr:0.000324 t:13.3s +tttg: c179/289 lr:0.000319 t:13.4s +tttg: c180/289 lr:0.000314 t:13.5s +tttg: c181/289 lr:0.000309 t:13.6s +tttg: c182/289 
lr:0.000304 t:13.6s +tttg: c183/289 lr:0.000299 t:13.7s +tttg: c184/289 lr:0.000294 t:13.8s +tttg: c185/289 lr:0.000289 t:13.9s +tttg: c186/289 lr:0.000284 t:13.9s +tttg: c187/289 lr:0.000279 t:14.0s +tttg: c188/289 lr:0.000274 t:14.1s +tttg: c189/289 lr:0.000269 t:14.2s +tttg: c190/289 lr:0.000264 t:14.2s +tttg: c191/289 lr:0.000260 t:14.3s +tttg: c192/289 lr:0.000255 t:14.4s +tttg: c193/289 lr:0.000250 t:14.5s +tttg: c194/289 lr:0.000245 t:14.5s +tttg: c195/289 lr:0.000241 t:14.6s +tttg: c196/289 lr:0.000236 t:14.7s +tttg: c197/289 lr:0.000231 t:14.8s +tttg: c198/289 lr:0.000227 t:14.8s +tttg: c199/289 lr:0.000222 t:14.9s +tttg: c200/289 lr:0.000218 t:15.0s +tttg: c201/289 lr:0.000213 t:15.1s +tttg: c202/289 lr:0.000209 t:15.1s +tttg: c203/289 lr:0.000204 t:15.2s +tttg: c204/289 lr:0.000200 t:15.3s +tttg: c205/289 lr:0.000196 t:15.4s +tttg: c206/289 lr:0.000191 t:15.4s +tttg: c207/289 lr:0.000187 t:15.5s +tttg: c208/289 lr:0.000183 t:15.6s +tttg: c209/289 lr:0.000179 t:15.6s +tttg: c210/289 lr:0.000174 t:15.7s +tttg: c211/289 lr:0.000170 t:15.8s +tttg: c212/289 lr:0.000166 t:15.9s +tttg: c213/289 lr:0.000162 t:16.0s +tttg: c214/289 lr:0.000158 t:16.0s +tttg: c215/289 lr:0.000154 t:16.1s +tttg: c216/289 lr:0.000150 t:16.2s +tttg: c217/289 lr:0.000146 t:16.3s +tttg: c218/289 lr:0.000143 t:16.3s +tttg: c219/289 lr:0.000139 t:16.4s +tttg: c220/289 lr:0.000135 t:16.5s +tttg: c221/289 lr:0.000131 t:16.5s +tttg: c222/289 lr:0.000128 t:16.6s +tttg: c223/289 lr:0.000124 t:16.7s +tttg: c224/289 lr:0.000121 t:16.8s +tttg: c225/289 lr:0.000117 t:16.9s +tttg: c226/289 lr:0.000113 t:16.9s +tttg: c227/289 lr:0.000110 t:17.0s +tttg: c228/289 lr:0.000107 t:17.1s +tttg: c229/289 lr:0.000103 t:17.2s +tttg: c230/289 lr:0.000100 t:17.2s +tttg: c231/289 lr:0.000097 t:17.3s +tttg: c232/289 lr:0.000094 t:17.4s +tttg: c233/289 lr:0.000090 t:17.5s +tttg: c234/289 lr:0.000087 t:17.5s +tttg: c235/289 lr:0.000084 t:17.6s +tttg: c236/289 lr:0.000081 t:17.7s +tttg: c237/289 lr:0.000078 t:17.8s 
+tttg: c238/289 lr:0.000075 t:17.8s +tttg: c239/289 lr:0.000073 t:17.9s +tttg: c240/289 lr:0.000070 t:18.0s +tttg: c241/289 lr:0.000067 t:18.1s +tttg: c242/289 lr:0.000064 t:18.1s +tttg: c243/289 lr:0.000062 t:18.2s +tttg: c244/289 lr:0.000059 t:18.3s +tttg: c245/289 lr:0.000056 t:18.4s +tttg: c246/289 lr:0.000054 t:18.5s +tttg: c247/289 lr:0.000052 t:18.5s +tttg: c248/289 lr:0.000049 t:18.6s +tttg: c249/289 lr:0.000047 t:18.7s +tttg: c250/289 lr:0.000045 t:18.7s +tttg: c251/289 lr:0.000042 t:18.8s +tttg: c252/289 lr:0.000040 t:18.9s +tttg: c253/289 lr:0.000038 t:19.0s +tttg: c254/289 lr:0.000036 t:19.0s +tttg: c255/289 lr:0.000034 t:19.1s +tttg: c256/289 lr:0.000032 t:19.2s +tttg: c257/289 lr:0.000030 t:19.3s +tttg: c258/289 lr:0.000028 t:19.4s +tttg: c259/289 lr:0.000027 t:19.4s +tttg: c260/289 lr:0.000025 t:19.5s +tttg: c261/289 lr:0.000023 t:19.6s +tttg: c262/289 lr:0.000022 t:19.6s +tttg: c263/289 lr:0.000020 t:19.7s +tttg: c264/289 lr:0.000018 t:19.8s +tttg: c265/289 lr:0.000017 t:19.9s +tttg: c266/289 lr:0.000016 t:20.0s +tttg: c267/289 lr:0.000014 t:20.0s +tttg: c268/289 lr:0.000013 t:20.1s +tttg: c269/289 lr:0.000012 t:20.2s +tttg: c270/289 lr:0.000011 t:20.3s +tttg: c271/289 lr:0.000010 t:20.3s +tttg: c272/289 lr:0.000009 t:20.4s +tttg: c273/289 lr:0.000008 t:20.5s +tttg: c274/289 lr:0.000007 t:20.6s +tttg: c275/289 lr:0.000006 t:20.7s +tttg: c276/289 lr:0.000005 t:20.7s +tttg: c277/289 lr:0.000004 t:20.8s +tttg: c278/289 lr:0.000004 t:20.9s +tttg: c279/289 lr:0.000003 t:21.0s +tttg: c280/289 lr:0.000002 t:21.0s +tttg: c281/289 lr:0.000002 t:21.1s +tttg: c282/289 lr:0.000001 t:21.2s +tttg: c283/289 lr:0.000001 t:21.2s +tttg: c284/289 lr:0.000001 t:21.3s +tttg: c285/289 lr:0.000000 t:21.4s +tttg: c286/289 lr:0.000000 t:21.5s +tttg: c287/289 lr:0.000000 t:21.5s +tttg: c288/289 lr:0.000000 t:21.6s +ttpr: phase:3/3 t:468.1s +ttp: b731/782 bl:2.3393 bb:1.0433 rl:2.2933 rb:1.0678 dl:2377-2414 gd:1 +ttp: b723/782 bl:2.2926 bb:1.0292 rl:2.2933 rb:1.0653 
dl:2185-2203 gd:1 +ttp: b716/782 bl:2.2477 bb:1.0386 rl:2.2907 rb:1.0637 dl:2054-2069 gd:1 +ttp: b705/782 bl:2.3600 bb:1.0608 rl:2.2941 rb:1.0636 dl:1885-1898 gd:1 +ttp: b701/782 bl:2.3053 bb:1.0336 rl:2.2947 rb:1.0622 dl:1835-1847 gd:1 +ttp: b689/782 bl:2.3828 bb:1.0728 rl:2.2983 rb:1.0626 dl:1706-1715 gd:1 +ttp: b685/782 bl:2.2948 bb:1.0270 rl:2.2981 rb:1.0612 dl:1665-1675 gd:1 +ttp: b678/782 bl:2.3419 bb:1.0251 rl:2.2997 rb:1.0598 dl:1601-1610 gd:1 +ttp: b666/782 bl:2.4056 bb:1.0618 rl:2.3032 rb:1.0599 dl:1507-1514 gd:1 +ttp: b659/782 bl:2.3017 bb:1.0387 rl:2.3031 rb:1.0592 dl:1459-1466 gd:1 +ttp: b651/782 bl:2.3859 bb:1.0427 rl:2.3055 rb:1.0587 dl:1406-1411 gd:1 +ttp: b642/782 bl:2.3170 bb:1.0374 rl:2.3058 rb:1.0582 dl:1349-1356 gd:1 +ttp: b633/782 bl:2.2716 bb:1.0207 rl:2.3049 rb:1.0572 dl:1297-1302 gd:1 +ttp: b624/782 bl:2.3487 bb:1.0632 rl:2.3060 rb:1.0573 dl:1249-1255 gd:1 +ttp: b618/782 bl:2.3965 bb:1.0666 rl:2.3080 rb:1.0576 dl:1216-1221 gd:1 +ttp: b610/782 bl:2.2426 bb:1.0028 rl:2.3066 rb:1.0564 dl:1177-1182 gd:1 +ttp: b604/782 bl:2.3734 bb:1.0418 rl:2.3080 rb:1.0561 dl:1150-1154 gd:1 +ttp: b594/782 bl:2.3320 bb:1.0646 rl:2.3084 rb:1.0562 dl:1107-1110 gd:1 +ttp: b586/782 bl:2.2510 bb:1.0293 rl:2.3074 rb:1.0557 dl:1073-1076 gd:1 +ttp: b579/782 bl:2.3396 bb:1.0341 rl:2.3079 rb:1.0553 dl:1044-1048 gd:1 +ttp: b575/782 bl:2.2811 bb:1.0381 rl:2.3075 rb:1.0550 dl:1029-1033 gd:1 +ttp: b568/782 bl:2.3537 bb:1.0805 rl:2.3082 rb:1.0555 dl:1004-1007 gd:1 +ttp: b561/782 bl:2.2398 bb:1.0103 rl:2.3072 rb:1.0547 dl:979-983 gd:1 +ttp: b551/782 bl:2.3273 bb:1.0518 rl:2.3075 rb:1.0547 dl:946-949 gd:1 +ttp: b545/782 bl:2.3282 bb:1.0295 rl:2.3078 rb:1.0543 dl:927-930 gd:1 +ttp: b536/782 bl:2.3104 bb:1.0404 rl:2.3078 rb:1.0541 dl:899-902 gd:1 +ttp: b515/782 bl:2.3412 bb:1.0425 rl:2.3082 rb:1.0540 dl:838-841 gd:1 +ttp: b508/782 bl:2.3835 bb:1.0479 rl:2.3091 rb:1.0539 dl:817-820 gd:1 +ttp: b501/782 bl:2.3728 bb:1.0483 rl:2.3099 rb:1.0538 dl:799-802 gd:1 +ttp: b493/782 bl:2.3531 
bb:1.0387 rl:2.3104 rb:1.0537 dl:778-780 gd:1 +ttp: b485/782 bl:2.2904 bb:1.0318 rl:2.3102 rb:1.0534 dl:759-761 gd:1 +ttp: b477/782 bl:2.3951 bb:1.0315 rl:2.3111 rb:1.0532 dl:740-742 gd:1 +ttp: b469/782 bl:2.3207 bb:1.0206 rl:2.3112 rb:1.0528 dl:721-724 gd:1 +ttp: b463/782 bl:2.3068 bb:1.0380 rl:2.3111 rb:1.0527 dl:708-710 gd:1 +ttp: b457/782 bl:2.2485 bb:1.0294 rl:2.3105 rb:1.0525 dl:695-697 gd:1 +ttp: b451/782 bl:2.3973 bb:1.0847 rl:2.3113 rb:1.0528 dl:682-685 gd:1 +ttp: b444/782 bl:2.3013 bb:1.0603 rl:2.3112 rb:1.0528 dl:668-670 gd:1 +ttp: b438/782 bl:2.3048 bb:1.0518 rl:2.3112 rb:1.0528 dl:655-657 gd:1 +ttp: b429/782 bl:2.2382 bb:1.0209 rl:2.3106 rb:1.0525 dl:638-640 gd:1 +ttp: b421/782 bl:2.2867 bb:1.0012 rl:2.3104 rb:1.0521 dl:622-624 gd:1 +ttp: b414/782 bl:2.2026 bb:1.0085 rl:2.3095 rb:1.0518 dl:609-611 gd:1 +ttp: b406/782 bl:2.3010 bb:1.0596 rl:2.3094 rb:1.0518 dl:593-595 gd:1 +ttp: b398/782 bl:2.2357 bb:0.9984 rl:2.3089 rb:1.0514 dl:579-581 gd:1 +ttp: b390/782 bl:2.3446 bb:1.0563 rl:2.3091 rb:1.0515 dl:564-566 gd:1 +ttp: b381/782 bl:2.4191 bb:1.0996 rl:2.3099 rb:1.0518 dl:549-550 gd:1 +ttp: b374/782 bl:2.2883 bb:1.0316 rl:2.3098 rb:1.0516 dl:537-538 gd:1 +ttp: b366/782 bl:2.3354 bb:1.0699 rl:2.3099 rb:1.0518 dl:524-525 gd:1 +ttp: b358/782 bl:2.3992 bb:1.0767 rl:2.3105 rb:1.0519 dl:510-512 gd:1 +ttp: b350/782 bl:2.3182 bb:1.0535 rl:2.3105 rb:1.0519 dl:497-498 gd:1 +ttp: b342/782 bl:2.3683 bb:1.1204 rl:2.3109 rb:1.0523 dl:485-486 gd:1 +ttp: b334/782 bl:2.3688 bb:1.0648 rl:2.3112 rb:1.0524 dl:472-474 gd:1 +ttp: b326/782 bl:2.2961 bb:1.0515 rl:2.3111 rb:1.0524 dl:461-462 gd:1 +ttp: b318/782 bl:2.3368 bb:1.0679 rl:2.3113 rb:1.0525 dl:448-450 gd:1 +ttp: b310/782 bl:2.2858 bb:1.0958 rl:2.3111 rb:1.0527 dl:437-438 gd:1 +ttp: b302/782 bl:2.2940 bb:1.0550 rl:2.3111 rb:1.0527 dl:424-426 gd:1 +ttp: b294/782 bl:2.3079 bb:1.0781 rl:2.3110 rb:1.0528 dl:412-414 gd:1 +ttp: b286/782 bl:2.3680 bb:1.1046 rl:2.3113 rb:1.0531 dl:400-402 gd:1 +ttp: b278/782 bl:2.2520 bb:1.0549 
rl:2.3110 rb:1.0531 dl:389-391 gd:1 +ttp: b270/782 bl:2.3069 bb:1.0555 rl:2.3110 rb:1.0531 dl:379-380 gd:1 +ttp: b262/782 bl:2.4400 bb:1.1415 rl:2.3116 rb:1.0535 dl:369-370 gd:1 +ttp: b254/782 bl:2.3487 bb:1.1134 rl:2.3117 rb:1.0537 dl:358-360 gd:1 +ttp: b246/782 bl:2.3436 bb:1.0954 rl:2.3119 rb:1.0539 dl:349-350 gd:1 +ttp: b238/782 bl:2.3102 bb:1.1018 rl:2.3119 rb:1.0540 dl:338-340 gd:1 +ttp: b230/782 bl:2.4474 bb:1.1484 rl:2.3124 rb:1.0544 dl:329-330 gd:1 +ttp: b222/782 bl:2.3724 bb:1.1089 rl:2.3126 rb:1.0546 dl:320-321 gd:1 +ttp: b214/782 bl:2.3336 bb:1.1167 rl:2.3127 rb:1.0548 dl:310-312 gd:1 +ttp: b207/782 bl:2.3363 bb:1.1228 rl:2.3127 rb:1.0550 dl:303-304 gd:1 +ttp: b197/782 bl:2.3396 bb:1.1060 rl:2.3128 rb:1.0552 dl:292-294 gd:1 +ttp: b189/782 bl:2.4077 bb:1.1360 rl:2.3131 rb:1.0554 dl:283-284 gd:1 +ttp: b181/782 bl:2.3302 bb:1.1252 rl:2.3132 rb:1.0556 dl:275-276 gd:1 +ttp: b172/782 bl:2.5140 bb:1.1526 rl:2.3138 rb:1.0559 dl:266-267 gd:1 +ttp: b165/782 bl:2.3348 bb:1.1088 rl:2.3138 rb:1.0561 dl:260-260 gd:1 +ttp: b158/782 bl:2.3356 bb:1.1043 rl:2.3139 rb:1.0562 dl:253-254 gd:1 +ttp: b150/782 bl:2.3307 bb:1.1067 rl:2.3140 rb:1.0563 dl:245-246 gd:1 +ttp: b141/782 bl:2.4556 bb:1.1206 rl:2.3143 rb:1.0565 dl:236-237 gd:1 +ttp: b133/782 bl:2.3595 bb:1.1318 rl:2.3144 rb:1.0567 dl:229-230 gd:1 +ttp: b124/782 bl:2.3657 bb:1.1556 rl:2.3146 rb:1.0569 dl:220-222 gd:1 +ttp: b117/782 bl:2.4743 bb:1.2023 rl:2.3149 rb:1.0572 dl:214-215 gd:1 +ttp: b110/782 bl:2.3620 bb:1.1210 rl:2.3151 rb:1.0574 dl:208-208 gd:1 +ttp: b103/782 bl:2.4338 bb:1.1715 rl:2.3153 rb:1.0576 dl:202-202 gd:1 +ttp: b94/782 bl:2.5516 bb:1.2057 rl:2.3158 rb:1.0579 dl:193-194 gd:1 +ttp: b85/782 bl:2.5024 bb:1.1984 rl:2.3162 rb:1.0582 dl:185-186 gd:1 +ttp: b77/782 bl:2.5039 bb:1.2299 rl:2.3166 rb:1.0585 dl:178-179 gd:1 +ttp: b69/782 bl:2.4660 bb:1.2037 rl:2.3168 rb:1.0588 dl:171-172 gd:1 +ttp: b61/782 bl:2.4397 bb:1.2076 rl:2.3171 rb:1.0590 dl:164-165 gd:1 +ttp: b53/782 bl:2.5003 bb:1.1914 rl:2.3174 
rb:1.0592 dl:156-157 gd:1 +ttp: b45/782 bl:2.4477 bb:1.1712 rl:2.3176 rb:1.0594 dl:148-149 gd:1 +ttp: b37/782 bl:2.5714 bb:1.2120 rl:2.3180 rb:1.0596 dl:140-141 gd:1 +ttp: b29/782 bl:2.6203 bb:1.2122 rl:2.3184 rb:1.0598 dl:132-133 gd:1 +ttp: b21/782 bl:2.6003 bb:1.2267 rl:2.3188 rb:1.0600 dl:123-124 gd:1 +ttp: b13/782 bl:2.6787 bb:1.2136 rl:2.3192 rb:1.0602 dl:112-114 gd:1 +ttp: b6/782 bl:2.7123 bb:1.2093 rl:2.3196 rb:1.0604 dl:99-101 gd:1 +quantized_ttt_phased val_loss:2.31451121 val_bpb:1.05764263 eval_time:575915ms +total_eval_time:575.9s From ab06b03082afb7f96f1e5c6f2ec5e808a1da524e Mon Sep 17 00:00:00 2001 From: ndokutovich Date: Thu, 30 Apr 2026 17:48:04 +0300 Subject: [PATCH 2/2] =?UTF-8?q?submission:=20add=20Welch=20t-test=20calc?= =?UTF-8?q?=20vs=20PR=20#1855=20merged=20top=20=E2=80=94=20p=E2=89=880.006?= =?UTF-8?q?=20passes=200.25=20cutoff?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../README.md | 20 +++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/README.md b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/README.md index fa52e7cb3a..6e6112e756 100644 --- a/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/README.md +++ b/records/track_10min_16mb/2026-04-30_NgramTilt_V21_LeakyReLU_1.05851/README.md @@ -40,6 +40,26 @@ The static n-gram hint table is built in a single L→R causal pass over val tok | PR #1956 (AayushBaniya2006) | 1.06044 | +0.00193 | | PR #1908 (romeerp) | 1.06081 | +0.00230 | +## Statistical significance vs merged leaderboard top (PR #1902 policy) + +Per the chronological frontier policy adopted in [PR #1902](https://github.com/openai/parameter-golf/pull/1902) (one-sided Welch's two-sample t-test, **p < 0.25** progression cutoff), this submission is tested against the current merged top row, **PR #1855**, using its 6-sample evidence (3 submitted + 3 independent 
reproductions by @okezue, [#1855 comment](https://github.com/openai/parameter-golf/pull/1855#issuecomment-4336629746)):
+
+| Submission | n | mean val_bpb | std (n−1) |
+|------------|--:|-------------:|----------:|
+| **This submission (#1967)** | 3 | 1.05851479 | 0.000762 |
+| PR #1855 (merged top, 6-sample) | 6 | 1.06075500 | 0.000933 |
+
+```
+mean_diff    = 0.00224 BPB (~0.00488 nats)
+SE           = sqrt(0.000762²/3 + 0.000933²/6)
+             = 0.000582
+t-stat       = 3.850
+Welch df     = 5.00
+one-sided p  ≈ 0.0060
+```
+
+**p ≈ 0.0060** against the 0.25 cutoff: **passes by a ~42× margin**. The 3-seed sample is enough on its own to establish significance against the merged frontier; an independent reproduction at any seed would further tighten the bound.
+
 ## System dependencies
 
 - gcc + lrzip (`apt-get install -y build-essential lrzip` on Debian/Ubuntu).
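
The Welch figures quoted in the significance section of the README hunk above can be re-derived from the two summary rows alone. The following is a minimal sketch, not part of the submission: the helper names (`welch_from_stats`, `t_pdf`, `one_sided_p`) are hypothetical, and the p-value comes from a plain Simpson's-rule integration of the Student t density rather than from a statistics library, so it should be read as an approximation.

```python
import math


def welch_from_stats(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Return (t, df) for Welch's unequal-variance t-test from summary stats."""
    var_a = std_a ** 2 / n_a  # squared standard error of mean A
    var_b = std_b ** 2 / n_b  # squared standard error of mean B
    se = math.sqrt(var_a + var_b)
    t = (mean_b - mean_a) / se  # positive when A has the lower (better) mean
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


def t_pdf(x, df):
    """Density of the Student t distribution with (possibly fractional) df."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_sided_p(t, df, steps=2000):
    """Approximate P(T > t) using Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Summary rows copied from the README table (this submission vs PR #1855).
t_stat, dof = welch_from_stats(1.05851479, 0.000762, 3,
                               1.06075500, 0.000933, 6)
p_value = one_sided_p(t_stat, dof)
print(f"t = {t_stat:.3f}, df = {dof:.2f}, one-sided p = {p_value:.4f}")
```

The result should land close to the README's t-stat, Welch df, and one-sided p. Working from summary statistics keeps the check independent of the per-seed logs, at the cost of inheriting any rounding in the reported means and standard deviations.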