Qwen 3 1.7B Offline tool calling Android by vkkhare · Pull Request #165 · NimbleEdge/deliteAI

vkkhare · 2025-07-23T10:01:26Z

Description

This PR adds support for the Qwen 3 1.7B model with tool calling, uses the native ONNX Runtime instead of onnxruntime_genai, and adds an export script for model enhancements.

Key Features Added

Adds mlc/tokenizer-cpp as a third-party dependency, which provides C++ bindings for the Rust implementations of Hugging Face tokenizers
Adds Qwen 3 1.7B support with tool calling in nimblenet_py, with modules loaded from a zip archive
Adds FP16 data type support, with proper uint16_t handling in binary operations
Exports the ONNX model with simplified token_id predictions instead of logits

Cpp bindings

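#include <tokenizers_cpp.h>

using tokenizers::Tokenizer;

// LoadBytesFromFile is a helper (not shown here) that reads a whole file into a std::string.
// The example below expects the tokenizer definition at dist/tokenizer.json.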
void HuggingFaceTokenizerExample() {
  // Read blob from file.
  auto blob = LoadBytesFromFile("dist/tokenizer.json");
  // Note: all the current factory APIs take an in-memory blob as input.
  // This gives some flexibility in how these blobs can be read.
  auto tok = Tokenizer::FromBlobJSON(blob);
  std::string prompt = "What is the capital of Canada?";
  // call Encode to turn prompt into token ids
  std::vector<int> ids = tok->Encode(prompt);
  // call Decode to turn ids into string
  std::string decoded_prompt = tok->Decode(ids);
}

void SentencePieceTokenizerExample() {
  // Read blob from file.
  auto blob = LoadBytesFromFile("dist/tokenizer.model");
  // Note: all the current factory APIs take an in-memory blob as input.
  // This gives some flexibility in how these blobs can be read.
  auto tok = Tokenizer::FromBlobSentencePiece(blob);
  std::string prompt = "What is the capital of Canada?";
  // call Encode to turn prompt into token ids
  std::vector<int> ids = tok->Encode(prompt);
  // call Decode to turn ids into string
  std::string decoded_prompt = tok->Decode(ids);
}

Delitepy Bindings

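from delitepy import nimblenet as nm
from delitepy import tokenizers

# <tokenizer.json> is a placeholder for the tokenizer definition
tokenizer = tokenizers.from_json(<tokenizer.json>)
token_ids = tokenizer.encode(text)  # text: the prompt string
input_ids = nm.tensor([token_ids], "int64")
response = tokenizer.decode(input_ids)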

FP16 Support

Binary operations now support the FP16 data type, which is stored as uint16_t:

# FP16 tensors now supported in all binary operations
fp16_tensor = nm.tensor(data, "float16")  # Uses uint16_t internally
result = fp16_tensor + other_tensor  # Works with add, sub, mult, div, pow, mod
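
Storing FP16 values in uint16_t means the 16 bits are an IEEE 754 half-precision pattern, not an integer quantity, so arithmetic has to decode them first. The following plain-Python sketch illustrates the idea with the standard struct module. It is not the delitepy implementation, and the helper names (float_to_fp16_bits, fp16_bits_to_float, add_fp16) are hypothetical.

```python
import struct


def float_to_fp16_bits(value: float) -> int:
    """Return the IEEE 754 binary16 bit pattern of value as an unsigned 16-bit int."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def fp16_bits_to_float(bits: int) -> float:
    """Decode an unsigned 16-bit int holding a binary16 pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def add_fp16(a_bits: int, b_bits: int) -> int:
    """Add two FP16 values given as raw bits: widen, add, then narrow again."""
    total = fp16_bits_to_float(a_bits) + fp16_bits_to_float(b_bits)
    return float_to_fp16_bits(total)


one = float_to_fp16_bits(1.0)
half = float_to_fp16_bits(0.5)
print(hex(one), hex(half))                       # 0x3c00 0x3800
print(fp16_bits_to_float(add_fp16(one, half)))   # 1.5
# Adding the raw integers instead would give 0x7400, which decodes to 16384.0.
print(fp16_bits_to_float(one + half))
```

This is why the binary operators need an explicit FP16 path rather than treating the storage type as an ordinary integer.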

Kotlin Interface

Generated text is streamed back from Python to Kotlin through a callback, which Kotlin code can subscribe to and expose as a flow.

    // Wraps a Kotlin callback as a FUNCTION tensor that the Python script can invoke.
    private fun createNimbleNetTensorFromForeignFunction(fn: (String?) -> Unit): NimbleNetTensor {
        val callbackDelitePy: DelitePyForeignFunction = fun(input: NimbleNetTensorMap?): NimbleNetTensorMap? {
            // Python passes each generated chunk under the "token_stream" key.
            val outputStream = input?.get("token_stream")?.data as String?
            fn(outputStream)
            // Acknowledge the chunk with a scalar boolean tensor.
            return hashMapOf("result" to NimbleNetTensor(data = true, datatype = DATATYPE.BOOL, shape = intArrayOf()))
        }
        return NimbleNetTensor(data = callbackDelitePy, datatype = DATATYPE.FUNCTION, shape = intArrayOf())
    }

    // Runs the script's prompt_for_tool_calling method; streamed chunks arrive through callback.
    suspend fun feedInput(input: String, isVoiceInitiated: Boolean, callback: (String?) -> Unit): String? {
        val res = NimbleNet.runMethod(
            "prompt_for_tool_calling",
            inputs = hashMapOf(
                "prompt" to NimbleNetTensor(input, DATATYPE.STRING, null),
                "output_stream_callback" to createNimbleNetTensorFromForeignFunction(callback)
            ),
        )
        assert(res.status) { "NimbleNet.runMethod('prompt_for_tool_calling') failed with status: ${res.status}" }
        return res.payload?.get("results")?.data as String?
    }
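
The Python side of this handshake is not shown in the description. As a rough plain-Python sketch of the shape implied by the Kotlin code: the script receives the callback, calls it with a map containing "token_stream" for each chunk, and finally returns a map containing "results". The names stream_tokens and host_callback are hypothetical, and the real script runs inside DelitePy with tensors rather than plain dicts.

```python
def stream_tokens(chunks, output_stream_callback):
    """Push each generated chunk to the host callback, then return the full text."""
    pieces = []
    for chunk in chunks:
        # Mirrors the Kotlin side, which reads input["token_stream"].
        output_stream_callback({"token_stream": chunk})
        pieces.append(chunk)
    # Mirrors the Kotlin side, which reads payload["results"].
    return {"results": "".join(pieces)}


received = []


def host_callback(payload):
    """Stand-in for the Kotlin callback: record the chunk and acknowledge it."""
    received.append(payload["token_stream"])
    return {"result": True}


output = stream_tokens(["Ott", "awa"], host_callback)
print(received)  # ['Ott', 'awa']
print(output)    # {'results': 'Ottawa'}
```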

Qwen Demo Setup

The Qwen demo loads its delitePy modules from a zip archive:

cd nimblenet_py/simulation_assets/qwen_demo
zip -j qwen_modules.zip qwen_modules/*.py
python run_demo.py
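
The -j flag tells zip to "junk" directory paths, so each module is stored at the top level of the archive under its bare file name. Where the zip CLI is unavailable, roughly the same archive can be built with Python's zipfile module. This is a sketch only: zip_flat is a hypothetical helper, and how delitePy reads the archive is not covered here.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_flat(src_dir, zip_path):
    """Store every *.py file in src_dir at the archive root, like `zip -j`."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for py_file in sorted(src.glob("*.py")):
            # arcname drops the directory prefix, which is what -j does.
            zf.write(py_file, arcname=py_file.name)
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


# Demo on a throwaway directory with made-up module names.
with tempfile.TemporaryDirectory() as tmp:
    modules = Path(tmp) / "qwen_modules"
    modules.mkdir()
    (modules / "tools.py").write_text("TOOLS = []\n")
    (modules / "chat.py").write_text("HISTORY = []\n")
    print(zip_flat(modules, Path(tmp) / "qwen_modules.zip"))  # ['chat.py', 'tools.py']
```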

Tool Calling Features

Multi-step conversation support with automatic tool execution
JSON-based tool calling with <tool_call> XML tags
Built-in tools: weather, math calculator, time, location
Error handling and recovery for failed tool calls
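
As an illustration of the tag-plus-JSON format, the sketch below extracts tool calls from generated text and dispatches them with basic error handling. It assumes each <tool_call> block wraps a JSON object with "name" and "arguments" keys. The TOOLS registry, extract_tool_calls, and run_tool are hypothetical names, not the code in this PR.

```python
import json
import re

# Non-greedy match so several tool calls in one response are found separately.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

# Hypothetical registry standing in for the built-in tools.
TOOLS = {"add": lambda a, b: a + b}


def extract_tool_calls(text):
    """Return the well-formed tool calls found in text, skipping malformed JSON."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


def run_tool(call):
    """Execute one call; report failures as data so the conversation can continue."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return {"error": f"unknown tool: {call['name']}"}
    try:
        return {"result": fn(**call.get("arguments", {}))}
    except Exception as exc:
        return {"error": str(exc)}


model_output = (
    "Let me compute that.\n"
    '<tool_call>\n{"name": "add", "arguments": {"a": 2, "b": 3}}\n</tool_call>'
)
for call in extract_tool_calls(model_output):
    print(run_tool(call))  # {'result': 5}
```

Returning errors as values rather than raising lets a multi-step loop feed the failure back to the model and try again.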

Checklist:

I have added tests that prove my fix is effective or that my feature works
Has user-facing changes. This may include API or behavior changes and performance improvements, etc

jpuneet · 2025-07-25T09:58:36Z

coreruntime/delitepy/library_stubs/setup.py

-            f"{library_stubs_dir}/src_gen",
-            coreruntime_dir,
-        ],
+        ["cp", "-r", f"{library_stubs_dir}/src_template", f"{library_stubs_dir}/src_gen"],


Is this an accidental change?

cp -R is the portable form, compared to cp -r.
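
Since setup.py is Python, one way to avoid the flag question altogether is to copy with the standard library instead of shelling out to cp. This is a different technique from the one in the diff, shown only as a sketch. copy_template is a hypothetical helper, and unlike cp it removes any existing destination first.

```python
import shutil
import tempfile
from pathlib import Path


def copy_template(src_template, src_gen):
    """Recursively copy src_template to src_gen, replacing any previous copy."""
    dst = Path(src_gen)
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src_template, dst)
    return dst


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "src_template" / "pkg"
    template.mkdir(parents=True)
    (template / "stub.pyi").write_text("x: int\n")
    out = copy_template(Path(tmp) / "src_template", Path(tmp) / "src_gen")
    print((out / "pkg" / "stub.pyi").read_text())
```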

jpuneet · 2025-07-25T10:21:16Z

coreruntime/nimblenet/task_manager/operators/include/operator_types.hpp

 * Compares two data types and returns the one with higher precedence
 * for automatic type promotion in operations. The precedence order is:
- * BOOLEAN (0) < INT32 (3) < INT64 (4) < FLOAT (5) < DOUBLE (6)
+ * BOOLEAN (0) < INT32 (3) < INT64 (4) < FLOAT16 (4.5) < FLOAT (5) < DOUBLE (6)
 *
 * @param dataType1 First data type to compare
 * @param dataType2 Second data type to compare
 * @return The data type with higher precedence
 */
 inline int get_max_dataType(int dataType1, int dataType2) {
  std::map<int, int> _typeScore = {
-      {DATATYPE::BOOLEAN, 0}, {DATATYPE::INT32, 3},  {DATATYPE::INT64, 4},
-      {DATATYPE::FLOAT, 5},   {DATATYPE::DOUBLE, 6},
+      {DATATYPE::BOOLEAN, 0}, {DATATYPE::INT32, 3},    {DATATYPE::INT64, 4},
+      {DATATYPE::FLOAT16, 45}, {DATATYPE::FLOAT, 5},   {DATATYPE::DOUBLE, 6},
  };


This doesn't look correct.

yup will update this
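
The issue can be reproduced outside C++. The doc comment ranks FLOAT16 at 4.5, but the score map holds integers and the value written is 45, which outranks DOUBLE (6). The Python sketch below mirrors the lookup with string stand-ins for the DATATYPE enum. Doubling every score is only one possible repair, not necessarily the one adopted in the PR.

```python
# Scores as written in the diff: FLOAT16 ends up above DOUBLE.
BROKEN_SCORE = {
    "BOOLEAN": 0, "INT32": 3, "INT64": 4,
    "FLOAT16": 45, "FLOAT": 5, "DOUBLE": 6,
}

# One possible fix: double every score so that 4.5 becomes the integer 9.
FIXED_SCORE = {
    "BOOLEAN": 0, "INT32": 6, "INT64": 8,
    "FLOAT16": 9, "FLOAT": 10, "DOUBLE": 12,
}


def get_max_data_type(type_a, type_b, score):
    """Return whichever type has the higher precedence score."""
    return type_a if score[type_a] >= score[type_b] else type_b


print(get_max_data_type("FLOAT16", "DOUBLE", BROKEN_SCORE))  # FLOAT16 (wrong)
print(get_max_data_type("FLOAT16", "DOUBLE", FIXED_SCORE))   # DOUBLE
print(get_max_data_type("FLOAT16", "INT64", FIXED_SCORE))    # FLOAT16
```

With the broken scores, adding an FP16 tensor to a double tensor would be promoted to FP16 and lose precision.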

# This is the 1st commit message:
add support for dictionary indexing in onnx executor
Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

# This is the commit message #2:
add dictionary input support to model.run() for kv_cache
Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

# This is the commit message #3:
add fp16 support in delitepy
Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

# This is the commit message #4:
Qwen with tool calling functional in delitePy
Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

# This is the commit message #5:
Implemented enumerate and next in DelitePy (NimbleEdge#162)
* Implemented enumerate and next in DelitePy
Signed-off-by: Atul Jain <atul.jain@nimbleedgehq.ai>
* Cosmetics
Signed-off-by: Puneet Jindal <puneet.jindal@nimbleedgehq.ai>
---------
Signed-off-by: Puneet Jindal <puneet.jindal@nimbleedgehq.ai>
Co-authored-by: Atul Jain <atul.jain@nimbleedgehq.ai>
Co-authored-by: Puneet Jindal <puneet.jindal@nimbleedgehq.ai>

Signed-off-by: Puneet Jindal <puneet.jindal@nimbleedgehq.ai>

Signed-off-by: Varun Khare <varun.khare@nimbledgehq.ai>
modular qwen demo structure
Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>
wip handle attention cache
Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>
resume from last postion for multi-step run
Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

Signed-off-by: Varun Khare <varun.khare@nimbledgehq.ai>

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

initiate qwen 1.7 Agent scripts

c2e00db

Signed-off-by: Varun Khare <varun.khare@nimbledgehq.ai>

vkkhare self-assigned this Jul 23, 2025

vkkhare force-pushed the tokenizers branch from 4dc35e9 to 603406e Compare July 23, 2025 10:09

add onnx tests and lfm models

c7c425c

add tokenizer-cpp add jinja template for qwen and dict support for tokenizer:from_json Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

vkkhare force-pushed the tokenizers branch from 603406e to c7c425c Compare July 23, 2025 10:15

Merge branch 'main' into tokenizers

d61c475

Signed-off-by: Varun Khare <varunkhare1234@gmail.com>

jpuneet reviewed Jul 25, 2025

vkkhare force-pushed the tokenizers branch from f994877 to cd611c4 Compare July 29, 2025 18:28

vkkhare and others added 5 commits July 30, 2025 00:06

Redo deliteai.dev website (NimbleEdge#163)

c274f0c

Signed-off-by: Puneet Jindal <puneet.jindal@nimbleedgehq.ai>

Upgrade Python version in GitHub workflows (NimbleEdge#166)

b42f0c6

Signed-off-by: Puneet Jindal <puneet.jindal@nimbleedgehq.ai>

udpate tokenizers submodule

d342cad

Signed-off-by: Varun Khare <varun.khare@nimbledgehq.ai>

vkkhare force-pushed the tokenizers branch from cd611c4 to d342cad Compare July 29, 2025 18:37

vkkhare marked this pull request as ready for review July 29, 2025 20:29

vkkhare requested review from a team and nrjpoddar as code owners July 29, 2025 20:29

vkkhare changed the title ~~Adding HF_Tokenizers support to delitepy~~ Qwen 3 1.7B Offline tool Calling Android Jul 29, 2025

vkkhare changed the title ~~Qwen 3 1.7B Offline tool Calling Android~~ Qwen 3 1.7B Offline tool calling Android Jul 29, 2025

vkkhare added 3 commits July 30, 2025 02:46

Merge branch 'main' into tokenizers

53c8aa5

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

Merge branch 'main' into tokenizers

7882b83

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

Merge branch 'main' into tokenizers

c04611d

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

vkkhare wants to merge 11 commits into NimbleEdge:main from vkkhare:tokenizers

2 participants
