Isocroft-syntax.md#3

Open: ladaposamuel wants to merge 1 commit into master from isocroft

@ladaposamuel (Collaborator)

No description provided.

Comment on lines +58 to +62
- The `const` keyword is used for subjects whose value cannot change.
```cpp
const name = "John"
name = "Doe" // Invalid!
```

@appcypher, Nov 12, 2019

I don't support using `const` for constants because it is verbose and would be tiresome to keep typing, since it is going to be used a lot in code. `let` as a constant is not a new syntax on the block; it has been popularised by Swift and Rust. We could also use `val`, as popularised by Kotlin and Scala, but I think `let` is more expressive.

In modern languages, constants tend to be used quite a lot because of the notion of immutability by default, so these languages try to keep the keyword as short as possible, usually a three-letter word, with `val` and `let` being the notable ones.

If you do a quick grep on any repo with code written in modern languages, you will notice that constants are declared far more often than variables.
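The "quick grep" mentioned above can be sketched as follows. This is a rough illustration, assuming Rust-style syntax where `let` declares an immutable binding and `let mut` a mutable one; `count_bindings` is a hypothetical helper, and a plain regex will also match occurrences inside comments and string literals.

```python
import re

# Matches `let` followed by whitespace, optionally capturing a following `mut` keyword.
LET_BINDING = re.compile(r"\blet\s+(mut\b)?")

def count_bindings(source):
    """Return (immutable, mutable) counts of Rust-style `let` bindings in `source`."""
    kinds = LET_BINDING.findall(source)  # "" for plain `let`, "mut" for `let mut`
    mutable = sum(1 for kind in kinds if kind)
    return len(kinds) - mutable, mutable

sample = """
let total = 0;
let mut counter = 1;
let name = "John";
"""
print(count_bindings(sample))  # (2, 1)
```

Pointed at a real repository's source files, the ratio of the two counts is the kind of evidence the comment appeals to.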

I think the case you present for `let` or `val` is quite subjective, maybe because you prefer, or are more used to, `let` than `const`. Nonetheless, I believe `const` is more descriptive of its use. However, whatever we choose for Ratio will have to be debated on concrete and objective reasons rather than subjective preference.

@appcypher, Nov 15, 2019

"`let` is used by several modern languages" is a good enough objective reason to choose it. It is true that I prefer `let` over `const`, but if `const` is what the majority wants, then we go with it.

Comment on lines +31 to +37
* int > a sequence of symbols from the standard number set (e.g. 2489)
* long > two or more `int` type(s) put together as a sequence (e.g. 1267383994747489948947333344747)
* float >
* char > a single symbol from the standard character set ('$')
* str > two or more `char` type(s) put together ('$rw32')
* bool > true, false
* byte > |01011010, |10010011
@appcypher, Nov 12, 2019

I advocate consistency. I suggest that builtin types and user-defined types look the same. In short, all types ought to start with uppercase letters. There shouldn't be a distinction.

Part of this principle is reinforced by having operator overloading blur the difference between builtin types and user-defined types. Whatever you can do with a builtin type, you should be able to do with a user-defined one. Builtin types shouldn't have special properties or names.

For example, why should `bool` have a special connotation that a user-defined type named `Switch` does not?

```
type Switch {
    value: byte,
}
```

They are not really different. They occupy the same space in memory.
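As a quick check of the size claim, Python's `ctypes` module can report the sizes of the C-level types involved. This is a sketch only: `Switch` below is a hypothetical C-compatible mirror of the struct above, and the size of C's `_Bool` is implementation-defined, though it is 1 byte on common platforms.

```python
import ctypes

class Switch(ctypes.Structure):
    """C-compatible mirror of `type Switch { value: byte }` (hypothetical)."""
    _fields_ = [("value", ctypes.c_uint8)]

bool_size = ctypes.sizeof(ctypes.c_bool)    # C _Bool
byte_size = ctypes.sizeof(ctypes.c_uint8)   # one unsigned byte
switch_size = ctypes.sizeof(Switch)         # struct wrapping a single byte

print(bool_size, byte_size, switch_size)
```

On typical x86-64 and ARM platforms all three sizes come out as 1 byte.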

Another point is that the primitive types defined above are not enough for a statically-typed language. Statically-typed languages usually provide signed and unsigned variations of integers, as well as integers of different bit widths, for low-level flexibility.
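To make the signed/unsigned point concrete, the ranges for the usual two's-complement widths can be computed directly. The `i8`/`u8`-style names in the output are hypothetical (borrowed from Rust's naming), not part of the proposal.

```python
def int_range(bits, signed):
    """Inclusive (min, max) for a two's-complement integer of `bits` width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

for bits in (8, 16, 32, 64, 128):
    print(f"i{bits}: {int_range(bits, True)}  u{bits}: {int_range(bits, False)}")

# The `long` example from the draft does not fit in 64 bits, but does fit in 128.
LONG_EXAMPLE = 1267383994747489948947333344747
print(LONG_EXAMPLE > int_range(64, False)[1], LONG_EXAMPLE <= int_range(128, False)[1])
```

The last line shows why a single `long` type is not a full answer: the draft's own example already needs more than 64 bits.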

I also have a case against the str data type, but that will be discussed below.

A byte is 8 bits (10010110) and a bool (1 or 0, i.e. true or false) is a flag, i.e. 1 bit. I need more clarification on how you arrived at the conclusion that byte and bool occupy the same memory space.

Yes, I did intentionally leave that out, as the focus was syntax and not semantics or compiler specifics. I would be happy to add info about signed and unsigned integers.

About str, I actually held back a lot from including it. I knew that an array of char could do the job, as it does in C. However, I need much more input and discussion around this.

@appcypher, Nov 14, 2019

bool is in fact a byte. Practically all modern CPUs are byte-addressable.

The smallest unit you can load from or store to memory is 8 bits (one byte), so bools are stored as bytes.

https://stackoverflow.com/questions/4626815/why-is-a-boolean-1-byte-and-not-1-bit-of-size

Oh... well, yes, you are right (in terms of addressable memory, though not actual storage). But the remaining seven bits are wasted, especially when we could make use of bit fields, as in C.

Can't we?

I mean actual storage: bools are stored as bytes. You need bitwise operations to access a single bit within a byte, and those operations add overhead. It is faster to just access a whole byte, so bools are stored as bytes, and this is what most languages do.

Bit fields are an opt-in feature available to the user through bitwise operations.

This is how C++ does it, and the same goes for practically every other statically-typed language. Some languages even store bools in spaces larger than a byte. In C's stdbool.h, bool is represented by an int, which is larger than a byte. https://sites.uclouvain.be/SystInfo/usr/include/stdbool.h.html
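The trade-off described here can be sketched by packing eight flags into a single byte by hand. Each read or write needs a shift and a mask, which is the extra work a plain one-byte-per-bool layout avoids. `set_flag`, `get_flag` and `pack_flags` are hypothetical helpers for illustration.

```python
def set_flag(byte, index, value):
    """Return `byte` with bit `index` (0-7) set to 1 if `value` else 0."""
    mask = 1 << index
    return (byte | mask) if value else (byte & ~mask & 0xFF)

def get_flag(byte, index):
    """Read bit `index` (0-7) from `byte` using a shift and a mask."""
    return bool((byte >> index) & 1)

def pack_flags(flags):
    """Pack up to eight booleans into one byte, flag 0 in the lowest bit."""
    if len(flags) > 8:
        raise ValueError("a byte holds at most 8 flags")
    byte = 0
    for index, flag in enumerate(flags):
        byte = set_flag(byte, index, flag)
    return byte

flags = [True, False, True, True, False, False, False, False]
packed = pack_flags(flags)
print(packed, [get_flag(packed, i) for i in range(8)] == flags)  # 13 True
```

Stored one byte per bool, the same flags take `len(bytes(flags)) == 8` bytes instead of 1; the packed form saves space but pays for it on every access.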

- Subjects can be explicitly type-annotated.

```js
var int identification = 60
```

@appcypher, Nov 12, 2019

I don't support this annotation method because it is confusing.

```
var x = 23
```

x is an identifier here.

```
var x y = 23
```

It is easy to mistake x for the identifier here as well.

A punctuator would clearly show the separation between an identifier and a type.
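A punctuator also simplifies parsing: with a form such as `var x: y = 23`, the token after the keyword is always the name, whatever follows. The sketch below assumes a hypothetical `var NAME [: TYPE] = VALUE` grammar (the colon is not part of the current draft) and a deliberately naive tokenizer.

```python
import re

def parse_decl(src):
    """Parse `var NAME [: TYPE] = VALUE` (hypothetical grammar with a ':' punctuator)."""
    tokens = re.findall(r"\w+|\S", src)  # words, or single non-space symbols
    if len(tokens) < 4 or tokens[0] not in ("var", "let"):
        raise SyntaxError(f"not a declaration: {src!r}")
    name, rest = tokens[1], tokens[2:]   # the name is always the second token
    type_name = None
    if rest[0] == ":":                   # optional explicit type annotation
        type_name, rest = rest[1], rest[2:]
    if not rest or rest[0] != "=":
        raise SyntaxError(f"expected '=' in {src!r}")
    return {"keyword": tokens[0], "name": name, "type": type_name,
            "value": " ".join(rest[1:])}

print(parse_decl("var x: y = 23"))
print(parse_decl("var x = 23"))

try:
    parse_decl("var x y = 23")           # two bare identifiers: rejected
except SyntaxError as err:
    print("rejected:", err)
```

Because the annotation is introduced by `:`, the annotated and unannotated forms share the same prefix, and the ambiguous `var x y = 23` is simply not valid.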

I don't understand what you mean. There are languages that do this. The int token is a reserved keyword in such languages and can't be used as a user-defined identifier. I would need evidence from you of the confusion that might arise from this syntax production. I do, however, agree with you on the need for a punctuator. I will work on that.

@appcypher, Nov 14, 2019

The example I gave:

```
var x y = 23
```

Both x and y are valid identifiers, so it is easy to confuse which one is the type and which one is the subject name.

I do not support the syntax `var int identification = 60`. If we go with that syntax, the declaration keyword `var` will be useless and just extra typing.

Alright... we do agree on one thing, though: the variables defined/declared can have implicit or explicit static types. I think we should go with @appcypher's proposal on this.

>mutable arrays (lists) are defined using `var`
```js

var int<4> list = [1, 2, 3, 4]
```

@appcypher, Nov 12, 2019

The syntax `int<8>` looks special to the array type, so I am assuming it creates an array with 8 elements.

What if I have my own collection type, say a hashset? How do I create a hashset with 8 elements? Can I take advantage of this special syntax?

Is there a way to generalize this to other collection types?

To your first question: Yes

To your second question: No... but this could be further discussed

>the pipe operator `~>` can also be used to sort the resulting list
```elixir

var int<8> numbers = [1,3,5,7] ++ [0,2,4,6] ~> sort
```

@appcypher, Nov 12, 2019

If this is an array, which I assume in your proposal means it has a fixed length, then how do you determine statically (i.e., at compile time) that this array will have a length of 8 items if you perform a runtime concatenation on it?

If I assumed wrong and your arrays don't have a fixed length at compile time, then what would be the point of specifying it?

Will there be runtime checks to make sure an array's length stays the same? Frankly, I think that would be inefficient, considering we have runtime bounds checking to worry about as well.

You are correct... In a language like Java or C++, this program would be rejected at compile time, as the size of the array cannot be determined until runtime.

But I wrote that line in the hope that we could design a multipass, AST-driven compiler in which the AST can be modified at the entry point into code generation (the backend) to find expressions that can be "executed" or "reduced statically" to a literal value. In the example above, the AST nodes would be reduced to "[" "1" "," "3" "," "5" "," "7" "," "0" "," "2" "," "4" "," "6" "]" "~>" "sort".

However, this feature would work only for literal primitives and not non-literal ones, to avoid the excessive overhead of discovering such expressions and "reducing" them. There are other use cases for such a feature, but this is just a simple example.
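The "reduce statically" idea is essentially constant folding. Below is a minimal sketch using Python's own `ast` module as a stand-in for the Ratio AST, under two stated assumptions: `++` is written as `+`, and `~> sort` as a call to `sorted(...)`. `fold` and `static_length_ok` are hypothetical names. Only list literals of integer literals are folded; anything else (such as a variable) is left for runtime, matching the "literal primitives only" restriction.

```python
import ast

def fold(node):
    """Return the compile-time value of `node`, or None if it cannot be reduced."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.List):
        items = [fold(elt) for elt in node.elts]
        return None if any(item is None for item in items) else items
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left, right = fold(node.left), fold(node.right)
        if isinstance(left, list) and isinstance(right, list):
            return left + right          # stands in for `++` on two literal lists
        return None
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "sorted" and len(node.args) == 1 and not node.keywords):
        arg = fold(node.args[0])
        return sorted(arg) if isinstance(arg, list) else None  # stands in for `~> sort`
    return None                          # not a literal: leave for runtime

def static_length_ok(expr, declared_len):
    """True/False if the length is known at compile time, None if it is not."""
    value = fold(ast.parse(expr, mode="eval").body)
    if not isinstance(value, list):
        return None
    return len(value) == declared_len

print(static_length_ok("sorted([1, 3, 5, 7] + [0, 2, 4, 6])", 8))  # True
print(static_length_ok("sorted(xs + [0, 2])", 8))                   # None
```

The `None` case is exactly the situation raised in the review: once a non-literal operand appears, the declared length can no longer be checked at compile time.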


```js

const [...chars] = "Happy Birthday!"
```

What will the type and value of chars be after the destructuring?

This is another example of the feature I spoke of. The literal ("Happy Birthday!") on the RHS will be "reduced statically" into the LHS. On the AST, "Happy Birthday!" will be "pre-executed" and we will have the following:

* `chars` -> identifier
* `char<15>` -> type
* `[ "H", "a", "p", "p", "y", " ", "B", "i", "r", "t", "h", "d", "a", "y", "!" ]` -> value

@isocroft, Nov 14, 2019

This feature allows programmers to write less verbose code. I am not naive, however: I believe this feature might be very difficult to implement, yet it is worth discussing and considering.

@appcypher, Nov 14, 2019

Nice idea; however, it is going to be inconsistent with the semantics of other destructuring syntax.

`...` is like the rest/spread syntax in JavaScript.

```
let [first, ...remaining] = [1, 2, 3]

first == 1 // type Int
remaining == [2, 3] // type List
```

So if we want that string destructuring syntax at all, it should follow the same semantics.

```
let [first, ...remaining] = "abc"

first == 'a' // type Char
remaining == "bc" // type String
```

However, I have an issue with that syntax, because it is already reserved for list destructuring. It would be nice to find another syntax for string destructuring.
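Python illustrates exactly this inconsistency: its starred "rest" target always collects into a list, even when the source is a string, so a string-preserving split has to be written by hand. A sketch for comparison (the helper names are hypothetical), which also confirms the `char<15>` length of "Happy Birthday!":

```python
def destructure(seq):
    """Rest-style destructuring; Python always collects the rest into a list."""
    first, *remaining = seq
    return first, remaining

def destructure_str(text):
    """String-preserving variant: the remainder stays a str."""
    if not text:
        raise ValueError("cannot destructure an empty string")
    return text[0], text[1:]

print(destructure([1, 2, 3]))    # (1, [2, 3])
print(destructure("abc"))        # ('a', ['b', 'c']); the rest is a list, not "bc"
print(destructure_str("abc"))    # ('a', 'bc')
print(len("Happy Birthday!"))    # 15
```

The second call is the pitfall to avoid in Ratio: reusing list syntax for strings silently changes the type of the remainder.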

I agree, @appcypher.

OK, I will work on that and bring my alternative proposal for string destructuring back here by tomorrow (Saturday, 16 Nov 2019) evening so we can have further discussions.

```cargo

var num = 5
until num === 0 {
```

What is the triple-equals operator for? It wasn't explained.


```js

var char<5,4> names = ["Steve", "Kunle", "Chris", "Azeez"] // a list of strings with a length of 4 and each string a length of 5
```

@appcypher, Nov 12, 2019

So we have some sort of nested length specification.

A collection has 4 elements; in turn, each element is a collection of 5 elements.

How is this verified statically, though? Are there going to be runtime checks looping deep into collections to verify their lengths?

If that isn't the case, why specify them statically if they cannot be statically verified?
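A runtime check of such a nested length specification would have to visit every nested element, so its cost grows with the total number of elements, not just the outer length. A minimal sketch, assuming dimensions are given outermost-first (so the draft's `char<5,4>` corresponds to `(4, 5)` here); `check_shape` is a hypothetical helper:

```python
def check_shape(value, dims):
    """Recursively verify nested lengths, outermost dimension first."""
    if not dims:
        return True
    if len(value) != dims[0]:
        return False
    # Every element must be checked against the remaining dimensions.
    return all(check_shape(item, dims[1:]) for item in value)

names = ["Steve", "Kunle", "Chris", "Azeez"]
print(check_shape(names, (4, 5)))               # True
print(check_shape(names + ["Bob"], (5, 5)))     # False: "Bob" has 3 characters
```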

Comment on lines +165 to +168
* `&&` (and)
* `and` (and)
* `||` (or)
* `or` (or)
I advocate one obvious way. I'd say we go with one here; I don't see the point of having the pairs. I'd go with the keyword operators, because they are explicit.

OK... I think that is fair. `and` and `or` are okay.

* long > two or more `int` type(s) put together as a sequence (e.g. 1267383994747489948947333344747)
* float >
* char > a single symbol from the standard character set ('$')
* str > two or more `char` type(s) put together ('$rw32')
@appcypher, Nov 12, 2019

I disagree with this approach to implementing strings. Modern languages support UTF-8 strings. UTF-8 strings aren't just sequences of characters, because the width of a character (actually a codepoint) varies: UTF-8 is a multi-byte encoding that uses between 1 and 4 bytes per character.

Your char type cannot be 8-bit because it cannot represent the entire Unicode range.

If your char type is 16-bit, then your char can represent a UTF-16 code unit (which is what Java did), but that is really limited as well, because current Unicode versions have codepoints that can't fit in two bytes. So most languages tend to opt for 32-bit char types, because they cover all of Unicode.

The issue with making your string just a sequence of char values is that the encoding is either going to be very wasteful or very limited. UTF-8 solves the wastage issue at the cost of O(1) indexing. The internet (heck, the world) is in support of UTF-8.

In essence, a string should be UTF-8 encoded, not an array of characters.

I'd reserve this type of implementation for ASCII-only string types or Go-like rune types.
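The variable width and the lost O(1) indexing can both be seen directly. The sketch below computes UTF-8 widths from the code point ranges in RFC 3629 and checks them against Python's encoder, then locates the n-th code point in a UTF-8 buffer, which requires a linear scan. `utf8_width` and `codepoint_offset` are hypothetical helper names.

```python
def utf8_width(cp):
    """Bytes UTF-8 uses for code point `cp` (ranges from RFC 3629)."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    if cp <= 0x10FFFF:
        return 4
    raise ValueError(f"not a Unicode code point: {cp:#x}")

def codepoint_offset(data, n):
    """Byte offset of the n-th code point; a linear scan, so not O(1)."""
    count = -1
    for i, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:        # not a continuation byte: a code point starts here
            count += 1
            if count == n:
                return i
    raise IndexError(n)

text = "aé€😀"                           # 1-, 2-, 3- and 4-byte code points
for ch in text:
    assert utf8_width(ord(ch)) == len(ch.encode("utf-8"))
print([utf8_width(ord(ch)) for ch in text])        # [1, 2, 3, 4]
print(codepoint_offset(text.encode("utf-8"), 3))   # 6
print((0x10FFFF).bit_length())                      # 21: too wide for a 16-bit char
```

A 32-bit `char` (like Go's `rune`) avoids the scan but spends 4 bytes on every ASCII letter, which is the wasteful-versus-limited trade-off described above.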

Yes, you are absolutely correct! That is why I included the str type against all restraint on my part. I wanted to discuss all the edge cases (such as the code points you mentioned, especially those for Arabic or German characters) with the rest of the team and seek out the best implementation.

I believe you have made enough of a case against a fixed-byte encoding (as with UTF-16 code units); a multi-byte encoding such as UTF-8 is perhaps the way to go.


4 participants