statement-expressions and block-terminators

Graydon Hoare
Hi,

Some of you may have noticed that in the rewrite from rustboot to rustc
we're becoming substantially more expression-language-ish. This is
mostly a result of me yielding to the preferences of other developers
(and LLVM's semantics), as well as some hint that things get much easier
in syntax extensions and calculating compile-time-constants if we permit
more "statement-ish" forms as expressions. Particularly conditionals.

We've run into a (common, seen in many other languages) sort of problem
along the way here, which is that some expressions are implicitly
ignored (or must be, due to being in an ignored context) whereas others
are not. We have a nil-type (), but we don't always have sensible rules
for forcing things to have the nil type by context.

This email is a poll of alternative solutions. I'll give two example
cases and ask people for their input on which modification of the rules
feels best.

Example case that does compile:

   A:  auto x = if (foo()) { 10; } else { 11; };

Example case that does not compile:

   B:  if (foo()) { 10; } else { "hello"; }

We can write case B in Rust at the moment, but under the rustc typechecking
rules it will fail to compile, because 'if' is an expression-statement,
expressions have types, and the two branches (each judged by its last
statement's expression value, if that statement is an expression, or else
nil) have different types.

Here are some approaches to solving this example. Please pick the one
you like the most:

(1) Kick all branchy expressions out of the expression grammar, put them
back in the statement grammar. Case B will compile, and case A must be
rewritten like so:

   A:  auto x = { auto t = 11; if (foo()) { t = 10; }; t; };

This is the C-with-GNU-extensions model.
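A minimal sketch of what option (1) asks the programmer to write, in present-day Rust syntax rather than the 2010 syntax above. Rust today treats `if` as an expression, so this only imitates the statement-only style; the function names and the `cond` parameter (standing in for `foo()`) are illustrative assumptions.

```rust
// Case A in statement style: a temporary holds the default and the
// `if` is used purely for its side effect.
fn case_a_statement_style(cond: bool) -> i32 {
    let mut t = 11; // plays the role of `auto t = 11`
    if cond {
        t = 10;
    }
    t
}

// Case B in statement style: the arms end in `;`, so neither arm's
// value is used and their differing types never meet.
fn case_b_statement_style(cond: bool) {
    if cond {
        10;
    } else {
        "hello";
    }
}
```

The cost is the extra mutable temporary in case A; case B needs no change.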

(2) Hoist all statements up into the expression language and make
semicolon into a sequencing operator, with a trailing-semi ignored by
the parser. Then we need to rewrite only the second case to force unit
types in the to-be-ignored differing branches.

   B:  if (foo()) { 10; () } else { "hello"; () }

Though we'd also be *allowed* to rewrite the first case to drop the
semicolons:

   A:  auto x = if (foo()) { 10 } else { 11 };

This is the OCaml approach.
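One way to picture option (2) is to treat `;` as an ordinary binary operator that discards its left operand. The sketch below models that operator as a hypothetical helper `seq` in present-day Rust; `seq`, `case_a`, `case_b` and the `cond` parameter are assumptions made for illustration, not anything from rustboot or rustc.

```rust
/// Hypothetical stand-in for a sequencing `;`: the first value is
/// discarded and the whole expression takes the type of the second.
fn seq<A, B>(_first: A, second: B) -> B {
    second
}

// Case B rewritten as `10; ()` and `"hello"; ()`: both arms now have
// type (), so the `if` typechecks.
fn case_b(cond: bool) {
    if cond { seq(10, ()) } else { seq("hello", ()) }
}

// Case A without semicolons: each arm is just its value.
fn case_a(cond: bool) -> i32 {
    if cond { 10 } else { 11 }
}
```

Under this reading the explicit `()` is what forces the ignored branches to agree.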

(3) A slightly weaker form of (2), which is to reformulate blocks with
the following grammar:

     block ::=  { [ stmt ; ]* expr? }

In other words, every block becomes a brace-enclosed sequence of
semicolon-terminated statements, followed by an optional expr. If the
expr is missing, it is implied as (). In this case we'd be rewriting
only the first case:

   A:  auto x = if (foo()) { 10 } else { 11 };

This is similar to the OCaml rule in practice, except that it makes the
presence or absence of the final semicolon in a block significant: a
trailing semicolon is equivalent to ending the block with the nil type.
This is a possible hazard (especially during refactoring or editing) to
users who want to write a value-producing block but accidentally
semicolon-terminate the last expression; but it's not a huge hazard since
the typechecker will tell them the value they produced is of nil type. It
just might be hit a lot.
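The block rule in option (3) can be tried out in present-day Rust, whose blocks follow the same shape (statements, then an optional final expression, with `()` implied when it is missing). This is a sketch in modern syntax, not the 2010 syntax used above; the function names and `cond` are illustrative assumptions.

```rust
// No trailing semicolon: each block's value is its final expression.
fn with_tail_expressions(cond: bool) -> i32 {
    let x = if cond { 10 } else { 11 };
    x
}

// Trailing semicolons: each block's value is (), so arms that would
// otherwise differ in type (integer vs string) now agree.
fn with_trailing_semicolons(cond: bool) {
    let unit: () = if cond { 10; } else { "hello"; };
    unit
}
```

Adding a stray `;` after `10` in the first function would turn that arm into `()` and produce a type error against the other arm, which is the hazard described above.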

(4) Statically determine the contexts in which an expression's value
"will be used" in an outer expression, and only typecheck those
contexts. This permits both of the examples to compile as-is, but it's
the most unorthodox approach, and poses a refactoring hazard as code may
become type-invalid when nested into an expression context that "uses"
its previously-ignored result. Again, as in (3), the typechecker will
catch these cases, but they might happen more or less often than those
in (3).
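To make option (4) concrete, here is a toy checker, written in present-day Rust, that compares the arms of an `if` only when the surrounding context uses the result. It is a sketch under stated assumptions: the `Ty` and `Expr` types, the `used` flag and the helper names are all invented for illustration and do not reflect how rustc is structured.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int,
    Str,
    Nil,
}

// A toy expression language: two literal kinds and a two-armed `if`.
// Literal values and the condition are omitted because they do not
// affect the arm comparison.
enum Expr {
    IntLit,
    StrLit,
    If(Box<Expr>, Box<Expr>),
}

/// `used` records whether the context consumes the expression's value.
fn check(e: &Expr, used: bool) -> Result<Ty, String> {
    match e {
        Expr::IntLit => Ok(Ty::Int),
        Expr::StrLit => Ok(Ty::Str),
        Expr::If(then_arm, else_arm) => {
            // Both arms are still checked; only the comparison
            // between them depends on `used`.
            let then_ty = check(then_arm, used)?;
            let else_ty = check(else_arm, used)?;
            if !used {
                return Ok(Ty::Nil); // ignored: arm types need not agree
            }
            if then_ty == else_ty {
                Ok(then_ty)
            } else {
                Err(format!("mismatched arms: {:?} vs {:?}", then_ty, else_ty))
            }
        }
    }
}

// Case A: both arms are integers. Case B: integer vs string.
fn case_a() -> Expr {
    Expr::If(Box::new(Expr::IntLit), Box::new(Expr::IntLit))
}

fn case_b() -> Expr {
    Expr::If(Box::new(Expr::IntLit), Box::new(Expr::StrLit))
}
```

Checking `case_b()` with `used` set to false succeeds, while the same expression with `used` set to true is rejected, which is the refactoring hazard: moving the code into a value-using context changes whether it typechecks.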

We can't think of any other options. Significant whitespace is not an
option :)

Personally my knee-jerk reaction is to embrace (1) since I like
statements anyway, but I can see plausible arguments for the other 3.
Can I get a show of hands? We have to pick something.

-Graydon

Patrick Walton
On 11/23/10 2:34 PM, Graydon Hoare wrote:
> Personally my knee-jerk reaction is to embrace (1) since I like
> statements anyway, but I can see plausible arguments for the other 3.
> Can I get a show of hands? We have to pick something.

You know my vote :) (#3, for everyone else)

Patrick

David Herman
In reply to this post by Graydon Hoare
Of these, I like option #3 the most.

I should say, I think anywhere that we have statements in the grammar, we could actually allow them to be expressions of type (), and ISTM that would be equally workable for option #2 or option #3. I'd be open to that alternative, since in *surface* syntax you still have the look and feel of C, but you get higher refactoring flexibility.
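As an illustration of statement-like forms typed as (), present-day Rust gives assignments, `while` loops and else-less `if`s the type (). The sketch below is in modern syntax, and `expect_unit` is a hypothetical helper used only to make the types visible.

```rust
/// Accepts only the unit value, so each call below compiles only if
/// its argument has type ().
fn expect_unit(_: ()) {}

fn statements_as_unit() -> i32 {
    let mut x = 0;
    expect_unit(x = 5); // an assignment is an expression of type ()
    expect_unit(while x < 8 { x += 1 }); // so is a `while` loop
    expect_unit(if x > 7 { x += 1 }); // and an `if` with no `else`
    x
}
```

Because each form is an expression, it can be moved into any position that accepts a () value without changing its surface syntax.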

Dave


David Herman
Two afterthoughts:

- IINM, the different syntax for blocks between option #2 and option #3 is not that drastic, so if we choose one and decide we prefer the other, it might not be too hard to change.

- In option #4, we can't completely *turn off* typechecking -- that's unsound. (For example, inside the unchecked part you could assign the wrong type to a variable or data structure.) But we could avoid certain checks (like comparing the result type of the two arms of an if). Not that I'm advocating option #4. :)

Dave

On Nov 23, 2010, at 2:53 PM, David Herman wrote:

> Of these, I like option #3 the most.
>
> I should say, I think anywhere that we have statements in the grammar, we could actually allow them to be expressions of type (), and ISTM that would be equally workable for option #2 or option #3. I'd be open to that alternative, since in *surface* syntax you still have the look and feel of C, but you get higher refactoring flexibility.
>
> Dave


Roy Frostig
I prefer option #3, just according to taste.

froy

On Tue, Nov 23, 2010 at 2:58 PM, David Herman <dherman at mozilla.com> wrote:

> Two afterthoughts:
>
> - IINM, the different syntax for blocks between option #2 and option #3 is
> not that drastic, so if we choose one and decide we prefer the other, it
> might not be too hard to change.
>
> - In option #4, we can't completely *turn off* typechecking -- that's
> unsound. (For example, inside the unchecked part you could assign the wrong
> type to a variable or data structure.) But we could avoid certain checks
> (like comparing the result type of the two arms of an if). Not that I'm
> advocating option #4. :)
>
> Dave

Rob Arnold
In reply to this post by Graydon Hoare
Option 3 is my favorite (2 would be too cumbersome I think).


Peter Hull
I would go for #1. But this is a bit horrible:

   auto x = { auto t = 11; if (foo()) { t = 10; }; t; };

Could it be written as

   auto x;
   if (foo()) { x = 10; } else { x = 11; }

or would the 'auto' type determination run into problems?

I imagine that 'if' and 'alt' are the most useful statements to have
as expressions, so would it be possible to add the C ternary ?:
operator, and something similar for alt?
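For comparison, both shapes can be written in present-day Rust, which says nothing about what the 2010 compiler accepted. The sketch below uses modern syntax; the function names and the `cond` parameter (standing in for `foo()`) are illustrative assumptions.

```rust
// Declaration first, assignment in each branch. The type of `x` is
// inferred from the assignments, and every path assigns exactly once.
fn deferred_init(cond: bool) -> i32 {
    let x;
    if cond {
        x = 10;
    } else {
        x = 11;
    }
    x
}

// `if` used as an expression, filling the role of C's `cond ? 10 : 11`.
fn if_as_ternary(cond: bool) -> i32 {
    if cond { 10 } else { 11 }
}
```

The two functions return the same values; the first keeps `if` in statement position, the second needs it to be an expression.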

Pete

On Wed, Nov 24, 2010 at 2:22 AM, Rob Arnold <tellrob at gmail.com> wrote:

> Option 3 is my favorite (2 would be too cumbersome I think).

Jeffrey Yasskin
In reply to this post by Graydon Hoare
I would like #4 best, but to do it right you'd have to infer the
expected type of the branched completion from its context, and I think
you don't yet do any top-down typechecking (except a bit in
pattern-alt which may not help with this case). After that, #3, even
though I'll definitely get confused when I terminate my blocks with a
semicolon and they stop working as values.
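A rough sketch of the top-down idea, in present-day Rust: the context pushes an expected type into the `if`, and each arm is checked against it, with no expectation at all when the value is ignored. The `Ty` and `Expr` types, the `check` and `require` functions and the use of `Option` for "no expectation" are assumptions made for illustration, not a description of rustc.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int,
    Str,
}

// Toy expression language; literal values and the `if` condition are
// omitted because they do not affect the check.
enum Expr {
    IntLit,
    StrLit,
    If(Box<Expr>, Box<Expr>),
}

/// `expected` flows down from the context; `None` means the value is
/// ignored, so any type is acceptable.
fn check(e: &Expr, expected: Option<Ty>) -> Result<(), String> {
    match e {
        Expr::IntLit => require(Ty::Int, expected),
        Expr::StrLit => require(Ty::Str, expected),
        Expr::If(then_arm, else_arm) => {
            // The same expectation is pushed into both arms.
            check(then_arm, expected)?;
            check(else_arm, expected)
        }
    }
}

fn require(actual: Ty, expected: Option<Ty>) -> Result<(), String> {
    match expected {
        Some(want) if want != actual => {
            Err(format!("expected {:?}, found {:?}", want, actual))
        }
        _ => Ok(()),
    }
}
```

A binding declared with `auto` gives the context no type to push down, so a real implementation would need an inference variable in place of the fixed `Option<Ty>` used here.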

On Tue, Nov 23, 2010 at 2:34 PM, Graydon Hoare <graydon at mozilla.com> wrote:

> Hi,
>
> Some of you may have noticed that in the rewrite from rustboot to rustc
> we're becoming substantially more expression-language-ish. This is mostly a
> result of me yielding to the preferences of other developers (and LLVM's
> semantics), as well as some hint that things get much easier in syntax
> extensions and calculating compile-time-constants if we permit more
> "statement-ish" forms as expressions. Particularly conditionals.
>
> We've run into a (common, seen in many other languages) sort of problem
> along the way here, which is that some expressions are implicitly ignored
> (or must be, due to being in an ignored context) whereas others are not. We
> have a nil-type (), but we don't always have sensible rules for forcing
> things to have the nil type by context.
>
> This email is a poll of alternative solutions. I'll give two example cases
> and ask people for their input on which modification of the rules feels
> best.
>
> Example case that does compile:
>
>  A:  auto x = if (foo()) { 10; } else { 11; };
>
> Example case that does not compile:
>
>   B:  if (foo()) { 10; } else { "hello"; }
>
> We can write this in rust at the moment, but in the rustc typechecking rules
> it will fail to compile, because 'if' is an expression-statement,
> expressions have types, and the types of the two branches (judged as the
> last statement's expression value, if it's an expression, or else nil) are
> of different types.
>
> Here are some approaches to solving this example. Please pick the one you
> like the most:
>
> (1) Kick all branchy expressions out of the expression grammar, put them
> back in the statement grammar. Case B will compile, and case A must be
> rewritten like so:
>
>   A:  auto x = { auto t = 11; if (foo()) { t = 10; }; t; };
>
> This is the C-with-GNU-extensions model.
>
> (2) Hoist all statements up into the expression language and make semicolon
> into a sequencing operator, with a trailing-semi ignored by the parser. Then
> we need to rewrite only the second case to force unit types in the
> to-be-ignored differing branches.
>
>   B:  if (foo()) { 10; () } else { "hello"; () }
>
> Though we'd also be *allowed* to rewrite the first case to drop the
> semicolons:
>
>   A:  auto x = if (foo()) { 10 } else { 11 };
>
> This is the Ocaml approach.
>
> (3) A slightly weaker form of (2), which is to reformulate blocks with the
> following grammar:
>
>     block ::=  { [ stmt ; ]* expr? }
>
> In other words, every block becomes a brace-enclosed sequence of
> semicolon-terminated statements, followed by an optional expr. If the expr
> is missing, it is implied as (). In this case we'd be rewriting only the
> first case:
>
>   A:  auto x = if (foo()) { 10 } else { 11 };
>
> This is similar to the Ocaml rule in practice, except that it makes the
> presence or absence of the final semicolon in a block equivalent to ending
> the block with the nil type. This is a possible hazard (especially during
> refactoring or editing) to users who want to write a value-producing block
> but accidentally semicolon-terminate the last expression; but it's not a
> huge hazard since the typechecker will tell them the value they produced is
> of nil type. It just might be hit a lot.
>
> (4) Statically determine the contexts in which an expression's value "will
> be used" in an outer expression, and only typecheck those contexts. This
> permits both of the examples to compile as-is, but it's the most unorthodox
> approach, and poses a refactoring hazard as code may become type-invalid
> when nested into an expression context that "uses" its previously-ignored
> result. Again, as in (3) the typechecker will catch these cases, but they
> might happen more or less often than those in (3).
>
> We can't think of any other options. Significant whitespace is not an option
> :)
>
> Personally my knee-jerk reaction is to embrace (1) since I like statements
> anyway, but I can see plausible arguments for the other 3. Can I get a show
> of hands? We have to pick something.
>
> -Graydon

statement-expressions and block-terminators

Graydon Hoare
In reply to this post by Peter Hull
On 24/11/2010 12:35 AM, Peter Hull wrote:
> I would go for #1. But, this is a bit horrible
> auto x = { auto t = 11; if (foo()) { t = 10; }; t; };
> Could it be written as
> auto x;
> if (foo()) { x = 10; } else {x = 11; }
> or would the 'auto' type determination run into problems?

No, that would work fine. And it is definitely the road I went down
during the first ... several years of this project! I have argued
strenuously in favour of sticking to a statement-heavy approach in the
past. Partly this email thread is to serve as a record to myself and
others reading why there's even a change-of-plan happening here. To make
sure "hallway conversations" (and their IRC equivalents) don't disappear
from the records.

Where this strategy runs into acute difficulty is in contexts I
mentioned near the beginning of the email: initializing a (compile-time)
constant via a conditional, or returning a conditional from a syntax
extension used in an expression context. In those cases you have to have
at least "block as expr" to nest statement-sequences into blocks. And
"conditional as expr" follows easily due to not wanting to have to
simulate state-evolution in your constant-folding device.

Particularly when it comes to constants -- and those are really
important, you actually wind up having a lot of compile-time-constant
data in a static language, think "most literals" -- it feels more
natural to only talk about constant expressions rather than constant
statements-with-side-effects.
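
For readers following along today, here is a minimal sketch of what "initializing a constant via a conditional" looks like once conditionals and blocks are expressions. It uses present-day Rust syntax, which is an assumption on my part since this thread predates it, and the names FAST_PATH, BUFFER_SIZE and TABLE_LEN are hypothetical.

```rust
// Present-day Rust syntax (assumed; the 2010 syntax in this thread differs).
// All names below are hypothetical and exist only for illustration.

/// A compile-time switch.
const FAST_PATH: bool = true;

/// "Conditional as expr": the constant is initialized directly from an `if`,
/// with no mutable temporary and no statement sequence to simulate.
const BUFFER_SIZE: usize = if FAST_PATH { 4096 } else { 512 };

/// "Block as expr": local declarations followed by a final value.
const TABLE_LEN: usize = {
    let base = BUFFER_SIZE / 8;
    base + 1
};
```

The constant folder only has to evaluate expressions here; there is no assignment whose effect it would need to track.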

(Attentive readers will note that in rustboot there is presently a
"cexp" language floating outside the main grammar which handles just
such pure, constant, scalar-typed expressions, including conditional
forms for alt and if, and interprets them in a little micro-interpreter
in the frontend during crate construction. We want to get rid of cexp
and just define it as a subset of the normal expression grammar. Too
many similar-looking grammars will confuse users.)

None of these issues *doom* the statement-centric approach, but they
make it increasingly unnatural-feeling inside the compiler. Combine that with
the fact that *users* are really quite fond of a fair number of
larger-than-a-primitive-statement expression forms, so you're already
parsing such things and then "desugaring" them (which itself messes up
error reporting by the compiler), and it gets to be a convincing
argument: the statement fixation is awkward for (many) users *and* for
the implementers. Who's it good for? Increasingly, I found myself unable
to answer that question. Possibly editor modes?

This is not to say that the visible structure of the grammar, or most
programs, is likely to change a lot. It will *permit* a more
nested-expressions form, but it won't actually read well if you over-do
it; particularly since block-local declarations end in a semi, and our
conditional and loop forms are braced, these are natural places to put
linebreaks. So most of the block-containing expressions will read best
arranged as a sequence of lines, not mushed into a nested expression
context. I'm also a bit concerned about how easy it'll be to convince
editor modes to handle this change, but I'm willing to give it a try. If
editor modes are the last issue, it ... feels like a solvable problem.

> I imagine that 'if' and 'alt' are the most useful statements to have
> as expressions, so would it be possible to add the C ternary ?:
> operator, and something similar for alt?

It would be possible, but I get a little tingle about "doing the wrong
thing" when considering adding expression forms that perfectly mirror
statement forms. The ternary operator is Not The Most Popular Idea from
C. Besides which, it implies control flow; it doesn't actually evaluate
both arms. So we'd be desugaring it anyway, the way we desugar && and ||
in rustboot. See above wrt. "awkward for all parties".
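
As a rough illustration of the desugaring mentioned above, the sketch below writes the short-circuit operators and a C-style ternary as plain conditionals. It is written in present-day Rust, not rustboot's actual desugaring code, and the function names are hypothetical; closures stand in for the not-yet-evaluated operands.

```rust
// A sketch only: these helpers are hypothetical and are not rustboot's code.
// Closures model operands that must not be evaluated unless needed.

/// `a && b` as a conditional: `b` runs only when `a` is true.
fn and_desugared(a: bool, b: impl FnOnce() -> bool) -> bool {
    if a { b() } else { false }
}

/// `a || b` as a conditional: `b` runs only when `a` is false.
fn or_desugared(a: bool, b: impl FnOnce() -> bool) -> bool {
    if a { true } else { b() }
}

/// `cond ? x : y` as a conditional: exactly one arm is evaluated.
fn ternary_desugared<T>(cond: bool, x: impl FnOnce() -> T, y: impl FnOnce() -> T) -> T {
    if cond { x() } else { y() }
}
```

All three reduce to the same branching form, which is the sense in which a ternary would sit in the same category as && and ||.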

-Graydon

statement-expressions and block-terminators

Patrick Walton
In reply to this post by Jeffrey Yasskin
On 11/24/10 7:32 AM, Jeffrey Yasskin wrote:
> I would like #4 best, but to do it right you'd have to infer the
> expected type of the branched completion from its context, and I think
> you don't yet do any top-down typechecking (except a bit in
> pattern-alt which may not help with this case). After that, #3, even
> though I'll definitely get confused when I terminate my blocks with a
> semicolon and they stop working as values.

An alternate way of thinking about proposal #3 is that, as a rule of
thumb, ";" always means "ignore the result of the previous statement".
Formulating it this way might ease the cognitive load on users.
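
A small sketch of that rule of thumb, for readers who want to see it in code. It assumes present-day Rust syntax (the thread itself writes "auto x = if (foo()) ..."), and the function names are hypothetical.

```rust
// Present-day Rust syntax is assumed here; function names are hypothetical.

/// No semicolon after the last expression in each block, so each block
/// yields that expression and the `if` has type `i32` (case A).
fn block_with_value(flag: bool) -> i32 {
    let x = if flag { 10 } else { 11 };
    x
}

/// Every expression is followed by `;`, so its result is ignored and each
/// block has the nil type `()`. The arms no longer need matching types (case B).
fn block_ignored(flag: bool) {
    if flag { 10; } else { "hello"; }
}
```

Under this reading case B needs no rewrite at all, and case A only loses its inner semicolons.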

Patrick

statement-expressions and block-terminators

Graydon Hoare
On 10-11-24 09:06 AM, Patrick Walton wrote:

> An alternate way of thinking about proposal #3 is that, as a rule of
> thumb, ";" always means "ignore the result of the previous statement".
> Formulating it this way might ease the cognitive load on users.

While I always appreciate having new ways of explaining a language
feature, I should relate a certain pithy phrase often related by Lessig
about politics, which applies equally to languages: "if you're
explaining, you're losing".

Our business here is, in a large measure, to anticipate what users will
*already* be thinking, and to figure out something that fits well enough
to be unsurprising, palatable.

(While, of course, having superior precision and safety properties than
the sum of their vague and contradictory expectations :)

The problem is that our target market is largely people from statement
languages, who simply don't have this issue. So modeling their
assumptions directly means "various other techniques" to solve the same
design pressures -- ternary expressions, use of subordinate functions
with inlining and constexpr modifiers ... -- and we're sort of taking a
sober second look at that whole path and wondering if the
expression-language people live in a substantially better world. And if
so, how to get there without losing the statement-language audience.

Hard/subtle/tradeoffy design issue.

-Graydon

statement-expressions and block-terminators

David Herman
Perhaps another way to look at it is the programmer's migration path, or how to get them from where they are today to a place where they're using Rust even more effectively.

In that regard, programmers can get off the ground immediately with traditional C-style programs:

    auto tmp;
    if (p) {
        foo();
        tmp = bar();
    } else {
        tmp = baz();
    }

But then you can show them that, as an *additional* feature, you can use a block as an expression by leaving off the final semicolon:

    auto tmp = if (p) { foo(); bar() } else { baz() };

Not the best example, but hopefully I'm halfway getting the point across? I guess what I'm saying is, rather than trying to explain how the whole system ties together, the language can be presented in stages -- start with traditional C-like syntax, then add a moderate dose of expressionliness.

Dave

On Nov 24, 2010, at 9:59 AM, Graydon Hoare wrote:

> On 10-11-24 09:06 AM, Patrick Walton wrote:
>
>> An alternate way of thinking about proposal #3 is that, as a rule of
>> thumb, ";" always means "ignore the result of the previous statement".
>> Formulating it this way might ease the cognitive load on users.
>
> While I always appreciate having new ways of explaining a language feature, I should relate a certain pithy phrase often related by Lessig about politics, which applies equally to languages: "if you're explaining, you're losing".
>
> Our business here is, in a large measure, to anticipate what users will *already* be thinking, and to figure out something that fits well enough to be unsurprising, palatable.
>
> (While, of course, having superior precision and safety properties than the sum of their vague and contradictory expectations :)
>
> The problem is that our target market is largely people from statement languages, who simply don't have this issue. So modeling their assumptions directly means "various other techniques" to solve the same design pressures -- ternary expressions, use of subordinate functions with inlining and constexpr modifiers ... -- and we're sort of taking a sober second look at that whole path and wondering if the expression-language people live in a substantially better world. And if so, how to get there without losing the statement-language audience.
>
> Hard/subtle/tradeoffy design issue.
>
> -Graydon


statement-expressions and block-terminators

Igor Bukanov
In reply to this post by Graydon Hoare
My preference is option 1.

statement-expressions and block-terminators

Igor Bukanov
In reply to this post by Graydon Hoare
On 24 November 2010 17:04, Graydon Hoare <graydon at mozilla.com> wrote:
> It would be possible, but I get a little tingle about "doing the wrong
> thing" when considering adding expression forms that perfectly mirror
> statement forms.

IMO it is easier to follow

 auto x = foo() ? 10 : 11;

rather than

 auto x = if (foo()) { 10 } else { 11 };

The if-else has too many extra parentheses. And even if one could omit
the {} and write:

 auto x = if (foo()) 10 else 11;

it still has 2 extra parentheses, making it harder to read. And
note that the ternary does not mirror the "if", since its else part must
always be present, making it sufficiently different IMO.

The case would be different if the "if" had the if-then-else
syntax without parentheses, as in

  auto x = if foo() then 10 else 11;

But that would be foreign to programmers coming from C-based languages.

> The ternary operator is Not The Most Popular Idea from C

The worst abuses that I have seen came from the use of the comma to
initialize temporaries in the middle of a nested ?:. Without
commas it is harder to write ugly ternaries.

> Besides which, it implies control flow; it doesn't actually evaluate both
> arms. So we'd be desugaring it anyway, the way we desugar && and || in
> rustboot. See above wrt. "awkward for all parties".

That would be an argument if Rust did not have && and ||. But with the
latter available, the control flow implied by the ternary does not look
like an issue IMO.

statement-expressions and block-terminators

Graydon Hoare
On 10-11-25 08:50 AM, Igor Bukanov wrote:
 > My preference is option 1.

Aw man! We were almost drifting towards a consensus. Nuts!

So C-with-gnu-extensions. Hm. That does complexify the putative
constant-folder in the front-end, but I guess a vote is a vote.

-Graydon

statement-expressions and block-terminators

Graydon Hoare
In reply to this post by Igor Bukanov
On 10-11-25 09:25 AM, Igor Bukanov wrote:

> On 24 November 2010 17:04, Graydon Hoare <graydon at mozilla.com> wrote:
>> It would be possible, but I get a little tingle about "doing the wrong
>> thing" when considering adding expression forms that perfectly mirror
>> statement forms.
>
> IMO it is easier to follow
>
>   auto x = foo() ? 10 : 11;
>
> rather than
>
>   auto x = if (foo()) { 10 } else { 11 };

Ok. Well, ternary is ... sort of orthogonal to the entire discussion of
"how to solve the general statement-in-expression-context problem". So
let's do a secondary survey perhaps:

Who feels like adding a ternary operator?

>> The ternary operator is Not The Most Popular Idea from C
>
> The worst abuses that I have seen came from the use of the comma to
> initialize temporaries in the middle of a nested ?:. Without
> commas it is harder to write ugly ternaries.

Oh yeah, I didn't necessarily mean "prone to abuse", just "not widely
copied". But then I went and checked and that's not true; lots of
languages picked it up.

So I guess it's just a personal bias. I don't like the ternary operator;
I was raised in lisp-land and it always felt like a less-legible variant
of better expressions. :)

>> Besides which, it implies control flow; it doesn't actually evaluate both
>> arms. So we'd be desugaring it anyway, the way we desugar && and || in
>> rustboot. See above wrt. "awkward for all parties".
>
> That would be an argument if Rust did not have && and ||. But with the
> latter available, the control flow implied by the ternary does not look
> like an issue IMO.

It's an argument that it falls into the same category as || and &&,
nothing deeper. Maybe I wasn't clear; I realize they have control flow
as well. I wrote the desugaring code in rustboot :(

-Graydon

statement-expressions and block-terminators

Patrick Walton
On 11/25/2010 11:00 AM, Graydon Hoare wrote:
> Who feels like adding a ternary operator?

Not I. My instinctive argument against it is that if-then-else is the
weaker of the two branching constructs we have in the language. The more
powerful one (eventually) will be the "alt" construct, which allows the
programmer to do everything that "if" does via pattern guards, as well
as allowing destructuring and pattern matching on data values. Blessing
"if-then-else" but not "alt" with the expression form seems strange to me.

Patrick
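
To make the comparison above concrete, here is a hedged sketch of a branching construct with pattern guards and destructuring. It uses present-day Rust, where the construct this thread calls "alt" is spelled "match"; the function names and the exact guard syntax are assumptions relative to the 2010 design.

```rust
// Present-day Rust: `match` is the later name of the thread's `alt`.
// Function names are hypothetical.

/// Everything an if/else-if chain does can be written with pattern guards.
fn classify(n: i32) -> &'static str {
    match n {
        x if x < 0 => "negative",
        0 => "zero",
        _ => "positive",
    }
}

/// Destructuring and matching on data values, which a plain `if` cannot do.
fn describe(point: (i32, i32)) -> String {
    match point {
        (0, 0) => String::from("origin"),
        (x, 0) => format!("on the x axis at {}", x),
        (0, y) => format!("on the y axis at {}", y),
        (x, y) if x == y => format!("on the diagonal at {}", x),
        (x, y) => format!("at ({}, {})", x, y),
    }
}
```

Both functions use the branching construct in expression position, which a ternary operator covering only if-then-else would not offer.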

statement-expressions and block-terminators

Igor Bukanov
In reply to this post by Jeffrey Yasskin
On 24 November 2010 16:32, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> I would like #4 best, but to do it right you'd have to infer the
> expected type of the branched completion from its context, and I think
> you don't yet do any top-down typechecking (except a bit in
> pattern-alt which may not help with this case). After that, #3, even
> though I'll definitely get confused when I terminate my blocks with a
> semicolon and they stop working as values.

For me the semicolon-as-separator, not terminator, was the worst
feature of programming in Pascal. Everybody hated it, as an extra
semicolon was way too often the sole reason for compilation errors. I
suspect that was one of the reasons people switched to Borland C++.

statement-expressions and block-terminators

Sebastian Sylvan
In reply to this post by Graydon Hoare
On Thu, Nov 25, 2010 at 6:54 PM, Graydon Hoare <graydon at mozilla.com> wrote:

> On 10-11-25 08:50 AM, Igor Bukanov wrote:
> > My preference is option 1.
>
> Aw man! We were almost drifting towards a consensus. Nuts!
>

More dissenting opinions then!

How about 2, but with a tweak to the type checker so it only unifies the
types of the two arms if it *really* needs to?

So, if the type of the whole if-expression is (), then the type of each arm
can be different (implicitly ignoring any non-() value, perhaps by just
inserting a "()" at the end of each arm), but if the type is anything else,
then it needs to match with both arms.

I.e.

if (b) { getInt() } else { getFloat() }            // fine, implicitly ignores the values/types
auto x = if (b) { getInt() } else { getFloat() };  // error, the arms of the if have different types

The trailing semi-colon would be an aesthetic option that wouldn't impact
semantics.

As far as I can tell, this would seem to avoid subtle problems due to
missing a semi-colon and trivial mistakes like that, while also matching
intuition about what should be legal. The downside is that the type-checking
becomes a bit unorthodox.
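
The rule described above could be sketched roughly as follows. This is a toy model in present-day Rust, not rustc's typechecker: the Ty and Context types and the type_of_if function are hypothetical names invented for illustration, and how the checker would learn that a value is ignored is simply assumed as an input.

```rust
// Toy model only; every name here is hypothetical.

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Unit,
    Int,
    Float,
}

/// Whether the surrounding context uses the value of the `if`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Context {
    Ignored,
    Used,
}

/// Unify the arm types only when the value is actually needed.
fn type_of_if(ctx: Context, then_ty: Ty, else_ty: Ty) -> Result<Ty, String> {
    match ctx {
        // The value is dropped: behave as if "()" ended each arm.
        Context::Ignored => Ok(Ty::Unit),
        // The value is used and both arms agree.
        Context::Used if then_ty == else_ty => Ok(then_ty),
        // The value is used but the arms disagree.
        Context::Used => Err(format!(
            "the arms of the if have different types: {:?} vs {:?}",
            then_ty, else_ty
        )),
    }
}
```

With this shape, the first example in the message corresponds to Context::Ignored and the second to Context::Used with Int and Float arms.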

--
Sebastian Sylvan
statement-expressions and block-terminators

Patrick Walton
In reply to this post by Igor Bukanov
On 11/25/2010 2:16 PM, Igor Bukanov wrote:

> On 24 November 2010 16:32, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
>> I would like #4 best, but to do it right you'd have to infer the
>> expected type of the branched completion from its context, and I think
>> you don't yet do any top-down typechecking (except a bit in
>> pattern-alt which may not help with this case). After that, #3, even
>> though I'll definitely get confused when I terminate my blocks with a
>> semicolon and they stop working as values.
>
> For me the semicolon-as-separator, not terminator, was the worst
> feature of programming in Pascal. Everybody hated it, as an extra
> semicolon was way too often the sole reason for compilation errors. I
> suspect that was one of the reasons people switched to Borland C++.

Keep in mind proposal #3 allows you to write code exactly as you would
in C++. You always use the semicolon as a statement terminator. It's
just that if you want to use a block as an expression (which is
forbidden in ordinary C++), you can leave off the final semicolon. So
it's really an extension to C++'s syntax, not a different sort of
behavior entirely.

Patrick
