Bikeshed proposal to simplify syntax

Sebastian Sylvan
Hi Rust team! First of all, congratulations on the 0.3 release!

I have a bikeshed proposal to selectively simplify Rust's syntax in a
way that has the side benefit of applying "negative syntactic
pressure" on expensive constructs. It boils down to this: remove all
special syntax for heap allocations and heap pointers. Keep special
syntax for borrowed pointers, keep special syntax for fixed size
vectors. E.g. a heap pointer might be ptr<T>. Allocating a block of
uninitialized memory of type T might be heap_alloc<T>(), with the
normal literals used to initialize it.

Now, this sounds a bit crazy, and I'm pretty sure it won't be adopted,
but I'd appreciate it if you gave it a moment's serious consideration
and thought about what the impact would really be. I don't think it
would cost as much as you might initially think (modern C++ does
essentially this with all the shared_ptr etc., and many of the actual
allocation calls are hidden behind constructor functions anyway) and
there are strong wins.

The two main benefits would be:

1. Reduce the complexity of the syntax. In particular, reduce the
amount of "special symbols". You've all heard jokes about Perl looking
like line-noise, I'm sure. Reducing the amount of special characters
you need to know about before understanding code is a win. Getting rid
of some of the symbols and treating those types in a more regular way
would also kill a lot of design problems like "what does a heap
allocated fixed-sized vector literal look like"? Even more
importantly, it would make it easier to see what a complicated type
means because it would follow simple nesting rules that you already
understand, because it's the same rules that apply to user-defined
types. It would also make library-pointer types look like the "real
thing" (e.g. arc<T>).

2. Apply selective negative pressure to constructs which should be
avoided if possible. IMO good syntax makes "preferable" constructs
easy to write, and non-preferable constructs harder. Rust does a
pretty good job here already (e.g. sharing mutable memory between
tasks is suitably clunky). I personally think that heap allocations
deserve to be in the "non-preferable" category, too. They add memory
management overhead (GC/refcounting). They reduce locality. They add
waste (header words, the pointer itself, maybe ref counts, etc.). They
increase fragmentation. I believe controlling allocations will be
essential to good performance in the future, as heap sizes grow
massive (esp. for GC which is O(live_objects)). Heap allocations are
absolutely essential sometimes, of course, but it's preferable by far
to try to store the data on the stack or interior to the owning data
structure, or in a big memory pool, and use borrowed pointers to
access it indirectly - only resort to heap pointers if there is no
other option. This is also why I think the *only* syntactically
preferred pointer should be the borrowed pointer. IMO, Rust makes it
far too easy to allocate memory on various heaps - just add a little
sigil and you're done. C makes you appreciate the implications of what
you're about to do when you type "malloc".
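
The trade-off in point 2 can be made concrete with a minimal sketch. It is written in present-day Rust syntax rather than the 0.3 sigils discussed here, with `Box<T>` standing in for a unique heap pointer. The type and function names (`InlineParticle`, `BoxedParticle`, `length_squared`, `demo`) are hypothetical and chosen only for illustration.

```rust
// Hypothetical types for illustration; none of these names come from the thread.

/// Position stored interior to the owning struct: no separate allocation.
struct InlineParticle {
    position: [f32; 3],
}

/// Position stored behind a heap pointer: one extra allocation per particle.
struct BoxedParticle {
    position: Box<[f32; 3]>,
}

/// Takes a borrowed pointer, so it works no matter where the data lives.
fn length_squared(p: &[f32; 3]) -> f32 {
    p.iter().map(|x| x * x).sum()
}

/// Builds one particle of each kind and measures both through a borrow.
fn demo() -> (f32, f32) {
    let inline = InlineParticle { position: [1.0, 2.0, 2.0] };
    let boxed = BoxedParticle { position: Box::new([1.0, 2.0, 2.0]) };
    (
        length_squared(&inline.position),
        length_squared(&boxed.position),
    )
}
```

Both calls go through the same borrowed-pointer signature; only the construction site reveals which value costs a heap allocation.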

Anyway, as I said, I don't think this will gain much traction, but
it's one of those small niggles that I would do differently if I was
designing the language, so I thought I'd float it for consideration.
Thanks for your time!

--
Sebastian Sylvan

Graydon Hoare
On 12-07-12 11:41 PM, Sebastian Sylvan wrote:
> Hi Rust team! First of all, congratulations on the 0.3 release!

Thanks. The whole team continues to impress me too, the amount of work
this time around should not be understated (and not just churn; really
thoughtful and high-quality refactorings, a lot of hard problems).

> I have a bikeshed proposal to selectively simplify Rust's syntax in a
> way that has the side benefit of applying "negative syntactic
> pressure" on expensive constructs. It boils down to this: remove all
> special syntax for heap allocations and heap pointers.

Thus ~T becomes uniq<T>, ~expr becomes uniq(expr), etc?

> Allocating a block of
> uninitialized memory of type T might be heap_alloc<T>(), with the
> normal literals used to initialize it.

Note: we don't support allocating uninitialized memory.

> Now, this sounds a bit crazy, and I'm pretty sure it won't be adopted,
> but I'd appreciate it if you gave it a moment's serious consideration
> and thought about what the impact would really be. I don't think it
> would cost as much as you might initially think (modern C++ does
> essentially this with all the shared_ptr etc., and many of the actual
> allocation calls are hidden behind constructor functions anyway) and
> there are strong wins.

I agree that modern C++ does this, and it does relieve some syntactic
pressure (slack which C++11 seems to have gleefully soaked up). I don't
think that users having to write shared_ptr<foo> rather than @foo is
quite so clearly a "strong win" in our case, though.

> 1. Reduce the complexity of the syntax. In particular, reduce the
> amount of "special symbols". You've all heard jokes about Perl looking
> like line-noise, I'm sure.

I think this is a bit unfair; perl isn't even a context-free grammar. I
think we're in the same ballpark as Objective-C right now. And shrinking.

> Reducing the amount of special characters
> you need to know about before understanding code is a win.

True, but it's in tension with conciseness / expressivity. I initially
erred very much on the side of maintenance programmers in the design
(verbosity and clarity over expressivity) and am gradually being dragged
back towards the expressive side. This is one of numerous tensions in a
language; almost nothing is a pure win/lose, all is tradeoffs.

> Getting rid
> of some of the symbols and treating those types in a more regular way
> would also kill a lot of design problems like "what does a heap
> allocated fixed-sized vector literal look like"?

Unless you're planning on making our type parameters carry integer types
-- danger! -- this proposal doesn't help there.

> importantly, it would make it easier to see what a complicated type
> means because it would follow simple nesting rules that you already
> understand, because it's the same rules that apply to user-defined
> types. It would also make library-pointer types look like the "real
> thing" (e.g. arc<T>).

But they aren't. The compiler is doing a bunch of open-coding on those
pointer types, including reasoning about the initialization-state of the
pointee memory, enforcing kinds, generating visitors that walk through
them, and pattern-matching on the structure.

> 2. Apply selective negative pressure to constructs which should be
> avoided if possible.

Yeah. Again, this is in tension with "letting users write what they need
to". Rust initially prohibited even _cyclic_ memory and made all private
memory copy-on-write. Guess what was the first and most pressing request?

Second-guessing users and telling them they don't want to do what they
_do_ want to do is ... generally a losing game. @ and ~ are not
beautiful, and surely if someone can avoid reaching for them I wager the
noisiness and measurable performance cost is sufficient deterrence to
make the user think twice. I might be wrong, but ... the largest single
case of allocation is "", for example, and we _just_ started making it
obvious that those allocate.

(And we still have essentially _no_ code that emits static constants as
read-only memory, aside from fixed-size and slice-strings, and integer
constants. So ~"abc" actually hits the allocator every time, despite
being a constant.)

> preferred pointer should be the borrowed pointer. IMO, Rust makes it
> far too easy to allocate memory on various heaps - just add a little
> sigil and you're done. C makes you appreciate the implications of what
> you're about to do when you type "malloc".

Fair. It's true that we allocate too much presently. I believe a lot of
this comes from a combination of incomplete constant optimization (see
above), hiding the uniqueness of vectors and strings (no longer done as
of 0.3), and not having had the requisite technologies at our disposal
when writing the compiler, first-pass: our closures were weak,
borrowed-pointers nonexistent, interior-vectors nonexistent, arenas
nonexistent, etc. I would like to get some experience with using the new
technology in earnest, before looking at blunter instruments as you're
suggesting here.

(Also: nobody's done the no-gc lint pass yet, but I fully intend to
provide it. Might just do so idly now, it's quick work)

> Anyway, as I said, I don't think this will gain much traction, but
> it's one of those small niggles that I would do differently if I was
> designing the language, so I thought I'd float it for consideration.

Sure. I appreciate the concern, I just think the weight of evidence
isn't _so_ clear that rust code "always" allocates too much, vs. the
current code doing so. I think it'll take some time to see. Also there
are a bunch of other factors at work, as I point out above (in terms of
the compiler knowing all about these pointers).

-Graydon

Sebastian Sylvan
On Fri, Jul 13, 2012 at 5:18 PM, Graydon Hoare <graydon at mozilla.com> wrote:
>
> Thus ~T becomes uniq<T>, ~expr becomes uniq(expr), etc?
>

Yep.

>
>> Allocating a block of
>> uninitialized memory of type T might be heap_alloc<T>(), with the
>> normal literals used to initialize it.
>
>
> Note: we don't support allocating uninitialized memory.

Sure, the heap_alloc could take the initial value as input and do
whatever optimizations needed to make sure it never actually creates
the literal on the stack just to copy it to the heap.
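
A minimal sketch of that idea, assuming present-day Rust syntax: `Uniq<T>` and `heap_alloc` are hypothetical names taken from the spellings floated in this thread, and here they are just thin wrappers over the standard `Box<T>`. Whether the value is built directly on the heap or copied there is left to the compiler; the sketch makes no guarantee about that.

```rust
/// Hypothetical library spelling of a unique heap pointer (the `uniq<T>`
/// mentioned above). Here it is only an alias for the standard `Box<T>`.
type Uniq<T> = Box<T>;

/// Hypothetical `heap_alloc`: it takes the initial value as input, so the
/// caller never sees uninitialized memory.
fn heap_alloc<T>(value: T) -> Uniq<T> {
    Box::new(value)
}

/// Illustrative payload type; not from the thread.
struct Point {
    x: i32,
    y: i32,
}

/// The normal struct literal initializes the allocation.
fn make_point() -> Uniq<Point> {
    heap_alloc(Point { x: 1, y: 2 })
}
```

The allocation is spelled as an ordinary function call, so it stands out at the call site in the way `malloc` does in C.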

>> Getting rid
>> of some of the symbols and treating those types in a more regular way
>> would also kill a lot of design problems like "what does a heap
>> allocated fixed-sized vector literal look like"?
>
>
> Unless you're planning on making our type parameters carry integer types --
> danger! -- this proposal doesn't help there.
>

Well vectors would retain their special syntax (fixed size arrays are
pure goodness and should be syntactically preferred) so the size would
be part of the vector type/literal. The point is that figuring out
what a pointer to T looks like is easy (on both the type and
constructor side), even if T is one of the special forms like vectors.
It seems to me that you get into more head-aches when you need to
figure out how to make multiple special forms of syntax interact
nicely and without surprising people.
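
As a rough illustration of this nesting argument, here is a sketch in present-day Rust syntax. The helper `heap_alloc` is a hypothetical name, implemented with the standard `Box<T>`; the fixed-size array keeps its own literal and type syntax, and the pointer simply wraps it.

```rust
/// Hypothetical helper: heap-allocates any value, including a fixed-size array.
fn heap_alloc<T>(value: T) -> Box<T> {
    Box::new(value)
}

/// The array type `[i32; 3]` carries its size; the pointer type nests around it.
fn boxed_triple() -> Box<[i32; 3]> {
    // The array keeps its special literal syntax; no pointer-specific
    // literal form is needed.
    heap_alloc([1, 2, 3])
}
```

No integer-valued type parameter is involved: the size lives in the array type, and the pointer is generic over whatever `T` it is given.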

> Yeah. Again, this is in tension with "letting users write what they need
> to". Rust initially prohibited even _cyclic_ memory and made all private
> memory copy-on-write. Guess what was the first and most pressing request?

Sure, but as for allocations I think we have more experience about the
benefits and dangers of making it pretty. Not necessarily in the way
of having 100% analogous languages, but in the way of being able to
look at languages that make heap allocation very common, and languages
that make it a big deal. I agree with you that it may not be
completely clear where Rust stands in this spectrum yet since it's
sort of a psychological thing, but I think we can at least say that in
general languages that make allocations ugly have done a better job at
avoiding that particular performance pitfall in practice. Perhaps
Rust's more robust support for interior data (compared to Lua or
whatever) is enough, but if you can make allocations stick out even
more while simultaneously simplifying the syntax then it may add up to
being a good idea.

Seb

--
Sebastian Sylvan

Niko Matsakis
In reply to this post by Sebastian Sylvan
This is an interesting idea.  I guess the key question is just *how*
negative heap allocations are.  I think you are likely overstating the
case, or perhaps underestimating the extent to which the price of heap
allocations can be ameliorated by optimizing the allocator.  But I don't
claim to have extensive data on this point.

But I have another concern as well.  Regardless of how expensive heap
allocations are, they are likely to be frequently *necessary*.  Much as
I love borrowed pointers, they are only suitable for things whose
lifetimes follow a stack discipline. Moreover, without a more complex
type system, there will be cases that do follow a stack discipline but
for which borrowed pointers are not complex enough to express.  Given
that, forcing users to write `ptr<Foo>` where in any other modern
language they would probably just write `Foo` doesn't seem like it's
going to win us any friends.  `@Foo` seems a reasonable compromise to me.

I guess my fears boil down to this: I am already concerned about Rust
feeling too heavy-weight and ponderous.  I like that Rust exposes things
like the program stack and gives a fair amount of room for
micro-optimization while still retaining type safety, but if that ends
up making the language feel less convenient than e.g. Java it may not be
worth it, given the speed/memory capabilities of today's computers (and
tomorrow's).  So I'm hesitant to make changes that tax users of GC more
than we already do.


Niko

Graydon Hoare
In reply to this post by Graydon Hoare
On 12-07-13 7:21 PM, Bennie Kloosteman wrote:

> We spend far more time reading than writing code, and more time is
> spent checking results, specs, emails and docs, compiling, organizing
> code and seeing where things fit, etc.; the time spent typing is
> low. It is not just maintenance programmers: a lot of new code is
> refactoring and bug fixing. Conciseness is of little benefit and IMHO
> is dangerous, as it makes language adoption harder; this is especially so
> for the libraries. And adoption is crucial for Rust.

Yes, I'm familiar with this argument. It's exactly the same argument I
presented when initially asked why Rust wrote rec(...) rather than
{...}, had only a single namespace, had no type-directed dispatch, name
overloading, environment capture, type parameter inference, name
shadowing, integer-literal inference ... I lost all these arguments, and
many more, to people who were upset about having to "write more than
necessary".

Adoption is driven by many factors. Expressivity _is_ one of them;
though you'll note it's always been explicitly subordinate to safety,
efficiency and practicality. We balance factors as best we can. Keep in
mind that even seemingly simple concepts like "readable" are _very_ hard
to quantify: Ada was "designed to be readable" and yet the verbosity
appears to work _against_ it. Readers' eyes glaze over and are unable to
quickly apprehend detail, have to keep too much in short-term memory.

The argument for sigils boils down to one about ubiquity. Words we write
(or read!) over and over again, we tend to abbreviate. Sigils are just
the limit case of abbreviation: 1 symbol. Infix operators survive too,
and numbers, and printf strings, and lots of little lexical
peculiarities that require the reader to know how to scan them, but are
so ubiquitous that they push us towards abbreviation. Readers and
writers alike can find the abbreviated form easier to apprehend than the
elaborated form.

> Java and C# have a good balance, focusing on 5-6 char keywords with
> very few 2 and 3 char ones (int is the main one).

"Good" except for all those programmers who deride Java as far too
verbose. Ask around, opinions vary. C# can't be compared to Java
reasonably here as it contains far more abbreviated forms. Because of
Anders Hejlsberg's taste, as far as I can tell. I'm not insulting it;
he's got compelling taste!

Taste plays a large role in language design, especially in places like
this. A lot of conversations boil down to subjective judgment calls
where it's not really worth saying much beyond "I find X distasteful,
and I would prefer Y".

> Anyway, Rust is a new language; it should not look like something from the
> 70's meant only for mathematical types.

I concur, taste-wise, with aiming for less-academic and
less-mathematical terminology when possible (for example, I hope we do
not wind up referring to ~ as 'affine' in any context other than casual
conversations between designers). Hopefully this will improve more as we
file down further redundancy in the language and ensure the remaining
vocabulary is comprised of short, pithy, common words.

As for the 70s ... any decade that produced C, ML, Forth, CLU, Mesa,
Smalltalk, Pascal/Modula and Scheme is OK in my book. It's kinda the
design-space where we're aiming, actually. Just less crashy. They didn't
have quite as much internet in their face as we moderns do.

-Graydon

Graydon Hoare
On 17/07/2012 12:40 AM, Bennie Kloosteman wrote:

> The problem to me is less the language and more the libraries;
> habits formed now will become embedded in the "language style".

We've always aimed for pluralism in the design: trying to make it
sufficiently easy to write in functional style, imperative style, and
(less strongly, but getting better) OO style. I feel like rust is not
presently forcing any one "language style" on users, but I suppose the
libraries reflect the biases of the team.

I'm not sure discussing this in its "most abstract" form is the most
useful approach though. Perhaps if you could make your concerns a bit
more clear in code ... perhaps pick a single file from the std or core
lib (if it's not too much to ask) and modify it to the style you'd feel
is more legible, we can discuss in a pull request?

> Though in a world where all languages are safe and fast the other
> factors are important.

Ha! Call me when we get to that world. We're competing with C++, note.

> Yes, ubiquity is crucial, so why cont instead of continue? It's not like you
> use it often ...

It's been changed to 'again'. I'm not going to discuss that bug any
further, it's a total bikeshed and of no consequence.

> Yes, affine is bad, but iface is also marginal. I'd prefer interface,
> erring on the side of verbosity.

Changing to 'trait' in 0.4 (which is the most common name for the sorts
of things they are).

> Anyway, I say this because I want it to succeed, and the libraries are
> starting to form, so the "style" they set is crucial, and a 70s style is
> not good. If a library is full of dblnlst, str_cp, xml_p, adoption
> will be harder; it's probably easier these days to learn a new language
> than a new standard library, as they will get big.

Ok, the keyword stuff I'm not really interested in debating ad
infinitum, there just aren't _enough_ of them for this to be worth a
long discussion (and it's 5 chars vs. 6, seriously). The abbreviations
in library code are a different matter. It's true that we have a bunch
of abbreviated library names. Examples:

   vec (not Vector)
   cmp (not Compare)
   bitv (not BitVector)
   dbg (not Debug)
   dlist (not DoublyLinkedList or CircularList)
   iter (not Iterable)
   ptr (not Pointer)
   rand::rng (not Random::Generator, say)
   sys (not System)
   rt (not Runtime)

Are these sorts of abbreviated names actually posing a problem to users?
It feels to me like we're in a similar ballpark to the naming
conventions of ... at least _some_ other language standard libraries:

    http://www.ruby-doc.org/stdlib-1.9.3/
    http://docs.python.org/library/
    http://golang.org/pkg/
    http://en.cppreference.com/w/cpp

Though obviously not the same style as Java, C# or (curiously) Haskell,
I haven't heard a _lot_ of clear feedback on this point. Patrick has
been advocating for us to change house style to writing type names as
TypeNames, but aside from that ... is vowel-omission or abbreviation
seriously an issue? (eg. python putting regular expressions in 're' or
system services in 'sys'?) Maybe having more-verbose type names, but
keeping module names short, is a good balance?
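
A small sketch of that balance, in present-day Rust syntax: a short module name (`dlist`, from the list above) holding a verbose type name. The type `DoublyLinkedList`, its methods, and `demo` are hypothetical and exist only to show the naming shape; the storage simply delegates to the standard library's `LinkedList`.

```rust
mod dlist {
    use std::collections::LinkedList;

    /// Verbose, descriptive type name inside a short module name.
    pub struct DoublyLinkedList<T> {
        items: LinkedList<T>,
    }

    impl<T> DoublyLinkedList<T> {
        /// Creates an empty list.
        pub fn new() -> Self {
            DoublyLinkedList { items: LinkedList::new() }
        }

        /// Appends a value at the back of the list.
        pub fn push_back(&mut self, value: T) {
            self.items.push_back(value);
        }

        /// Returns the number of stored values.
        pub fn len(&self) -> usize {
            self.items.len()
        }
    }
}

/// Call sites read as `dlist::DoublyLinkedList`: short path, explicit type.
fn demo() -> usize {
    let mut list = dlist::DoublyLinkedList::new();
    list.push_back(1);
    list.push_back(2);
    list.len()
}
```

The module path stays quick to type and scan, while the type name spells out what the structure is for readers who have not memorized the abbreviation.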

-Graydon

Christian Siefkes
Hi Graydon and all

On 07/17/2012 04:41 PM, Graydon Hoare wrote:
> Though obviously not the same style as Java, C# or (curiously) Haskell, I
> haven't heard a _lot_ of clear feedback on this point. Patrick has been
> advocating for us to change house style to writing type names as TypeNames,
> but aside from that ... is vowel-omission or abbreviation seriously an
> issue? (eg. python putting regular expressions in 're' or system services in
> 'sys'?) Maybe having more-verbose type names, but keeping module names
> short, is a good balance?

I, for one, am fine with short names.

Best regards
        Chris

--
|------- Dr. Christian Siefkes ------- christian at siefkes.net -------
| Homepage: http://www.siefkes.net/ | Blog: http://www.keimform.de/
|    Peer Production Everywhere:       http://peerconomy.org/wiki/
|---------------------------------- OpenPGP Key ID: 0x346452D8 --
The state always calls itself the fatherland when it is preparing to
commit murder.
        -- Friedrich Dürrenmatt

Daniel Patterson
In reply to this post by Graydon Hoare

On Jul 17, 2012, at 10:41 AM, Graydon Hoare wrote:

> Ok, the keyword stuff I'm not really interested in debating ad infinitum, there just aren't _enough_ of them for this to be worth a long discussion (and it's 5 chars vs. 6, seriously). The abbreviations in library code are a different matter. It's true that we have a bunch of abbreviated library names. Examples:
>
>  vec (not Vector)
>  cmp (not Compare)
>  bitv (not BitVector)
>  dbg (not Debug)
>  dlist (not DoublyLinkedList or CircularList)
>  iter (not Iterable)
>  ptr (not Pointer)
>  rand::rng (not Random::Generator, say)
>  sys (not System)
>  rt (not Runtime)
>
> Are these sorts of abbreviated names actually posing a problem to users?

As an outsider/newbie to Rust, I think this is pretty much a non-issue. Names are always going to be hard without good documentation, and with it, they will always be easy to use. It is no easier to remember vector than vec, and I think learning the name is the smallest/easiest thing you need to do when you are learning a library. As long as they aren't confusing (like if the name for a vector were "dup" or something similarly irrelevant/arbitrary), I personally think it is great that they are short. Not to make it easier to _write_, but to make it easier to read code. I want to have to read the minimum number of characters, and ideally be able to have the most amount of content on a single screen, within reason (it has to be relevant).

At least for me, it is similar with the core syntax. There aren't that many keywords, so learning them isn't going to be that hard no matter what they are, and short / relevant seems great to me. This isn't a language that is targeting people who've never programmed before, and for people who have, learning syntax is easy, so providing a syntax that makes it more pleasant to program in once you actually know the language (i.e., easier to read, based on shorter keywords etc) is a good thing. Too often I think there is a tendency to want to cater a language somewhat to people who've never used it before, even though the reality is that most people who are using it will know it well, and people who don't will be able to learn it based on good documentation, _not_ language features.

As an aside / broader comment, I think that rust code looks great (with a few caveats, which are actively being worked on), and congratulate the team on managing to balance interesting semantics (which is the exciting part, of course), with syntax that actually makes me want to program in it (as much as people say syntax doesn't matter... it does). I would be very happy if this allowed me to stop writing C++. So, keep up the good work.

Sebastian Sylvan
In reply to this post by Graydon Hoare
On Tue, Jul 17, 2012 at 7:41 AM, Graydon Hoare <graydon at mozilla.com> wrote:
> Though obviously not the same style as Java, C# or (curiously) Haskell, I
> haven't heard a _lot_ of clear feedback on this point. Patrick has been
> advocating for us to change house style to writing type names as TypeNames,
> but aside from that ... is vowel-omission or abbreviation seriously an
> issue? (eg. python putting regular expressions in 're' or system services in
> 'sys'?) Maybe having more-verbose type names, but keeping module names
> short, is a good balance?

IMO I like the shorter names. As long as it doesn't cause ambiguity.
"vector" tells me nothing that "vec" doesn't, so saving some screen
real-estate to speed up reading (and typing) is a win. It's not
black-and-white though. If something is used infrequently enough that
you don't expect people to remember it by heart, avoid short mnemonics
in favour of descriptive names. If it's unsafe and should stick out
like a sore thumb, avoid short names. For library stuff that's
essentially just shy of being a built-in language feature (like vec,
cmp, ptr, etc.) shorter names make sense IMO.

--
Sebastian Sylvan

Kevin Cantu
As an English-speaking engineer, I find that "vector" has real meaning
and I immediately recognize "vec" as an abbreviation of it, but a school
kid in Shanghai shouldn't need to know everything I do to start
learning the language.  The meanings of long names can be easier to
discover.

Anyways, here's another good place to drop this link:
http://www.youtube.com/watch?v=_ahvzDzKdB0&feature=gv (Growing a
Language, by Guy Steele).  :D


--
Kevin Cantu


Steven Blenkinsop
In reply to this post by Graydon Hoare
On Tuesday, July 17, 2012, Graydon Hoare wrote:

>  <snip>
>   rand::rng (not Random::Generator, say)


This one is redundant, and the redundancy is crowding out useful
information. I know rng is a common initialism, but you'd get the same
information across with rand::gen in the same space. Then, "generator"
doesn't get relegated to one character out of eight, and you remove any
possibility of reading it as random::range.


Bikeshed proposal to simplify syntax

Elliott Slaughter
In reply to this post by Graydon Hoare
> From: "Graydon Hoare" <graydon at mozilla.com>
>
> Ok, the keyword stuff I'm not really interested in debating ad
> infinitum, there just aren't _enough_ of them for this to be worth a
> long discussion (and it's 5 chars vs. 6, seriously). The abbreviations
> in library code are a different matter. It's true that we have a bunch
> of abbreviated library names. Examples:
>
>    vec (not Vector)
>    cmp (not Compare)
>    bitv (not BitVector)
>    dbg (not Debug)
>    dlist (not DoublyLinkedList or CircularList)
>    iter (not Iterable)
>    ptr (not Pointer)
>    rand::rng (not Random::Generator, say)
>    sys (not System)
>    rt (not Runtime)
>
> Are these sorts of abbreviated names actually posing a problem to
> users?

I definitely like short keywords, and I don't have a problem with short names in the standard library, but I am a little worried about the precedent being set for Rust code in general.

Take, for example, trans/base.rs in the Rust code base. The code uses abbreviations fairly aggressively, including ccx, bcx, icx, ty, ti, insn, ptr, t, incr, sess, among others. None of these are especially difficult to figure out, but the time it takes to get used to the code is non-zero.

Now imagine that third-party Rust libraries follow this example. Now I have to learn abbreviations for every library I use in my application. If for any reason I need to modify a third party library for my own purposes, I'll need to learn its internal abbreviations as well.

Should we really be using short names everywhere? And if not, how do we encourage people to use readable names, given the example we are providing in the standard library?


Bikeshed proposal to simplify syntax

Tim Chevalier
On Tue, Jul 17, 2012 at 2:23 PM, Elliott Slaughter
<eslaughter at mozilla.com> wrote:
> Take, for example, trans/base.rs in the Rust code base. The code uses abbreviations fairly aggressively, including ccx, bcx, icx, ty, ti, insn, ptr, t, incr, sess, among others. None of these are especially difficult to figure out, but the time it takes to get used to the code is non-zero.
>

This could be addressed by better documentation (inline comments or
something else). The advantage of that is that the documentation is
confined to one part of a particular module that no one needs to
re-read once they've learned what the abbreviations mean; in contrast,
everyone pays the cost of using (both typing and reading) long names
over and over.

Also, since in Rust it's usually visually apparent what the type of a
variable is, at least if it's a function argument or a top-level
function, all you really need to do to figure out what "ccx" or "bcx"
means is figure out their types. The piece that's missing right now is
useful documentation for each type.
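
A minimal sketch of that point, written in present-day Rust syntax rather
than the 0.3 syntax of this thread. `CrateContext`, `BlockContext`, and
`block_label` are hypothetical stand-ins invented for illustration, not the
real definitions in trans/base.rs:

```rust
// Hypothetical stand-ins for compiler context types; the real definitions
// in trans/base.rs are not reproduced here.

/// Per-crate state shared by the whole translation pass.
struct CrateContext {
    crate_name: String,
}

/// Per-block state used while emitting code for one basic block.
struct BlockContext {
    block_index: usize,
}

/// The signature spells out what `ccx` and `bcx` are, so the short
/// parameter names only need to be decoded once, at the top of the function.
fn block_label(ccx: &CrateContext, bcx: &BlockContext) -> String {
    format!("{}::block{}", ccx.crate_name, bcx.block_index)
}
```

The doc comments on the two types stand in for the per-type documentation
described above as the missing piece.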

> Now imagine that third-party Rust libraries follow this example. Now I have to learn abbreviations for every library I use in my application. If for any reason I need to modify a third party library for my own purposes, I'll need to learn its internal abbreviations as well.
>

Likewise, I think that's a documentation issue.

> Should we really be using short names everywhere? And if not, how do we encourage people to use readable names, given the example we are providing in the standard library?

To me, short versus long names are a matter of taste and not something
we should dictate. Different contexts suggest different sorts of
naming conventions. I trust anyone who is sensible enough to choose
Rust to be sensible enough to choose those conventions for themselves
:-)

Cheers,
Tim

--
Tim Chevalier * http://catamorphism.org/ * Often in error, never in doubt
"Debate is useless when one participant denies the full dignity of the
other." -- Eric Berndt


Bikeshed proposal to simplify syntax

Patrick Walton
In reply to this post by Elliott Slaughter
On 7/17/12 2:23 PM, Elliott Slaughter wrote:
> Now imagine that third-party Rust libraries follow this example. Now
> I have to learn abbreviations for every library I use in my
> application. If for any reason I need to modify a third party library
> for my own purposes, I'll need to learn its internal abbreviations as
> well.
>
> Should we really be using short names everywhere? And if not, how do
> we encourage people to use readable names, given the example we are
> providing in the standard library?

Agreed. I personally prefer longer, unabbreviated names (and camel-cased
types) in user code for this reason. Keywords are a small fixed set of
things that all users of the language must know, while user code
contains an unbounded number of abbreviations in the limit. See resolve3
for an example of this (although perhaps my names are *too* long there...)

In general I'm not a big fan of Unix-command-line-style abbreviation. It
works and is elegant when programs are kept very small, but as programs
grow larger and larger programmers have to page in and out abbreviations
too often for easy reading. Functions like CreateOrRecycleThebesLayer()
and UpdateDisplayItemDataForFrame() in Gecko may be lengthy, but they
sure make digging through MXR easier.

Unix was forced to abbreviate for the 6 character limit, but that
doesn't exist anymore. In fact, modern Plan 9 code doesn't abbreviate
nearly as often as we do; they just pick short names. Take esc.c (just
as an example) from Go:

http://code.google.com/p/go/source/browse/src/cmd/gc/esc.c

We have escapes(), visit(), visitcodelist(), visitcode(), analyze(),
escfunc(), escloopdepthlist(), escloopdepth(), esclist(), esc(),
escassign(), esccall(), escflows(), escflood(), escwalk(), and esctag().
If we assume that the "esc" prefix is just C namespacing, then the only
*abbreviation* there is "func". The rest are just short names. The
resulting code reads much nicer than the Rust code in our compiler.
Short names are fine and read well; abbreviations don't.
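
To make the distinction concrete, here is a small sketch in present-day Rust
syntax. The `Node` type and both function names are hypothetical, made up for
illustration rather than taken from esc.c or the Rust compiler. The two
functions have identical bodies and differ only in naming:

```rust
/// Hypothetical node type: all this sketch tracks is whether a value escapes.
struct Node {
    escapes: bool,
}

// Abbreviated style: the reader has to decode "esc", "cnt", "ns" and "n".
fn esc_cnt(ns: &[Node]) -> usize {
    ns.iter().filter(|n| n.escapes).count()
}

// Short but unabbreviated: same body, every name is a whole word.
fn count_escaping(nodes: &[Node]) -> usize {
    nodes.iter().filter(|node| node.escapes).count()
}
```

The second version costs a few extra characters per line but needs no
translation table.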

(In fact, I prefer unabbreviated keywords in general; I'd prefer
"return", "module", and "match", but I'm happy to yield to the tastes of
the community and BDFL here, since I feel that abbreviated keywords have
a much smaller cost than abbreviated user code.)

Patrick


Bikeshed proposal to simplify syntax

Patrick Walton
In reply to this post by Tim Chevalier
On 7/17/12 2:37 PM, Tim Chevalier wrote:
> Likewise, I think that's a documentation issue.

I don't think that's entirely true. Gecko is so large that nobody can
hold all the names in their head. And the dependency graph of Gecko has
rapidly approached a complete graph over time -- it's inevitable in
large software projects -- so everyone encounters names of functions
that they haven't seen before. Being able to guess at a glance what a
function does is critical, and it's much easier to parse
CreateOrRecycleThebesLayer() than, say, mk_tl().

Patrick


Bikeshed proposal to simplify syntax

Vladimir Lushnikov
In reply to this post by Tim Chevalier
English unabbreviated names seem like the way to go (they're not long
if you're used to
AbstractWidgetWithNetworkingConfigurationFactorySingleton or something
like that from the Java world).

I don't think it's about the number of keystrokes though - for two reasons:
  [1] Your editor (/IDE) should help you complete the keywords and type names
  [2] Code comprehension research so far shows it's about word count
not character count - i.e. the more words of code you have to read,
the longer it takes to comprehend

I personally prefer to have an editor that can code complete as it
both reduces the number of keystrokes and allows me to remember names
of concepts rather than abbreviations or particular combinations of
class names. I haven't written much Rust code because of the lack of a
good editor.

When someone in your team asks you - hey, can you help me with this
problem - I often find myself reading code that I've never seen and
simulating what it's doing in my head. If I also had to learn a
vocabulary or look at the documentation while standing at someone
else's desk then debugging the code would take longer. Good code
should be self-documenting - you read it once, and you can begin to
understand what it's doing.

Symbolic abbreviations (like @ or ~) are different because there are a
fixed number of them with a fixed meaning, and you should be able to
learn these when you've learnt the language. This is in contrast to
abbreviations like 'ccx' or 'bcx' where the number of abbreviations is
not fixed and differs from program to program.

Obviously this is all just my opinion.

Thanks,
Vlad




Bikeshed proposal to simplify syntax

Patrick Walton
In reply to this post by Patrick Walton
In short: "short unabbreviated names are fine, but abbreviations should
be considered carefully and assumed bad unless proven otherwise".

Patrick


Bikeshed proposal to simplify syntax

Niko Matsakis
In reply to this post by Patrick Walton
This is interesting.  When I first started working on the Rust compiler,
I very much enjoyed the Unix-like, abbreviated names.  However, I have
to admit that reading Patrick's resolve3 code is making me ashamed of
the code I've written lately.  I plan to go back and "verbos-ify" some
of my code.  I do think it helps with readability overall, though it is
definitely true that if names are unnecessarily long (e.g.,
`ty_param_bounds_and_ty` or `optional_interface`) they can become
distracting.  Also, our 78-character limit starts to become more of a
burden.


Niko






Bikeshed proposal to simplify syntax

Ziad Hatahet
Not to start a religious war, but maybe using camelCase instead of
under_score would save even more space given the 78-character line limit? :)


--
Ziad


On Tue, Jul 17, 2012 at 9:03 PM, Niko Matsakis <niko at alum.mit.edu> wrote:

> This is interesting.  When I first started working on the Rust compiler, I
> very much enjoyed the Unix-like, abbreviated names.  However, I have to
> admit that reading Patrick's resolve3 code is making me ashamed of the code
> I've written lately.  I plan to go back and "verbos-ify" some of my code.
>  I do think it helps with readability overall, though it is definitely true
> that if names are unnecessarily long (e.g., `ty_param_bounds_and_ty` or
> `optional_interface`) they can become distracting.  Also, our 78-character
> limit starts to become more of a burden.
>
>
> Niko


Bikeshed proposal to simplify syntax

Till Schneidereit
In reply to this post by Niko Matsakis
Having recently read a considerable amount of code in SpiderMonkey and
quite a bit in various parts of the Rust compiler, I can attest that
heavy use of abbreviated names makes things much harder to
understand.

While Tim is right about documentation helping with that, I agree with
Patrick that that just doesn't scale indefinitely. Also, at least for
me, the additional burden of keeping a translation table in my head
makes a surprisingly big difference in my ability to parse code and
figure out its semantics quickly.

The bottom line is that I don't think it's just a matter of taste:
it's a usability issue. Maybe not the biggest one, but certainly not a
small one, either.

