The future of iterators in Rust

classic Classic list List threaded Threaded
29 messages Options
12
Reply | Threaded
Open this post in threaded view
|

The future of iterators in Rust

Daniel Micay
A quick terminology refresher, for those who aren't familiar with it:

* Internal iterator: takes a closure, runs the closure until it asks to break
* External iterator: state machine, advanced by the caller in a loop

To a caller, external iterators provide the most functionality, because they
can be used as an internal iterator. You lose the state of an internal iterator
by breaking out of iterator, so generic algorithms like zip, union, intersect
and merge can't be implemented for a pair of iterators.

# Issues with internal iterators in Rust

A few months ago, we only had internal iterators and there were no generic
algorithms to use with any iterator - only with BaseIter's `each` method.

Rust's internal iterators implement the protocol encoded in Rust's for
statement, but it's not possible to give them all a common trait or implement
generic methods or functions taking any internal iterator.

As a workaround, we can write algorithms assuming internal iterators only take
one argument (the closure):

    fn count<T>(f: &fn(fn(T) -> bool) -> bool) -> uint

The caller has to use a partial function to call these adaptors, for which we
lack sugar:

    count(|f| uint::range(0, 10, f))

For simple functions, this is fairly reasonable once you're used to it. It
quickly gets out of control though, even for a simple function like filter:

    filter<T>(pred: &fn(&T) -> bool, input: &fn(&fn(T) -> bool) -> bool, output: &fn(T) -> bool) -> bool {}

Sadly, `filter_ref` is also needed to work around closures behaving badly with
lifetimes. An example of the problem with `fold_ref`:

    fn product<T: One + Mul<T, T>>(iter: &fn(f: &fn(&T) -> bool) -> bool) -> T {
        fold_ref(One::one::<T>(), iter, |a, x| *a = a.mul(x))
    }

Since `product` expects an iterator yielding `&T` (a borrowed pointer in any
region), it won't work with `fold` because that requires the borrowed pointer
to have the same type (and thus lifetime) for every iteration like `&'a int`.

This issue with borrowed pointers was blocking me from replacing the existing
algorithms reimplemented for both `str` and `vec` with the generic ones.

Chaining together iteration algorithms is a common use case, but even chaining
two together is confusing at first:

    to_vec(|g| filter(|&x| *x < 3, |f| xs.each(f), g)

Another more alarming issue is that with internal iterators, the `break` and
`return` statements don't always work in a `for` loop. If the iterator isn't
implemented correctly, the loop will keep going.

This also has borrow checking implications, because the compiler can't assume
those statements actually cause flow control to leave the loop immediately.

# External iterators

Based on the above, you might think generic iteration algorithms in Rust are a
bleak prospect. However, we already have a nice external iterator library, and
they don't suffer from the above issues.

All kinds of external iterators implement the following trait, whether they are
a fibonacci number generator, a reverse iterator over a vector or iterator over
a range in a sorted set:

    pub trait Iterator<A> {
        /// Advance the iterator and return the next value. Return `None` when the end is reached.
        fn next(&mut self) -> Option<A>;
    }

Generic adaptors are implemented on `Iterator`, and many of them are `Iterator`
implementations themselves:

    use std::iterator::*;

    fn main() {
        let mut it = Counter::new(0.0, 1.0)
            .take_while(|x| *x < 10000000.0)
            .transform(|x| x / 2.0)
            .transform(|x| x + 2.0);
        println(it.fold(0.0, |a, b| a + b).to_str())
    }

If you're curious, the optimized LLVM IR: http://ix.io/5Xl

Unlike internal iterators, external iterators only run one iteration at a time,
so a `for` loop designed for them would always be able to succeed with `break`
and `return`. It would also be able to avoid the ugly `advance` wrapper
currently required to use external iterators with `for`.

    // The current situation, wrapping an external iterator as an internal one
    //
    // Since the advance method is not known the be correct, borrow checking
    // still assumes `return` and `break` are imperfect.
    for xs.zip(ys).advance |x| { ... }

    // A hypothetical `for` loop using the `Iterator` trait
    for iterator |x| { ... }

    // It could also fall back to an `Iterable` trait and obtain an iterator
    for container |x| { ... }

External iterators also avoid the problems with references and closures,
because they simply return `T` rather than passing it to a closure.

# Why not just switch to external iterators?

Algorithms that can be represented easily without the call stack are as easy to
write as either an internal or external iterator.

However, without support for compiling a function to a state machine (C# does
this for yield), traversal of a recursive data structure has to be manually
translated to a procedural algorithm with an explicit stack. For complex data
structures, this process can be very difficult.

I'm hopeful Rust will gain support for this down the road after 1.0. If it
does, there will be no reason to write immutable internal iterators.

Another issue is mutability, as you can write iterators that are able to mutate
containers. With internal iterators, this is easy to do with safe code. With
external ones, it will `unsafe` and won't be easy to get right.

# Solution?

I don't have any proposal to completely solve this issue. :)

I think extending the built-in `for` loop to work with external iterators
should be considered, because right now the verbosity discourages using them
and makes borrow checking more painful than it has to be.

It could treat functions as internal iterators, and look for an `Iterator`
implementation (using a `lang` item) for external ones.

Python's `for` loop starts by looking for an iterator (a `__next__` method) and
falls back to an iterable (an `__iter__` method) so behaviour like this isn't
an alien concept.

_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev

signature.asc (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Patrick Walton
On 6/5/13 9:09 PM, Daniel Micay wrote:

> I think extending the built-in `for` loop to work with external iterators
> should be considered, because right now the verbosity discourages using them
> and makes borrow checking more painful than it has to be.
>
> It could treat functions as internal iterators, and look for an `Iterator`
> implementation (using a `lang` item) for external ones.
>
> Python's `for` loop starts by looking for an iterator (a `__next__` method) and
> falls back to an iterable (an `__iter__` method) so behaviour like this isn't
> an alien concept.

This is a very well-thought out post, and I find it persuasive. The
mutability issue is one I hadn't considered, and seems to make a good
argument for including both in the language.

There is also compilation speed to consider. I have not measured, but
intuitively I would guess that external iterators are faster to compile
than internal ones in codegen. This is because codegen has to create a
closure only to painstakingly break it apart again in optimization via a
combination of inlining, constant propagation, and scalar replacement of
aggregates.

Patrick

_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Graydon Hoare
On 05/06/2013 9:15 PM, Patrick Walton wrote:

> On 6/5/13 9:09 PM, Daniel Micay wrote:
>> I think extending the built-in `for` loop to work with external iterators
>> should be considered, because right now the verbosity discourages
>> using them
>> and makes borrow checking more painful than it has to be.
>>
>> It could treat functions as internal iterators, and look for an
>> `Iterator`
>> implementation (using a `lang` item) for external ones.
>>
>> Python's `for` loop starts by looking for an iterator (a `__next__`
>> method) and
>> falls back to an iterable (an `__iter__` method) so behaviour like
>> this isn't
>> an alien concept.
>
> This is a very well-thought out post, and I find it persuasive. The
> mutability issue is one I hadn't considered, and seems to make a good
> argument for including both in the language.

Yeah. I think it's clear enough that both have their (strong)
advantages; we have hashmaps and treemaps too, and both vectors and lists :)

The main thing I'm concerned with is making the interfaces to user code
smooth enough that neither feels markedly second-class or unusable.
Extending 'for' as you suggest sounds like a good step.

It might also be good to add a tutorial chapter on the matter,
introducing these terms and relating them together, so users can see the
relative use cases.

Thanks for the ongoing work in this area!

-Graydon

_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Vadim Chugunov
In reply to this post by Daniel Micay

On Wed, Jun 5, 2013 at 9:09 PM, Daniel Micay <[hidden email]> wrote:

All kinds of external iterators implement the following trait, whether they are
a fibonacci number generator, a reverse iterator over a vector or iterator over
a range in a sorted set:

    pub trait Iterator<A> {
        /// Advance the iterator and return the next value. Return `None` when the end is reached.
        fn next(&mut self) -> Option<A>;
    }

Based on my experience with iterators in other languages, I would like throw in the following idea:

pub trait BlockyIterator<A> {
        /// Advance the iterator and return the next block of values.  Return empty vector when the end is reached
        fn next(&mut self) -> &'self [A]
}

Rationale: when concrete type of the iterator is known, the next() method inlines nicely.  However, when used polymorphically, every iterator advancement turns into a function call.  A lot of that overhead could be shaved off, if iterators can return multiple values at a time.  A "normal" Iterator can then be implemented on top of this trait.

This is especially applicable to types which already store elements in an vector, most notably [T] itself.

 
However, without support for compiling a function to a state machine (C# does
this for yield), traversal of a recursive data structure has to be manually
translated to a procedural algorithm with an explicit stack. For complex data
structures, this process can be very difficult.

I'm hopeful Rust will gain support for this down the road after 1.0. If it
does, there will be no reason to write immutable internal iterators.

This could also be implemented with co-routines.   Supposedly stack switching will be separated from tasks sometime soon?
 


_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Daniel Micay
On Thu, Jun 6, 2013 at 6:19 PM, Vadim <[hidden email]> wrote:

>
> Based on my experience with iterators in other languages, I would like throw
> in the following idea:
>
> pub trait BlockyIterator<A> {
>         /// Advance the iterator and return the next block of values.
> Return empty vector when the end is reached
>         fn next(&mut self) -> &'self [A]
> }
>
> Rationale: when concrete type of the iterator is known, the next() method
> inlines nicely.  However, when used polymorphically, every iterator
> advancement turns into a function call.  A lot of that overhead could be
> shaved off, if iterators can return multiple values at a time.  A "normal"
> Iterator can then be implemented on top of this trait.
>
> This is especially applicable to types which already store elements in an
> vector, most notably [T] itself.

As you've mentioned, the existing Iterator trait allows you to return
data in blocks if you desire (Iterator<&[T]> or Iterator<~[T]>).
Essentially only vectors and vector-based deques can return a slice,
so it wouldn't really be a generic trait.

> This could also be implemented with co-routines.   Supposedly stack
> switching will be separated from tasks sometime soon?

Coroutines/generators are quite cool, but they're far from free if
they're implemented with context switching between code and not with a
state machine - a switch involves swapping the registers and wiping
out a lot of the cache.

The C# solution can be as efficient as the code written by hand since
it just generates a struct and a stateful iteration function, which is
right in line with Rust's system's programming niche.

The hard part would be implementing it, and managing to keep the
borrow/move checking errors comprehensible (all your locals
essentially become fields).
_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Vadim Chugunov



On Thu, Jun 6, 2013 at 4:13 PM, Daniel Micay <[hidden email]> wrote:
On Thu, Jun 6, 2013 at 6:19 PM, Vadim <[hidden email]> wrote:
>
> Based on my experience with iterators in other languages, I would like throw
> in the following idea:
>
> pub trait BlockyIterator<A> {
>         /// Advance the iterator and return the next block of values.
> Return empty vector when the end is reached
>         fn next(&mut self) -> &'self [A]
> }
>
> Rationale: when concrete type of the iterator is known, the next() method
> inlines nicely.  However, when used polymorphically, every iterator
> advancement turns into a function call.  A lot of that overhead could be
> shaved off, if iterators can return multiple values at a time.  A "normal"
> Iterator can then be implemented on top of this trait.
>
> This is especially applicable to types which already store elements in an
> vector, most notably [T] itself.

As you've mentioned, the existing Iterator trait allows you to return
data in blocks if you desire (Iterator<&[T]> or Iterator<~[T]>).
Essentially only vectors and vector-based deques can return a slice,
so it wouldn't really be a generic trait.

The iterator object itself can contain a fixed size buffer into which it copies objects before returning slice to the caller.  This would work for almost any container type.

Perhaps I'm confused about the amount of inlining that Rust can perform.  Do you expect that iterator.next() calls will be inlined across crate boundaries?

Coroutines/generators are quite cool, but they're far from free if
they're implemented with context switching between code and not with a
state machine - a switch involves swapping the registers and wiping
out a lot of the cache.

Hmm, yes, that true...


_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Bill Myers
In reply to this post by Graydon Hoare
Scala has a similar design, with the following traits:
- TraversableOnce: can be internally iterated once (has a foreach() method that takes a closure)
- Traversable: can be internally iterated unlimited times (has a foreach() method that takes a closure)
- Iterable: can be externally iterated (has an iterator() method that returns an Iterator trait)

The way it works is that Iterable extends Traversable, which extends TraversableOnce, and the for loop just uses TraversableOnce, and Iterable has a default implementation of the TraversableOnce foreach() function using the iterator() function.

Also, the Iterator trait itself extends TraversableOnce, implementing foreach() by mutating itself.

It might be a good idea to investigate copying this design.


_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Sebastian Sylvan


On Thu, Jun 6, 2013 at 7:22 PM, Bill Myers <[hidden email]> wrote:
Scala has a similar design, with the following traits:
- TraversableOnce: can be internally iterated once (has a foreach() method that takes a closure)
- Traversable: can be internally iterated unlimited times (has a foreach() method that takes a closure)
- Iterable: can be externally iterated (has an iterator() method that returns an Iterator trait)

The way it works is that Iterable extends Traversable, which extends TraversableOnce, and the for loop just uses TraversableOnce, and Iterable has a default implementation of the TraversableOnce foreach() function using the iterator() function.

Also, the Iterator trait itself extends TraversableOnce, implementing foreach() by mutating itself.

It might be a good idea to investigate copying this design.
 
I find Andrei Alexandrescu's argument about range-based iterators pretty persuasive: http://www.informit.com/articles/article.aspx?p=1407357
 
Seb

_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Daniel Micay
On Thu, Jun 6, 2013 at 11:01 PM, Sebastian Sylvan
<[hidden email]> wrote:

>
>
> On Thu, Jun 6, 2013 at 7:22 PM, Bill Myers <[hidden email]> wrote:
>>
>> Scala has a similar design, with the following traits:
>> - TraversableOnce: can be internally iterated once (has a foreach() method
>> that takes a closure)
>> - Traversable: can be internally iterated unlimited times (has a foreach()
>> method that takes a closure)
>> - Iterable: can be externally iterated (has an iterator() method that
>> returns an Iterator trait)
>>
>> The way it works is that Iterable extends Traversable, which extends
>> TraversableOnce, and the for loop just uses TraversableOnce, and Iterable
>> has a default implementation of the TraversableOnce foreach() function using
>> the iterator() function.
>>
>> Also, the Iterator trait itself extends TraversableOnce, implementing
>> foreach() by mutating itself.
>>
>> It might be a good idea to investigate copying this design.
>
>
> I find Andrei Alexandrescu's argument about range-based iterators pretty
> persuasive: http://www.informit.com/articles/article.aspx?p=1407357
>
> Seb

Andrei usually talks about ranges in contrast with C++ iterators, so
it's a bit different than comparing with the Iterator trait Rust is
using. C++ iterators don't know when they've reached the end of the
range, so you have to compare them with a sentinel end iterator. In
that sense, Rust's iterator is a range and the `next` method is
comparable to the pop method on D ranges.
_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Sebastian Sylvan
The linked article contrasts them with the GoF-style iterators as well.
 
The Rust Iterator trait is similar to the "one pass ranges" (and possibly forward ranges), but not double-ended ranges or random-access ranges. It's the *family* of range-based iterators that makes it flexible (e.g. allowing you to write an efficient in-place reverse without knowing the underlying data structure, using a double-ended range).
 
 


On Thu, Jun 6, 2013 at 8:16 PM, Daniel Micay <[hidden email]> wrote:
On Thu, Jun 6, 2013 at 11:01 PM, Sebastian Sylvan
<[hidden email]> wrote:
>
>
> On Thu, Jun 6, 2013 at 7:22 PM, Bill Myers <[hidden email]> wrote:
>>
>> Scala has a similar design, with the following traits:
>> - TraversableOnce: can be internally iterated once (has a foreach() method
>> that takes a closure)
>> - Traversable: can be internally iterated unlimited times (has a foreach()
>> method that takes a closure)
>> - Iterable: can be externally iterated (has an iterator() method that
>> returns an Iterator trait)
>>
>> The way it works is that Iterable extends Traversable, which extends
>> TraversableOnce, and the for loop just uses TraversableOnce, and Iterable
>> has a default implementation of the TraversableOnce foreach() function using
>> the iterator() function.
>>
>> Also, the Iterator trait itself extends TraversableOnce, implementing
>> foreach() by mutating itself.
>>
>> It might be a good idea to investigate copying this design.
>
>
> I find Andrei Alexandrescu's argument about range-based iterators pretty
> persuasive: http://www.informit.com/articles/article.aspx?p=1407357
>
> Seb

Andrei usually talks about ranges in contrast with C++ iterators, so
it's a bit different than comparing with the Iterator trait Rust is
using. C++ iterators don't know when they've reached the end of the
range, so you have to compare them with a sentinel end iterator. In
that sense, Rust's iterator is a range and the `next` method is
comparable to the pop method on D ranges.



--
Sebastian Sylvan

_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Daniel Micay
On Fri, Jun 7, 2013 at 12:58 AM, Sebastian Sylvan
<[hidden email]> wrote:

> The linked article contrasts them with the GoF-style iterators as well.
>
> The Rust Iterator trait is similar to the "one pass ranges" (and possibly
> forward ranges), but not double-ended ranges or random-access ranges. It's
> the *family* of range-based iterators that makes it flexible (e.g. allowing
> you to write an efficient in-place reverse without knowing the underlying
> data structure, using a double-ended range).
>
> See fig. 3:
> http://www.informit.com/content/images/art_alexandrescu3_iterators/elementLinks/alexandrescu3_fig03.jpg

The extent to which you can have mutable iterators in Rust is pretty
small, because of the memory safety requirement. Iterators can't open
up a hole allowing multiple mutable references to the same object to
be obtained, so I don't think mutable bidirectional or random access
iterators are possible.

Forward iterators can't ever give you an alias because they're a
single pass over the container. It's an easy guarantee to provide.
_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Matthieu Monrocq



On Fri, Jun 7, 2013 at 7:05 AM, Daniel Micay <[hidden email]> wrote:
On Fri, Jun 7, 2013 at 12:58 AM, Sebastian Sylvan
<[hidden email]> wrote:
> The linked article contrasts them with the GoF-style iterators as well.
>
> The Rust Iterator trait is similar to the "one pass ranges" (and possibly
> forward ranges), but not double-ended ranges or random-access ranges. It's
> the *family* of range-based iterators that makes it flexible (e.g. allowing
> you to write an efficient in-place reverse without knowing the underlying
> data structure, using a double-ended range).
>
> See fig. 3:
> http://www.informit.com/content/images/art_alexandrescu3_iterators/elementLinks/alexandrescu3_fig03.jpg

The extent to which you can have mutable iterators in Rust is pretty
small, because of the memory safety requirement. Iterators can't open
up a hole allowing multiple mutable references to the same object to
be obtained, so I don't think mutable bidirectional or random access
iterators are possible.

Forward iterators can't ever give you an alias because they're a
single pass over the container. It's an easy guarantee to provide.

Is it ?

In this case it would mean that you can only have one Forward Iterator in scope (for a given container) at once too (ie, the forward iterator borrows the container); otherwise you could have two distinct iterators pointing to the same underlying element.

I certainly appreciate the ongoing debate anyway, it's great to see things being exposed to light and openly discussed. I would like to contribute with one point: partitioning.

Sometimes, you would like to partition a container, or point to one of its elements.

For example, in C++, you have an overload of insert which takes an iterator allowing you to point to the routine where the element you ask to insert is likely to go and thus shaving off a couple comparisons (if you are right). This requires pointing to a single element, to be contrasted with a range.

Another example would be partitioning, a partition of a slice can be represented with two points: the two end-points of the slice and the point of partition.

Both of those examples can be represented by ranges (or in C++ iterator) though they do not themselves imply any iteration. My point, thus, is that there might be a need for "fingers inside a container" that go beyond basic iteration.

-- Matthieu
 
_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev


_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Daniel Micay
On Fri, Jun 7, 2013 at 12:10 PM, Matthieu Monrocq
<[hidden email]> >>

>> The extent to which you can have mutable iterators in Rust is pretty
>> small, because of the memory safety requirement. Iterators can't open
>> up a hole allowing multiple mutable references to the same object to
>> be obtained, so I don't think mutable bidirectional or random access
>> iterators are possible.
>>
>> Forward iterators can't ever give you an alias because they're a
>> single pass over the container. It's an easy guarantee to provide.
>
>
> Is it ?
>
> In this case it would mean that you can only have one Forward Iterator in
> scope (for a given container) at once too (ie, the forward iterator borrows
> the container); otherwise you could have two distinct iterators pointing to
> the same underlying element.

Yeah, the compiler will only allow you to obtain one mutable iterator
at a time. You can obtain any number of immutable ones.

> I certainly appreciate the ongoing debate anyway, it's great to see things
> being exposed to light and openly discussed. I would like to contribute with
> one point: partitioning.
>
> Sometimes, you would like to partition a container, or point to one of its
> elements.
>
> For example, in C++, you have an overload of insert which takes an iterator
> allowing you to point to the routine where the element you ask to insert is
> likely to go and thus shaving off a couple comparisons (if you are right).
> This requires pointing to a single element, to be contrasted with a range.
>
> Another example would be partitioning, a partition of a slice can be
> represented with two points: the two end-points of the slice and the point
> of partition.
>
> Both of those examples can be represented by ranges (or in C++ iterator)
> though they do not themselves imply any iteration. My point, thus, is that
> there might be a need for "fingers inside a container" that go beyond basic
> iteration.
>
> -- Matthieu

To some extent we have a special case for this with string/vector
slices but it's not yet extended to containers in general. The vector
module could return two *disjoint* `&mut [T]` slices because the
caller is unable to extend them.
_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Yasuhiro Fujii
In reply to this post by Daniel Micay
Hi,
I'm excited about Rust's iterator library based on external style.
On the other hand, I was thinking about extending current "for" syntax/protocol
to support returning values, which is useful for filter_map() like
functions [*0]. (It might have been discussed before, but I cannot
find it in rust-dev archives)

Example:

    fn test() {
        let src = ~[0, 1, 2, 3, 4];
        let mut sum = 0;
        let dst = for src.filter_map() |&x| {
            sum += x;
            if sum == 10 {
                break x + 1; // => (false, Some(x + 1))
            }
            else if sum > 10 {
                break        // => (false, None)
            }

            if x % 2 == 0 {
                loop         // => (true, None)
            }
            else {
                loop x + 1;  // => (true, Some(x + 1))
            }
        }
    }
    // In extended "for", loop/break statements in the closures are
    // transformed with following rules:
    //   loop      => return (true , None)
    //   loop val  => return (true , Some(val))
    //   break     => return (false, None)
    //   break val => return (false, Some(val))

vec::filter_map() should be slightly modified from original [*1]:

    pure fn filter_map<T, U: Copy>(v: &[T], f: fn(t: &T) -> (bool,
Option<U>)) -> ~[U] {
        let mut result = ~[];
        for each(v) |elem| {
            let (cont, val) = f(elem);
            match val {
                None => { /* no-op */ }
                Some(result_elem) => unsafe { result.push(result_elem); }
            }
            cont || break;
        }
        result
    }

With this extension, closure-calling "for" syntactic sugar is still useful even
if Rust's iterators become external style.  In Daniel Micay's example:

    fn main() {
        let mut it = Counter::new(0.0, 1.0)
            .take_while(|x| *x < 10000000.0)
            .transform(|x| x / 2.0)
            .transform(|x| x + 2.0);
    }

This can be rewritten as follows (with appropriate FilterMapIterator<>
implementation):

    fn main() {
        let mut it = for Counter::new(0.0, 1.0).filter_map() |&x| {
            if (x < 10000000.0) {
                x / 2.0 + 2.0
            }
            else {
                break;
            }
        }
    }

Any combination of transform/filter/skip_whike/take_while/skip/take/scan can be
converted.  I think this is more readable and expressive.

Many languages have special syntax for manipulating iterators (Python's
generator comprehension, C#'s query expression, Haskell's List monads, etc.  I
know they are based on very different approaches and different range of
application, but for similar purposes).  This extension may be seen as one of
them.


[*0]: Sorry for out of the topic in this thread, but I think it is also useful
(and better consistency) to enable "while" statement to return value like
following:

    let xs = ~[0, 1, 2, 4, 5];
    let found_two =
        while i < xs.size() {
            if xs[i] == 2 {
                break true
            }
        }
        else {
            // "else" clause is executed when "while" statement is terminated
            // NOT by a "break" statement. (Python behaves like the same)
            false
        }

[*1]: Original implementation vec::filter_map():

    pure fn filter_map<T, U: Copy>(v: &[T], f: fn(t: &T) -> Option<U>) -> ~[U] {
        let mut result = ~[];
        for each(v) |elem| {
            match f(elem) {
                None => { /* no-op */ }
                Some(result_elem) => unsafe { result.push(result_elem); }
            }
        }
        result


On Thu, Jun 6, 2013 at 1:09 PM, Daniel Micay <[hidden email]> wrote:

> A quick terminology refresher, for those who aren't familiar with it:
>
> * Internal iterator: takes a closure, runs the closure until it asks to break
> * External iterator: state machine, advanced by the caller in a loop
>
> To a caller, external iterators provide the most functionality, because they
> can be used as an internal iterator. You lose the state of an internal iterator
> by breaking out of iterator, so generic algorithms like zip, union, intersect
> and merge can't be implemented for a pair of iterators.
>
> # Issues with internal iterators in Rust
>
> A few months ago, we only had internal iterators and there were no generic
> algorithms to use with any iterator - only with BaseIter's `each` method.
>
> Rust's internal iterators implement the protocol encoded in Rust's for
> statement, but it's not possible to give them all a common trait or implement
> generic methods or functions taking any internal iterator.
>
> As a workaround, we can write algorithms assuming internal iterators only take
> one argument (the closure):
>
>     fn count<T>(f: &fn(fn(T) -> bool) -> bool) -> uint
>
> The caller has to use a partial function to call these adaptors, for which we
> lack sugar:
>
>     count(|f| uint::range(0, 10, f))
>
> For simple functions, this is fairly reasonable once you're used to it. It
> quickly gets out of control though, even for a simple function like filter:
>
>     filter<T>(pred: &fn(&T) -> bool, input: &fn(&fn(T) -> bool) -> bool, output: &fn(T) -> bool) -> bool {}
>
> Sadly, `filter_ref` is also needed to work around closures behaving badly with
> lifetimes. An example of the problem with `fold_ref`:
>
>     fn product<T: One + Mul<T, T>>(iter: &fn(f: &fn(&T) -> bool) -> bool) -> T {
>         fold_ref(One::one::<T>(), iter, |a, x| *a = a.mul(x))
>     }
>
> Since `product` expects an iterator yielding `&T` (a borrowed pointer in any
> region), it won't work with `fold` because that requires the borrowed pointer
> to have the same type (and thus lifetime) for every iteration like `&'a int`.
>
> This issue with borrowed pointers was blocking me from replacing the existing
> algorithms reimplemented for both `str` and `vec` with the generic ones.
>
> Chaining together iteration algorithms is a common use case, but even chaining
> two together is confusing at first:
>
>     to_vec(|g| filter(|&x| *x < 3, |f| xs.each(f), g)
>
> Another more alarming issue is that with internal iterators, the `break` and
> `return` statements don't always work in a `for` loop. If the iterator isn't
> implemented correctly, the loop will keep going.
>
> This also has borrow checking implications, because the compiler can't assume
> those statements actually cause flow control to leave the loop immediately.
>
> # External iterators
>
> Based on the above, you might think generic iteration algorithms in Rust are a
> bleak prospect. However, we already have a nice external iterator library, and
> they don't suffer from the above issues.
>
> All kinds of external iterators implement the following trait, whether they are
> a fibonacci number generator, a reverse iterator over a vector or iterator over
> a range in a sorted set:
>
>     pub trait Iterator<A> {
>         /// Advance the iterator and return the next value. Return `None` when the end is reached.
>         fn next(&mut self) -> Option<A>;
>     }
>
> Generic adaptors are implemented on `Iterator`, and many of them are `Iterator`
> implementations themselves:
>
>     use std::iterator::*;
>
>     fn main() {
>         let mut it = Counter::new(0.0, 1.0)
>             .take_while(|x| *x < 10000000.0)
>             .transform(|x| x / 2.0)
>             .transform(|x| x + 2.0);
>         println(it.fold(0.0, |a, b| a + b).to_str())
>     }
>
> If you're curious, the optimized LLVM IR: http://ix.io/5Xl
>
> Unlike internal iterators, external iterators only run one iteration at a time,
> so a `for` loop designed for them would always be able to succeed with `break`
> and `return`. It would also be able to avoid the ugly `advance` wrapper
> currently required to use external iterators with `for`.
>
>     // The current situation, wrapping an external iterator as an internal one
>     //
>     // Since the advance method is not known the be correct, borrow checking
>     // still assumes `return` and `break` are imperfect.
>     for xs.zip(ys).advance |x| { ... }
>
>     // A hypothetical `for` loop using the `Iterator` trait
>     for iterator |x| { ... }
>
>     // It could also fall back to an `Iterable` trait and obtain an iterator
>     for container |x| { ... }
>
> External iterators also avoid the problems with references and closures,
> because they simply return `T` rather than passing it to a closure.
>
> # Why not just switch to external iterators?
>
> Algorithms that can be represented easily without the call stack are as easy to
> write as either an internal or external iterator.
>
> However, without support for compiling a function to a state machine (C# does
> this for yield), traversal of a recursive data structure has to be manually
> translated to a procedural algorithm with an explicit stack. For complex data
> structures, this process can be very difficult.
>
> I'm hopeful Rust will gain support for this down the road after 1.0. If it
> does, there will be no reason to write immutable internal iterators.
>
> Another issue is mutability, as you can write iterators that are able to mutate
> containers. With internal iterators, this is easy to do with safe code. With
> external ones, it will `unsafe` and won't be easy to get right.
>
> # Solution?
>
> I don't have any proposal to completely solve this issue. :)
>
> I think extending the built-in `for` loop to work with external iterators
> should be considered, because right now the verbosity discourages using them
> and makes borrow checking more painful than it has to be.
>
> It could treat functions as internal iterators, and look for an `Iterator`
> implementation (using a `lang` item) for external ones.
>
> Python's `for` loop starts by looking for an iterator (a `__next__` method) and
> falls back to an iterable (an `__iter__` method) so behaviour like this isn't
> an alien concept.
>
> _______________________________________________
> Rust-dev mailing list
> [hidden email]
> https://mail.mozilla.org/listinfo/rust-dev
>
_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Daniel Micay
In reply to this post by Vadim Chugunov
On Thu, Jun 6, 2013 at 8:01 PM, Vadim <[hidden email]> wrote:
>
> The iterator object itself can contain a fixed size buffer into which it
> copies objects before returning slice to the caller.  This would work for
> almost any container type.
>
> Perhaps I'm confused about the amount of inlining that Rust can perform.  Do
> you expect that iterator.next() calls will be inlined across crate
> boundaries?

It can inline between crates as long as the functions/methods are
marked with `#[inline]`. This causes the compiler to output the AST
into the crate and add an inline hint.

Statically dispatched function calls are fast though, just not when
they're going to be in an inner loop like these ones.
_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Niko Matsakis
In reply to this post by Daniel Micay
On Thu, Jun 06, 2013 at 12:09:29AM -0400, Daniel Micay wrote:
> A quick terminology refresher, for those who aren't familiar with it:
> Another issue is mutability, as you can write iterators that are able to mutate
> containers. With internal iterators, this is easy to do with safe code. With
> external ones, it will `unsafe` and won't be easy to get right.

Something about this assertion didn't sit right with me. It took me a
little bit, but I realized it's actually faily easy to do a mutable
external iterator. Here is an example for a binary tree, visiting in pre-order.

https://gist.github.com/nikomatsakis/5736715

Perhaps there are other cases that are harder? I'd be curious to know
how the efficiency of this compares with the internal iterator case;
clearly, it requires allocation where an internal iterator does not.


Niko
_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Daniel Micay
On Sat, Jun 8, 2013 at 5:45 PM, Niko Matsakis <[hidden email]> wrote:

> On Thu, Jun 06, 2013 at 12:09:29AM -0400, Daniel Micay wrote:
>> A quick terminology refresher, for those who aren't familiar with it:
>> Another issue is mutability, as you can write iterators that are able to mutate
>> containers. With internal iterators, this is easy to do with safe code. With
>> external ones, it will `unsafe` and won't be easy to get right.
>
> Something about this assertion didn't sit right with me. It took me a
> little bit, but I realized it's actually faily easy to do a mutable
> external iterator. Here is an example for a binary tree, visiting in pre-order.
>
> https://gist.github.com/nikomatsakis/5736715

Huh, I'm surprised by how simple it is. When I tried to do it before,
it was with the old borrow checker so I guess I could have just been
fighting with the old bugs/limitations.

> Perhaps there are other cases that are harder? I'd be curious to know
> how the efficiency of this compares with the internal iterator case;
> clearly, it requires allocation where an internal iterator does not.

Efficiency-wise, I don't think it should be a big deal because for up
to 2^32 elements it will only need a stack depth of ~32. It could be
reserving the memory in advance based on the size of the tree. The
recursive version will have stack overflow checks for each level of
recursion so that's not much different than the vector capacity
checks.
_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Daniel Micay
On Sat, Jun 8, 2013 at 5:56 PM, Daniel Micay <[hidden email]> wrote:
>
> Efficiency-wise, I don't think it should be a big deal because for up
> to 2^32 elements it will only need a stack depth of ~32. It could be
> reserving the memory in advance based on the size of the tree. The
> recursive version will have stack overflow checks for each level of
> recursion so that's not much different than the vector capacity
> checks.

(in a self-balancing binary search tree, not an unbalanced one)
_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Niko Matsakis
In reply to this post by Daniel Micay
On Sat, Jun 08, 2013 at 05:56:34PM -0400, Daniel Micay wrote:
> Huh, I'm surprised by how simple it is. When I tried to do it before,
> it was with the old borrow checker so I guess I could have just been
> fighting with the old bugs/limitations.

The end result looks quite nice, but I confess it took me two or three
iterations to find an approach that would work. The key is to only
hold on to the state you will need in the future---this is why the
`next` method pops off the current node and then pushes its children.
Because the current node is no longer on the stack, we can then return
an `&mut` pointer to its value.

For fun, I extended the approach to a post-order iterator as well:

https://gist.github.com/nikomatsakis/5737243

This is somewhat trickier, but I think mostly because post-order is
generally trickier than pre-order for an external iterator, since you
visit the value on the way *up* and not on the way *down*.

Anyway, I think the idea of "only keep those parts you have yet to
visit" is clearer in the post order case: the post order state starts
with options for left and right that are Some (presuming the node has
children) and those options are None'd out as we proceed.

> Efficiency-wise, I don't think it should be a big deal because for up
> to 2^32 elements it will only need a stack depth of ~32. It could be
> reserving the memory in advance based on the size of the tree. The
> recursive version will have stack overflow checks for each level of
> recursion so that's not much different than the vector capacity
> checks.

This makes sense. Also, there are advantages to using the heap over
the stack. For example, the pre-order iterator only keeps 1 word per
depth of the tree, whereas a stack frame is significantly larger.


Niko
_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
Reply | Threaded
Open this post in threaded view
|

Re: The future of iterators in Rust

Noam Yorav-Raphael
In reply to this post by Daniel Micay
Hello,

Would you mind going into detail about when you need iterators that mutate containers? I always think of iterators as having no side effects.

I ask that because if somehow you can live with only external iterators, having one type of iterators instead of two seems to me like a great simplification of the language.

Thanks,
Noam


On 6 June 2013 07:09, Daniel Micay <[hidden email]> wrote:
A quick terminology refresher, for those who aren't familiar with it:

* Internal iterator: takes a closure, runs the closure until it asks to break
* External iterator: state machine, advanced by the caller in a loop

To a caller, external iterators provide the most functionality, because they
can be used as an internal iterator. You lose the state of an internal iterator
by breaking out of iterator, so generic algorithms like zip, union, intersect
and merge can't be implemented for a pair of iterators.

# Issues with internal iterators in Rust

A few months ago, we only had internal iterators and there were no generic
algorithms to use with any iterator - only with BaseIter's `each` method.

Rust's internal iterators implement the protocol encoded in Rust's for
statement, but it's not possible to give them all a common trait or implement
generic methods or functions taking any internal iterator.

As a workaround, we can write algorithms assuming internal iterators only take
one argument (the closure):

    fn count<T>(f: &fn(fn(T) -> bool) -> bool) -> uint

The caller has to use a partial function to call these adaptors, for which we
lack sugar:

    count(|f| uint::range(0, 10, f))

For simple functions, this is fairly reasonable once you're used to it. It
quickly gets out of control though, even for a simple function like filter:

    filter<T>(pred: &fn(&T) -> bool, input: &fn(&fn(T) -> bool) -> bool, output: &fn(T) -> bool) -> bool {}

Sadly, `filter_ref` is also needed to work around closures behaving badly with
lifetimes. An example of the problem with `fold_ref`:

    fn product<T: One + Mul<T, T>>(iter: &fn(f: &fn(&T) -> bool) -> bool) -> T {
        fold_ref(One::one::<T>(), iter, |a, x| *a = a.mul(x))
    }

Since `product` expects an iterator yielding `&T` (a borrowed pointer in any
region), it won't work with `fold` because that requires the borrowed pointer
to have the same type (and thus lifetime) for every iteration like `&'a int`.

This issue with borrowed pointers was blocking me from replacing the existing
algorithms reimplemented for both `str` and `vec` with the generic ones.

Chaining together iteration algorithms is a common use case, but even chaining
two together is confusing at first:

    to_vec(|g| filter(|&x| *x < 3, |f| xs.each(f), g)

Another more alarming issue is that with internal iterators, the `break` and
`return` statements don't always work in a `for` loop. If the iterator isn't
implemented correctly, the loop will keep going.

This also has borrow checking implications, because the compiler can't assume
those statements actually cause flow control to leave the loop immediately.

# External iterators

Based on the above, you might think generic iteration algorithms in Rust are a
bleak prospect. However, we already have a nice external iterator library, and
they don't suffer from the above issues.

All kinds of external iterators implement the following trait, whether they are
a fibonacci number generator, a reverse iterator over a vector or iterator over
a range in a sorted set:

    pub trait Iterator<A> {
        /// Advance the iterator and return the next value. Return `None` when the end is reached.
        fn next(&mut self) -> Option<A>;
    }

Generic adaptors are implemented on `Iterator`, and many of them are `Iterator`
implementations themselves:

    use std::iterator::*;

    fn main() {
        let mut it = Counter::new(0.0, 1.0)
            .take_while(|x| *x < 10000000.0)
            .transform(|x| x / 2.0)
            .transform(|x| x + 2.0);
        println(it.fold(0.0, |a, b| a + b).to_str())
    }

If you're curious, the optimized LLVM IR: http://ix.io/5Xl

Unlike internal iterators, external iterators only run one iteration at a time,
so a `for` loop designed for them would always be able to succeed with `break`
and `return`. It would also be able to avoid the ugly `advance` wrapper
currently required to use external iterators with `for`.

    // The current situation, wrapping an external iterator as an internal one
    //
    // Since the advance method is not known the be correct, borrow checking
    // still assumes `return` and `break` are imperfect.
    for xs.zip(ys).advance |x| { ... }

    // A hypothetical `for` loop using the `Iterator` trait
    for iterator |x| { ... }

    // It could also fall back to an `Iterable` trait and obtain an iterator
    for container |x| { ... }

External iterators also avoid the problems with references and closures,
because they simply return `T` rather than passing it to a closure.

# Why not just switch to external iterators?

Algorithms that can be represented easily without the call stack are as easy to
write as either an internal or external iterator.

However, without support for compiling a function to a state machine (C# does
this for yield), traversal of a recursive data structure has to be manually
translated to a procedural algorithm with an explicit stack. For complex data
structures, this process can be very difficult.

I'm hopeful Rust will gain support for this down the road after 1.0. If it
does, there will be no reason to write immutable internal iterators.

Another issue is mutability, as you can write iterators that are able to mutate
containers. With internal iterators, this is easy to do with safe code. With
external ones, it will `unsafe` and won't be easy to get right.

# Solution?

I don't have any proposal to completely solve this issue. :)

I think extending the built-in `for` loop to work with external iterators
should be considered, because right now the verbosity discourages using them
and makes borrow checking more painful than it has to be.

It could treat functions as internal iterators, and look for an `Iterator`
implementation (using a `lang` item) for external ones.

Python's `for` loop starts by looking for an iterator (a `__next__` method) and
falls back to an iterable (an `__iter__` method) so behaviour like this isn't
an alien concept.

_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev



_______________________________________________
Rust-dev mailing list
[hidden email]
https://mail.mozilla.org/listinfo/rust-dev
12