Learning Rust from Zero: Study Notes (5)

modules
use, as, nested path & glob
The way of organising sub-modules
Vec & HashMap
String with UTF-8
Unrecoverable errors
Recoverable errors
Propagate error to caller

1. modules

You can use cargo to create a new library crate, which is where our modules will live. Just run

cargo new --lib modulename

This creates a library crate named modulename. The directory layout that cargo generates looks like this:

.
 ├── Cargo.toml
 └── src
     └── lib.rs

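src/lib.rs is the main entry point of this crate. Since different features can be split into different files, src/lib.rs plays a role similar to __init__.py in Python, although the two are not quite the same.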

So how do we actually write a module? The template looks roughly like this:

mod module_name {
    pub mod public_submodule {
        pub fn public_method() {
            // ...
        }

        fn private_method() {
            // ...
        }
    }

    mod private_submodule {
        // ...
    }
}

As you can see, a mod can be nested inside another mod, just as a namespace in C++ can contain further namespaces.

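Strictly speaking, items in Rust are private by default, and that includes the outermost mod. It is still usable from the rest of src/lib.rs because items defined in the same module can always see each other. Inner modules need the pub modifier to be reachable from outside their parent (｡･ω･｡) A module without pub is private to its parent, which means code outside cannot use it.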

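Likewise, any method, function, enum, or struct in a module that should be exposed needs the pub modifier. Note that pub on a struct does not make its fields public; each field needs its own pub.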

Functions defined directly in src/lib.rs are private by default too (usable only inside this crate) and need the pub modifier to be exported.

mod module_name {
    pub mod public_submodule {
        pub fn public_method() {
            // ...
        }

        fn private_method() {
            // ...
        }
    }

    mod private_submodule {
        // ...
    }
}

fn private_fn() {
    // ...
}

pub fn public_fn() {
    // ...
}

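There are two ways to refer to the module we just defined from src/lib.rs: an absolute path, which starts at the crate root with crate::, and a relative path, which starts from the current module. A concrete example: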

mod hot_pot {
    pub struct Order {
        pub spicy: u32,
        kind: String
    }
    
    impl Order {
        pub fn serve(spicy: u32) -> Order {
            Order {
                spicy,
                kind: String::from("Butter")
            }
        }
    }
}

pub fn eat() {
    // Absolute path
    let hotpot = crate::hot_pot::Order::serve(100);
    
    // Relative path
    let hotpot = hot_pot::Order::serve(100);
}

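One more thing to watch out for is privacy: accessing a private module, function, or struct from outside is a compile error. (Marking everything pub is not the answer either; what to expose really has to be decided case by case.)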

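The code below shows that code inside a module can access the private modules, functions, and structs of that module and of the modules enclosing it, while access in the other direction is an error ( ；´Д｀)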

mod hot_pot {
    pub struct Order {
        pub spicy: u32,
        kind: String
    }
    
    impl Order {
        pub fn serve(spicy: u32) -> Order {
            // ok to call private fn
            Order::private_order_detail();
            
            let mut order = Order {
                spicy,
                kind: String::from("TBD")
            };
            // ok to access private member
            order.kind = String::from("Butter");
            order
        }
        
        fn private_order_detail() {
            // ok to call private fn
            private_hotpot_fn();
        }
    }
    
    fn private_hotpot_fn() {
        // ...
    }
}

pub fn eat() {
    // Error: no access to private fn
    crate::hot_pot::private_hotpot_fn();
    
    // Error: no access to private fn
    hot_pot::Order::private_order_detail();
    
    // Error: no access to private member
    // (`mut` is added so that privacy is the only error on the next line)
    let mut hotpot = hot_pot::Order::serve(100);
    hotpot.kind = String::from("Tomato");
}
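
As a complement, here is a minimal sketch of how outside code can still work with a private field through public methods. The `kind` accessor and the `change_kind` setter are hypothetical names added for illustration; they are not part of the example above.

```rust
mod hot_pot {
    pub struct Order {
        pub spicy: u32,
        kind: String, // private: only code inside `hot_pot` can touch it
    }

    impl Order {
        pub fn serve(spicy: u32) -> Order {
            Order {
                spicy,
                kind: String::from("Butter"),
            }
        }

        // Hypothetical read-only accessor for the private field.
        pub fn kind(&self) -> &str {
            &self.kind
        }

        // Hypothetical setter, so the module stays in control of updates.
        pub fn change_kind(&mut self, kind: &str) {
            self.kind = kind.to_string();
        }
    }
}

fn main() {
    let mut hotpot = hot_pot::Order::serve(100);
    hotpot.spicy = 50; // public field: direct access is fine
    hotpot.change_kind("Tomato"); // private field: goes through a public method
    println!("{} / {}", hotpot.spicy, hotpot.kind());
}
```

Keeping the field private and exposing methods means the module can later change how `kind` is stored without breaking callers.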

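To reach a function in a module's parent, in the parent's parent, or at the outermost level, use super::.

If the nesting gets too deep, bring the item into scope with use at the place where it is needed.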

fn outter_most_fn() {
    // ...
}

mod module_name {
    fn some_fn_in_mod() {
        // add `super::`
        super::outter_most_fn();
    }
    
    mod inner_level_1 {
        fn some_fn_in_l1() {
            // add another `super::`
            super::super::outter_most_fn();
        }
        
        mod inner_level_2 {
            fn some_fn_in_l2() {
                // and you may add another `super::` again
                // but it is way too verbose! Σ（·□·；）
                super::super::super::outter_most_fn();
            }
            
            fn concise() {
                use crate::outter_most_fn;
                outter_most_fn();
            }
        }
    }
}

2. use, as, nested path & glob

Since use has come up, let's look at its companions, starting with as.

This works almost exactly like in Python: it gives a new name to whatever is being imported.

use std::io::Result as IoResult;

The Python counterpart would be

import numpy as np

Next is the nested path, which is easy to understand from the code: when you use several items from the same crate, write the common prefix once and list the differing parts inside {}.

use std::{cmp::Ordering, io};

This is equivalent to

use std::cmp::Ordering;
use std::io;

Rust has one slightly unusual case, which shows up when you want to use paths like these:

use std::io;
use std::io::Write;

After factoring out the common prefix, nothing is left of the first path. For this case Rust provides self, so the two lines can be written as:

use std::io::{self, Write};
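
A small runnable sketch of the self form in use. The function name `write_greeting` and the greeting text are assumptions made up for this illustration.

```rust
use std::io::{self, Write};

// `io` is in scope thanks to `self`; `Write` is the trait that provides `write_all`.
fn write_greeting<W: Write>(out: &mut W) -> io::Result<()> {
    out.write_all(b"hello hot pot\n")
}

fn main() {
    // Write to standard output...
    write_greeting(&mut io::stdout()).expect("failed to write to stdout");

    // ...or to an in-memory buffer, since `Vec<u8>` also implements `Write`.
    let mut buffer: Vec<u8> = Vec::new();
    write_greeting(&mut buffer).expect("failed to write to buffer");
    println!("{} bytes buffered", buffer.len());
}
```

Both names are needed here: `io::Result` and `io::stdout` go through the module, while `Write` has to be in scope for the trait method to resolve.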

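The last item here is the glob (wildcard) import. It closely resembles Python's and does the same job: it brings every public item under the path into scope.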

use std::collections::*;

Compare with * in Python:

from numpy import *

3. The way of organising sub-modules

To organise sub-modules more cleanly, we can do as Python does and move a module into its own file, for example:

.
 ├── Cargo.lock
 ├── Cargo.toml
 └── src
     ├── lib.rs
     └── submodule_file_name.rs

submodule_file_name.rs does not have to contain only one mod. The file below contains two: submodule_in_file and another_submodule_in_file.

pub mod submodule_in_file {
    pub fn from_sub() {
        println!("from submodule!");
    }
}

pub mod another_submodule_in_file {
    pub fn from_another_sub() {
        println!("from another submodule!");
    }
}

To use them from lib.rs, just declare the module by its file name (without the .rs extension):

mod submodule_file_name;

pub fn in_lib() {
    submodule_file_name::submodule_in_file::from_sub();
    submodule_file_name::another_submodule_in_file::from_another_sub();
}

Writing the full path every time is too verbose, so, as before, we can add a use statement in lib.rs:

mod submodule_file_name;
use submodule_file_name::{submodule_in_file, another_submodule_in_file};

pub fn in_lib() {
    submodule_in_file::from_sub();
    another_submodule_in_file::from_another_sub();
}

What if we want to define one more sub-module, named submodule_of_submodule, under submodule_file_name::submodule_in_file?

In Rust, first declare submodule_of_submodule inside submodule_file_name::submodule_in_file, that is, in submodule_file_name.rs:

pub mod submodule_in_file {
    // declare the `submodule_of_submodule`
    pub mod submodule_of_submodule;

    pub fn from_sub() {
        println!("from submodule!");
    }
}

Then lay out the files in directories that mirror the module path:

.
 ├── Cargo.lock
 ├── Cargo.toml
 └── src
     ├── lib.rs
     ├── submodule_file_name
     │   └── submodule_in_file
     │       └── submodule_of_submodule.rs
     └── submodule_file_name.rs

Here, submodule_of_submodule.rs in the src/submodule_file_name/submodule_in_file directory represents the submodule_of_submodule module under submodule_file_name::submodule_in_file.

Suppose submodule_of_submodule.rs contains the following code:

pub mod sub_sub_module_1 {
    pub fn sub_sub_fn() {
        println!("from sub sub 1");
    }
}

pub mod sub_sub_module_2 {
    pub fn sub_sub_fn() {
        println!("from sub sub 2");
    }
}

Then it is used from lib.rs like this:

mod submodule_file_name;
use submodule_file_name::submodule_in_file::submodule_of_submodule::{sub_sub_module_1, sub_sub_module_2};

pub fn in_lib() {
    sub_sub_module_1::sub_sub_fn();
    sub_sub_module_2::sub_sub_fn();
}
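
To see the whole tree at once, the sketch below writes the same module hierarchy inline in a single file, which can be handy for experimenting before splitting it into files and directories. As an assumption for this sketch, the functions return their text instead of printing it, and `another_submodule_in_file` is left out for brevity.

```rust
// Single-file equivalent of the directory layout:
// each `mod name { ... }` stands in for a file or a directory level.
mod submodule_file_name {
    pub mod submodule_in_file {
        pub mod submodule_of_submodule {
            pub mod sub_sub_module_1 {
                pub fn sub_sub_fn() -> &'static str {
                    "from sub sub 1"
                }
            }

            pub mod sub_sub_module_2 {
                pub fn sub_sub_fn() -> &'static str {
                    "from sub sub 2"
                }
            }
        }

        pub fn from_sub() -> &'static str {
            "from submodule!"
        }
    }
}

use submodule_file_name::submodule_in_file::submodule_of_submodule::{
    sub_sub_module_1, sub_sub_module_2,
};

fn main() {
    println!("{}", submodule_file_name::submodule_in_file::from_sub());
    println!("{}", sub_sub_module_1::sub_sub_fn());
    println!("{}", sub_sub_module_2::sub_sub_fn());
}
```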

4. Vec & HashMap

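Vec and HashMap are probably two of the most frequently used collections.

They work much as in most languages. Compared with Python's dict, though, Rust's HashMap is closer to C++'s std::unordered_map: the types of the key and the value are fixed up front.

Vec is part of the prelude, so it is available without any import. Writing the import explicitly is allowed but redundant: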

use std::vec::Vec;

HashMap, on the other hand, does need to be imported:

use std::collections::HashMap;

Let's start with Vec. The commonly used parts shown here are not very different from std::vector in C++.

use std::vec::Vec;

fn main() {
    let mut v: Vec<u32> = [1, 2, 3, 4, 5].to_vec();
    
    // len()
    println!("v.len(): {}, v.capacity(): {}", v.len(), v.capacity());
    
    // push (the counterpart of push_back in C++)
    v.push(6);
    
    // len()
    println!("v.len(): {}, v.capacity(): {}", v.len(), v.capacity());
    
    // pop()
    // try to get the last element
    // and if there exists one
    // it will also remove from the vector
    if let Some(last) = v.pop() {
        println!("last: {}", last);
        // len()
        println!("v.len(): {}, v.capacity(): {}", v.len(), v.capacity());
    }
    
    // last()
    // try to get the last element
    // but won't remove any element
    if let Some(last) = v.last() {
        println!("last: {}", last);
        // len()
        println!("v.len(): {}, v.capacity(): {}", v.len(), v.capacity());
    }
    
    // mutable iter
    for i in &mut v {
        *i += 50;
    }
    
    // immutable iter
    for i in &v {
        println!("{}", i);
    }
    
    // clear()
    v.clear();
    if let Some(last) = v.pop() {
        println!("last: {}", last);
    } else {
        println!("already empty");
    }
}
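
One difference from std::vector worth a quick sketch is element access: indexing with `v[i]` panics when the index is out of range, while `get` returns an Option. The helper `get_or` below is a hypothetical name used only for this illustration.

```rust
// Returns the element at `index`, or `fallback` when the index is out of range.
fn get_or(v: &[u32], index: usize, fallback: u32) -> u32 {
    // `get` returns Option<&u32> instead of panicking like `v[index]` would.
    v.get(index).copied().unwrap_or(fallback)
}

fn main() {
    // `vec!` is the usual shorthand for building a Vec.
    let v: Vec<u32> = vec![1, 2, 3, 4, 5];

    println!("v[1] = {}", v[1]); // direct indexing, panics if out of range
    println!("get(1) = {:?}", v.get(1)); // Some(2)
    println!("get(10) = {:?}", v.get(10)); // None
    println!("get_or(10) = {}", get_or(&v, 10, 0));

    // Iterator adaptors build a new Vec without a manual loop.
    let doubled: Vec<u32> = v.iter().map(|x| x * 2).collect();
    println!("{:?}", doubled);
}
```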

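There are many more built-in methods; for details see the official Rust documentation at https://doc.rust-lang.org/std/vec/struct.Vec.html

Next is HashMap, which is used much like std::unordered_map in C++ (or like std::map, except that the keys are not kept in sorted order).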

use std::collections::HashMap;

fn main() {
    // there's no need to write the types explicitly
    // the Rust compiler can infer them from the inserts below
    let mut scores = HashMap::new();

    scores.insert(String::from("Beef"), 80);
    scores.insert(String::from("Duck"), 70);
    println!("{:?}", scores);
}

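Compared with C++, the Rust compiler can infer the types, so the key and value types do not have to be spelled out at the declaration.

There are exceptions, though. In a struct field, for example the BrainfuckVMStatus from Learning Rust from Zero: Study Notes (3), Yet Another Way to Kill Your Brain, the key and value types must be stated in the declaration: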

use std::collections::HashMap;

#[derive(Debug)]
struct BrainfuckVMStatus {
    tape: HashMap<i32, i32>
}

impl BrainfuckVMStatus {
    fn new() -> BrainfuckVMStatus {
        BrainfuckVMStatus {
            tape: HashMap::new()
        }
    }
}

fn main() {
    let status = BrainfuckVMStatus::new();
    println!("{:?}", status);
}

If we leave out the key and value types in the struct above, the Rust compiler reports an error.

(Likewise, a Vec used in a struct must state its element type.)

Next comes ownership. For both Vec<T> and HashMap<K, V>, values of types that implement the Copy trait are copied in; for types that do not implement Copy, ownership is moved into the Vec<T> or HashMap<K, V>.
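
A minimal sketch of that rule, using a String key (moved) and an i32 value (copied). The function name `build_scores` and the sample data are assumptions for illustration.

```rust
use std::collections::HashMap;

fn build_scores() -> HashMap<String, i32> {
    let mut scores = HashMap::new();

    let name = String::from("Beef"); // String does not implement Copy
    let points = 90; // i32 implements Copy

    scores.insert(name, points);

    // `points` was copied, so it is still usable here.
    println!("points is still available: {}", points);
    // `name` was moved into the map; uncommenting the next line fails to compile.
    // println!("{}", name);

    // Clone explicitly when the original value is still needed afterwards.
    let other = String::from("Duck");
    scores.insert(other.clone(), 70);
    println!("other is still available: {}", other);

    scores
}

fn main() {
    println!("{:?}", build_scores());
}
```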

Inserting values into a HashMap looks like this:

use std::collections::HashMap;

fn main() {
    let mut score: HashMap<String, i32> = HashMap::new();
    
    score.insert("Beef".to_string(), 90);
    score.insert("Cocoa".to_string(), 100);
    
    let key = "Apple".to_string();
    let value = 95;
    score.insert(key, value);
    
    println!("{:?}", score);
}

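Then come the most common operations: getting, updating, and iterating over keys and values. It feels much like Python, except that you cannot insert or update through subscript syntax (map[key] = value).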

use std::collections::HashMap;

fn main() {
    let mut score: HashMap<String, i32> = HashMap::new();
    
    // first storing
    score.insert("Cocoa".to_string(), 100);
    
    // get
    if let Some(value) = score.get(&"Cocoa".to_string()) {
        println!("value: {}", value);
    }
    
    // update
    score.insert("Cocoa".to_string(), 1000);
    
    // iter
    for (key, value) in &score {
        println!("iter - {}: {}", key, value);
    }
    
    println!("{:?}", score);
}
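
As a side note, a lookup does not need a freshly allocated String: for a HashMap<String, V>, `get` also accepts a plain &str. The sketch below shows that, plus an in-place update through `get_mut`; the helper `score_of` and its fallback of 0 are assumptions for illustration.

```rust
use std::collections::HashMap;

// Looks up a score by `&str`, falling back to 0 for unknown names.
fn score_of(scores: &HashMap<String, i32>, name: &str) -> i32 {
    scores.get(name).copied().unwrap_or(0)
}

fn main() {
    let mut scores: HashMap<String, i32> = HashMap::new();
    scores.insert("Cocoa".to_string(), 100);

    // In-place update through a mutable reference to the stored value.
    if let Some(value) = scores.get_mut("Cocoa") {
        *value += 1;
    }

    println!("Cocoa: {}", score_of(&scores, "Cocoa"));
    println!("Nobody: {}", score_of(&scores, "Nobody"));
    println!("has Cocoa? {}", scores.contains_key("Cocoa"));
}
```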

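Among the other interesting built-in methods: to avoid overwriting an existing key, first call entry to get an Entry, then chain or_insert. The value is written only if the key is absent; if the key already exists, or_insert leaves it untouched.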

use std::collections::HashMap;

fn main() {
    let mut score: HashMap<String, i32> = HashMap::new();
    
    // first storing
    score.insert("Cocoa".to_string(), 100);
    score.insert("Apple".to_string(), 80);
    
    // "Cocoa" exists
    // won't be overwritten
    score.entry("Cocoa".to_string()).or_insert(1000);
    
    println!("{:?}", score);
}

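Another common use is histogram-style counting, where the data arrives one item at a time and the keys cannot be known in advance, so the count is updated as each item comes in: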

use std::collections::HashMap;

fn main() {
    let mut char_count: HashMap<char, i32> = HashMap::new();
    let sentence = "Hello World!";
    
    for c in sentence.chars() {
        let count = char_count.entry(c).or_insert(0);
        *count += 1;
    }
    
    println!("{:#?}", char_count);
}
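
The same entry pattern carries over to other keys. Here is a sketch that counts words instead of characters; the function name `word_count` and the choice to lowercase each word are assumptions made for this example.

```rust
use std::collections::HashMap;

// Counts how often each whitespace-separated word occurs, ignoring case.
fn word_count(text: &str) -> HashMap<String, u32> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // `or_insert(0)` yields a mutable reference to the (possibly new) counter.
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_count("hot pot Hot soup");
    println!("{:#?}", counts);
}
```

Because `or_insert` returns `&mut V`, the lookup and the update happen with a single hash lookup rather than a `get` followed by an `insert`.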

5. String with UTF-8

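Only the points that need special care in Rust are covered here.

A String is always UTF-8 encoded, and it cannot be indexed with an integer subscript. You either iterate over chars (Unicode scalar values) or work with the raw bytes. In UTF-8 a single char takes 1 to 4 bytes, so a byte offset does not necessarily fall on a character boundary, and some characters that a human reads as one unit are made of several chars combined _(:3」∠)_

Accordingly, .len() counts bytes and .chars() yields one Unicode scalar value at a time; neither of them follows grapheme clusters.

Grapheme clusters are what was just mentioned: one visible character may be composed of several chars. Take the Hindi word नमस्ते, written in Devanagari: a reader sees 4 characters, but it actually consists of the 6 chars ['न', 'म', 'स', '्', 'त', 'े'].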

fn main() {
    for c in "नमस्ते".chars() {
        println!("{}", c);
    }
}

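To iterate over the text the way humans actually read it, the only option is a third-party crate (unicode-segmentation is one such crate) ╮(╯▽╰)╭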
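
Using only the standard library, the difference between bytes and chars can be sketched as follows. The helper name `describe` is an assumption for illustration; grapheme clusters are deliberately not covered because std has no API for them.

```rust
// Returns (number of bytes, number of chars) for a string slice.
fn describe(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let hello = "नमस्ते";
    let (bytes, chars) = describe(hello);
    println!("{} bytes, {} chars", bytes, chars); // 18 bytes, 6 chars

    // Slicing works on byte offsets. `get` returns None instead of panicking
    // when a range does not fall on char boundaries.
    println!("{:?}", hello.get(0..3)); // Some("न")
    println!("{:?}", hello.get(0..1)); // None

    // `char_indices` pairs each char with the byte offset where it starts.
    for (offset, c) in hello.char_indices() {
        println!("byte {:>2}: {}", offset, c);
    }
}
```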

6. Unrecoverable errors

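There doesn't seem to be much to say here. Roughly: "if an error really cannot be handled, just panic" lol

When a program crashes with a panic, you can set the environment variable RUST_BACKTRACE=1 to make it print a stack backtrace at the point of the panic.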

So it's over already. Was it even worth its own section?

It was only there to set up the joke in the next one!
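
For completeness, a minimal sketch of a deliberate panic. The function `validated_spicy` and its limit of 100 are hypothetical and exist only to give panic! something to guard.

```rust
// Returns the level unchanged, or panics when it breaks the (made-up) limit.
fn validated_spicy(level: u32) -> u32 {
    if level > 100 {
        panic!("spicy level must be at most 100, got {}", level);
    }
    level
}

fn main() {
    println!("spicy: {}", validated_spicy(50));

    // Calling validated_spicy(101) here would abort the program with the
    // message above; running with RUST_BACKTRACE=1 adds the backtrace.
}
```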

7. Recoverable errors

For recoverable errors, we get to follow the advice "don't panic".

The usual tool is the Result<T, E> type we have seen before:

enum Result<T, E> {
    Ok(T),
    Err(E),
}

For example, reading a file from disk is an extremely common operation, and one that can fail to open the file for all sorts of reasons:

use std::fs::File;

fn main() {
    let f = File::open("hello.txt");

    match f {
        Ok(file) => {
            println!("did open file:\n {:?}", file);
        },
        Err(error) => {
            println!("Problem opening the file:\n {:?}", error);
        },
    };
}

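On the first run the file does not exist, so the Err branch is taken and the error is printed. The printed error includes a kind field that says why the file could not be opened, so we can go one step further and match on that kind: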

use std::fs::File;
use std::io::ErrorKind;

fn main() {
    let f = File::open("hello.txt");

    match f {
        Ok(file) => {
            println!("did open file:\n {:?}", file);
        },
        Err(error) => match error.kind() {
            // create the file if not exists
            ErrorKind::NotFound => match File::create("hello.txt") {
                Ok(fc) => println!("did create file:\n {:?}", fc),
                Err(e) => println!("Problem creating the file:\n {:?}", e),
            },
            other_error => println!("Problem opening the file:\n {:?}", other_error)
        }
    };
}

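The match expressions above are long and deeply nested, which is not very tidy. Result<T, E> has a method named unwrap_or_else that makes the code more concise.

On Ok(T) it takes the T out and returns it; on Err(E) it calls the closure we pass in with the error.

But! The whole expression now has a value, and Rust requires every branch to produce the same type, so println! alone no longer works. The file either cannot be opened or cannot be created, while the required type is a File, so what could we return? Only panic! is left, which type-checks because it never returns ╮(╯▽╰)╭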

use std::fs::File;
use std::io::ErrorKind;

fn main() {
    let f = File::open("hello.txt").unwrap_or_else(|error| {
        if error.kind() == ErrorKind::NotFound {
            File::create("hello.txt").unwrap_or_else(|error| {
                panic!("Problem creating the file:\n {:?}", error);
            })
        } else {
            panic!("Problem opening the file:\n {:?}", error);
        }
    });
    println!("file create/open: {:?}", f);
}
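
When a sensible fallback value does exist, the closure can return it instead of panicking. A sketch with number parsing; the function name `parse_spicy` and the fallback of 0 are assumptions for illustration.

```rust
// Parses a spicy level from text, falling back to 0 when the text is not a number.
fn parse_spicy(input: &str) -> u32 {
    input.trim().parse::<u32>().unwrap_or_else(|error| {
        // The closure must return a u32, the same type as the Ok value.
        eprintln!("could not parse {:?}: {}, using 0 instead", input, error);
        0
    })
}

fn main() {
    println!("{}", parse_spicy("42"));
    println!("{}", parse_spicy("very"));
}
```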

If you don't want to do any error handling, you can use expect, as in https://ryza.moe/2019/09/learning-rust-from-zero-1/, which panics with the given message on Err:

use std::fs::File;

fn main() {
    let f = File::open("hello.txt")
               .expect("Failed to open hello.txt");
}

You can even call unwrap directly, but that is generally reserved for cases where you are completely sure the result is Ok(T):

use std::fs::File;

fn main() {
    // basically only use this
    // when you're 100% sure that 
    // the result would be Ok(T)
    let f = File::open("hello.txt").unwrap();
}

8. Propagate error to caller

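That is, we don't handle the error ourselves but pass it back to the caller.

For example, when writing an interface we may want the caller to see what went wrong, or errors may need to be handled in one central place, or there may be some other reason.

To pass an error back to the caller, first make sure that all normal return values have the same type, then change the function's return type to Result<T, E>.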

use std::io;
use std::io::Read;
use std::fs::File;

fn read_username_from_file() -> Result<String, io::Error> {
    let f = File::open("hello.txt");

    // return error to caller if occurs
    let mut f = match f {
        Ok(file) => file,
        Err(e) => return Err(e),
    };

    let mut s = String::new();

    // return error to caller if occurs
    match f.read_to_string(&mut s) {
        Ok(_) => Ok(s),
        Err(e) => Err(e),
    }
}

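You will soon notice that if every error is passed back to the caller anyway, the match expressions do no real work and make the code tedious to read (each match merely unwraps Ok(T) or returns Err(E)).

So Rust provides a shorthand: append ? to an expression that evaluates to a Result<T, E>. (More generally, ? works on any type that implements std::ops::Try, such as Option<T>, provided the enclosing function returns a compatible type.)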

use std::io;
use std::io::Read;
use std::fs::File;

fn read_username_from_file() -> Result<String, io::Error> {
    let mut f = File::open("hello.txt")?;
    let mut s = String::new();
    f.read_to_string(&mut s)?;
    Ok(s)
}

The calls can also be chained together:

use std::io;
use std::io::Read;
use std::fs::File;

fn read_username_from_file() -> Result<String, io::Error> {
    let mut s = String::new();
    File::open("hello.txt")?.read_to_string(&mut s)?;
    Ok(s)
}
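
Two further variations, sketched under stated assumptions: the standard library's `fs::read_to_string` shortens the file case even more, and ? works on Option as well. The function names, the `path` parameter, and the trimming of the result are choices made for this illustration only.

```rust
use std::fs;
use std::io;

// Reads the whole file and returns its content without surrounding whitespace.
// Any I/O error is handed back to the caller by `?`.
fn read_username_from(path: &str) -> Result<String, io::Error> {
    let contents = fs::read_to_string(path)?;
    Ok(contents.trim().to_string())
}

// `?` on an Option: a None ends the function early with None.
fn first_letter_upper(name: &str) -> Option<char> {
    let first = name.chars().next()?;
    Some(first.to_ascii_uppercase())
}

fn main() {
    match read_username_from("hello.txt") {
        Ok(name) => println!("username: {}", name),
        Err(e) => println!("could not read username: {}", e),
    }
    println!("{:?}", first_letter_upper("cocoa"));
    println!("{:?}", first_letter_upper(""));
}
```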

Cocoa