Task: predict whether a sentence is semantically ill-formed (语义病句). Semantic errors are distinct from spelling and grammatical errors.
Chinese Semantic Error Sentence Recognition Challenge (中文语义病句识别挑战赛)

一、Background: In recent years, with the rise of self-media, everyone has become a producer of information, and the amount of erroneous text on the Internet has exploded. Avoiding such errors has become a pressing concern, and text-proofreading competitions have sprung up one after another. However, past competitions mainly targeted spelling and grammatical errors. These are relatively easy for humans and typically arise from foreign language learners or from carelessness by native Chinese writers. For industries such as publishing and education that need to detect deeper semantic errors in Chinese, recognizing semantically ill-formed sentences is far more valuable. Such sentences frequently appear in middle- and high-school Chinese exam questions to measure students' command of the language; they are difficult even for students and are therefore also of significant research interest.
二、Task: Chinese semantic error sentence recognition is a binary classification problem: predict whether a sentence is semantically ill-formed. Unlike spelling and grammatical errors, semantic errors concern the validity of the sentence at the semantic level. Examples are shown in the table below.
| 病句 (ill-formed sentence) | 解析 (analysis) |
| --- | --- |
| 英法联军烧毁并洗劫了北京圆明园。 | Wrong order of events: the palace would have been "洗劫" (looted) first, then "烧毁" (burned). |
| 山上的水宝贵,我们把它留给晚上来的人喝。 | Ambiguous segmentation: "晚上/来" (come in the evening) vs. "晚/上来" (come up later). |
| 国内彩电市场严重滞销。 | A "市场" (market) cannot itself be "滞销" (unsalable); only the goods can. |
三、Evaluation Rules 1. Data description: Part of the data comes from online primary- and secondary-school ill-formed-sentence question banks, and part comes from manual annotation. Each record contains a sentence id, a sentence label (0: correct sentence / 1: ill-formed sentence), and the sentence itself; the three fields are separated by tabs. The data format is illustrated in the table below:
| id | label | sentence |
| --- | --- | --- |
| 1 | 1 | 英法联军烧毁并洗劫了北京圆明园。 |
| 2 | 1 | 山上的水宝贵,我们把它留给晚上来的人喝。 |
| 3 | 0 | 国内彩电严重滞销。 |
The training samples are provided by the HIT-iFLYTEK Joint Laboratory (哈工大讯飞联合实验室). In the training set the ratio of ill-formed to correct sentences is roughly 7:3. Participants must train only on the training set provided by the organizers; no additional manually annotated data may be used, and the test set must not be used for training. The competition has a preliminary round and a final round, and both rounds use the same training set.
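As a quick sanity check, the tab-separated format described above can be loaded with pandas. A minimal sketch (the file name `train.tsv` and the column names are assumptions for illustration, not part of the official release; the data actually used later in this post ships as an `.xlsx` file):

```python
import pandas as pd

# Hypothetical file name; see below for the actual files in the AI Studio dataset.
train_df = pd.read_csv('train.tsv', sep='\t', names=['id', 'label', 'text'])

# The label distribution should be roughly 0.7 ill-formed vs. 0.3 correct.
print(train_df['label'].value_counts(normalize=True))
```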
2. Evaluation metric: submissions are scored with the F1-score computed on the ill-formed-sentence (positive) class.
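For reference, this metric can be reproduced with scikit-learn by treating label 1 (ill-formed) as the positive class. A minimal sketch with made-up labels purely for illustration:

```python
from sklearn.metrics import f1_score

# Toy ground truth and predictions, for illustration only.
y_true = [1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# F1 on the ill-formed class (label = 1), matching the competition metric.
print(f1_score(y_true, y_pred, pos_label=1, average='binary'))  # 0.75
```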
Implementation Code

Import the required packages:

```python
import os
import random
from functools import partial

import numpy as np
import pandas as pd
import paddle
import paddle as P
import paddle.fluid as fluid
import paddle.nn as nn
import paddle.nn.functional as F
import paddlenlp as ppnlp
from paddle.io import Dataset
from paddlenlp.data import Stack, Tuple, Pad
from paddlenlp.datasets import MapDataset
from paddlenlp.transformers import LinearDecayWithWarmup
from paddlenlp.transformers import ErnieGramModel
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.utils.class_weight import compute_class_weight
from tqdm import tqdm
```
Initialize all required hyperparameters:

```python
class Config:
    text_col = 'text'
    target_col = 'label'
    max_len = 90
    batch_size = 32
    target_size = 2
    seed = 71
    n_fold = 5
    learning_rate = 5e-5
    epochs = 5
    warmup_proportion = 0.1
    weight_decay = 0.01
    model_name = "ernie-1.0"
    print_freq = 100
```
FGM (Fast Gradient Method) adversarial training:

```python
class FGM():
    """Adversarial training that perturbs the embedding layer along the gradient (Fast Gradient Method, FGM)."""

    def __init__(self, model):
        self.model = model
        self.backup = {}

    def attack(self, epsilon=0.15, emb_name='embeddings'):
        # Back up the embedding weights, then add a normalized gradient perturbation.
        for name, param in self.model.named_parameters():
            if not param.stop_gradient and emb_name in name:
                self.backup[name] = param.numpy()
                grad_tensor = paddle.to_tensor(param.grad)
                norm = paddle.norm(grad_tensor)
                if norm != 0:
                    r_at = epsilon * grad_tensor / norm
                    param.add(r_at)

    def restore(self, emb_name='embeddings'):
        # Restore the original embedding weights after the adversarial step.
        for name, param in self.model.named_parameters():
            if not param.stop_gradient and emb_name in name:
                assert name in self.backup
                param.set_value(self.backup[name])
        self.backup = {}
```
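Within a training step, the perturbation is applied after the normal backward pass and removed before the optimizer update. The sketch below wraps that sequence in a helper function; it mirrors the pattern used in `train()` further down and is included only to make the order of operations explicit:

```python
def train_step_with_fgm(model, fgm, criterion, optimizer, input_ids, token_type_ids, labels):
    """One training step with FGM adversarial training (same pattern as in train() below)."""
    # 1. Normal forward/backward pass; gradients are accumulated on all parameters.
    loss = criterion(model(input_ids, token_type_ids), labels)
    loss.backward()

    # 2. Perturb the embedding table along the gradient direction.
    fgm.attack()

    # 3. Forward/backward pass on the perturbed embeddings; adversarial gradients accumulate.
    loss_adv = criterion(model(input_ids, token_type_ids), labels)
    loss_adv.backward()

    # 4. Restore the original embedding weights before updating.
    fgm.restore()

    # 5. Update parameters using the combined gradients.
    optimizer.step()
    optimizer.clear_grad()
    return loss
```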
Set the random seeds to make results reproducible:

```python
def seed_torch(seed=42):
    """Fix random seeds for reproducibility."""
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    paddle.seed(seed)  # also seed Paddle's RNG
```
```python
import os
os.listdir('/home/aistudio/data/data176811/')
```
['data.xlsx', 'test1.csv', '提交示例.csv']
Load the data:

```python
CFG = Config()
seed_torch(seed=CFG.seed)

train = pd.read_excel('data/data176811/data.xlsx')
test = pd.read_table('data/data176811/test1.csv')
```
|       | id    | label | text |
| ----- | ----- | ----- | --- |
| 0     | 1     | 1     | 通过大力发展社区教育,使我省全民终身学习的教育体系已深入人心。 |
| 1     | 2     | 1     | 再次投入巨资的英超劲旅曼城队能否在2010-2011年度的英超联赛中夺得英超冠军,曼联、切尔... |
| 2     | 3     | 1     | 广西居民纸质图书的阅读率偏低,手机阅读将成为了广西居民极倾向的阅读方式。 |
| 3     | 4     | 1     | 文字书写时代即将结束,预示着人与字之间最亲密的一种关系已经终结。与此同时,屏幕文化造就了另一... |
| 4     | 5     | 1     | 安徽合力公司2006年叉车销售强劲,销售收入涨幅很有可能将超过40%以上。公司预计2006年... |
| ...   | ...   | ...   | ... |
| 45242 | 45244 | 0     | 进入5月以来,全国新增人感染H7N9禽流感病例呈明显下降趋势。 |
| 45243 | 45245 | 1     | 建设中国新一代天气雷达监测网,能够明显改善对热带气旋或台风登陆位置及强度预报的准确性,尤其对... |
| 45244 | 45246 | 1     | 每当回忆起和他朝夕相处的一段生活,他那循循善诱的教导和那和蔼可亲的音容笑貌,又重新出现在我的面前。 |
| 45245 | 45247 | 1     | 8月,延安市公开拍卖35辆超编超标公务车。在拍卖过程中,多辆年份较新、行驶里程较少的公务车竞... |
| 45246 | 45248 | 1     | 清华大学联合剑桥大学、麻省理工学院,成立低碳能源大学联盟未来交通研究中心,他们试图寻找解决北... |

45247 rows × 3 columns
Define 5-fold cross-validation:

```python
folds = train.copy()
Fold = StratifiedKFold(n_splits=CFG.n_fold, shuffle=True, random_state=CFG.seed)
for n, (train_index, val_index) in enumerate(Fold.split(folds, folds[CFG.target_col])):
    folds.loc[val_index, 'fold'] = int(n)
folds['fold'] = folds['fold'].astype(int)
```
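Because `StratifiedKFold` splits on the label column, each fold should preserve the overall (roughly 7:3) class ratio. A quick check one could run at this point (not part of the original notebook; output not shown):

```python
# Sanity check: the per-fold label distribution should mirror the overall ~7:3 ratio.
for n in range(CFG.n_fold):
    dist = folds.loc[folds['fold'] == n, CFG.target_col].value_counts(normalize=True)
    print(f"fold {n}:", dist.to_dict())
```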
Data preprocessing:

```python
class CustomDataset(Dataset):
    def __init__(self, df):
        self.data = df.values.tolist()
        self.texts = df[CFG.text_col]
        self.labels = df[CFG.target_col]

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        """Return one example (text + label) by index."""
        text = str(self.texts[idx])
        label = self.labels[idx]
        example = {'text': text, 'label': label}
        return example
```
Convert each example into model inputs (tokenization):

```python
def convert_example(example, tokenizer, max_seq_length=512, is_test=False):
    """
    Build BERT/ERNIE-style inputs::

        0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        |     first sequence   | second sequence |

    Returns:
        input_ids (list[int]): the list of token ids.
        token_type_ids (list[int]): list of sequence pair (segment) ids.
        label (numpy.array, int64, optional): the label, returned only if not is_test.
    """
    encoded_inputs = tokenizer(text=example["text"], max_seq_len=max_seq_length)
    input_ids = encoded_inputs["input_ids"]
    token_type_ids = encoded_inputs["token_type_ids"]

    if not is_test:
        label = np.array([example["label"]], dtype="int64")
        return input_ids, token_type_ids, label
    else:
        return input_ids, token_type_ids
```
Create the DataLoader:

```python
def create_dataloader(dataset,
                      mode='train',
                      batch_size=1,
                      batchify_fn=None,
                      trans_fn=None):
    if trans_fn:
        dataset = dataset.map(trans_fn)

    shuffle = True if mode == 'train' else False
    if mode == 'train':
        batch_sampler = paddle.io.DistributedBatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)
    else:
        batch_sampler = paddle.io.BatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)

    return paddle.io.DataLoader(
        dataset=dataset,
        batch_sampler=batch_sampler,
        collate_fn=batchify_fn,
        return_list=True)
```
```python
if CFG.model_name == 'ernie-1.0':
    tokenizer = ppnlp.transformers.ErnieTokenizer.from_pretrained('ernie-1.0')
elif CFG.model_name == 'ernie-doc-base-zh':
    tokenizer = ppnlp.transformers.ErnieDocTokenizer.from_pretrained('ernie-doc-base-zh')
else:
    tokenizer = ppnlp.transformers.ErnieGramTokenizer.from_pretrained(CFG.model_name)

trans_func = partial(
    convert_example,
    tokenizer=tokenizer,
    max_seq_length=CFG.max_len)

batchify_fn = lambda samples, fn=Tuple(
    Pad(axis=0, pad_val=tokenizer.pad_token_id),       # input_ids
    Pad(axis=0, pad_val=tokenizer.pad_token_type_id),  # token_type_ids
    Stack(dtype="int64")                               # labels
): [data for data in fn(samples)]
```
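To see what the tokenize-and-collate pipeline produces, a couple of examples can be run through `trans_func` and `batchify_fn` by hand. The sentences are taken from the example table above; the printed shapes depend on the tokenizer, so treat this as an illustrative sketch rather than part of the original notebook:

```python
# Illustrative check of the tokenize + collate pipeline (not part of the original notebook).
samples = [
    trans_func({'text': '国内彩电严重滞销。', 'label': 0}),
    trans_func({'text': '山上的水宝贵,我们把它留给晚上来的人喝。', 'label': 1}),
]
input_ids, token_type_ids, labels = batchify_fn(samples)
print(np.array(input_ids).shape, np.array(token_type_ids).shape, np.array(labels).shape)
```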
Evaluate the model:

```python
@paddle.no_grad()
def evaluate(model, criterion, metric, data_loader):
    """Validation loop: returns accuracy and F1 on the dev set."""
    model.eval()
    metric.reset()
    losses = []
    preds_list = []
    labels_list = []
    for batch in data_loader:
        input_ids, token_type_ids, labels = batch
        logits = model(input_ids, token_type_ids)
        preds_list.append(np.argmax(logits.numpy(), axis=1))
        labels_list.append(labels)
        loss = criterion(logits, labels)
        losses.append(loss.numpy())
        correct = metric.compute(logits, labels)
        metric.update(correct)
        accu = metric.accumulate()
    # Despite the name, this is the binary F1 of the positive (ill-formed) class.
    f1_macro = f1_score(np.concatenate(preds_list, axis=0),
                        np.concatenate(labels_list, axis=0),
                        average='binary')
    print("eval loss: %.5f, accu: %.5f" % (np.mean(losses), accu))
    model.train()
    metric.reset()
    return accu, f1_macro
```
Model prediction:

```python
def predict(model, data, tokenizer, batch_size=1):
    """Run the model over raw test records and return class probabilities."""
    examples = []
    for text in data:
        input_ids, segment_ids = convert_example(
            text,
            tokenizer,
            max_seq_length=CFG.max_len,
            is_test=True)
        examples.append((input_ids, segment_ids))

    batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input_ids
        Pad(axis=0, pad_val=tokenizer.pad_token_id),  # token_type_ids
    ): fn(samples)

    # Group the examples into batches of `batch_size`.
    batches = []
    one_batch = []
    for example in examples:
        one_batch.append(example)
        if len(one_batch) == batch_size:
            batches.append(one_batch)
            one_batch = []
    if one_batch:
        batches.append(one_batch)

    results = []
    model.eval()
    for batch in tqdm(batches):
        input_ids, segment_ids = batchify_fn(batch)
        input_ids = paddle.to_tensor(input_ids)
        segment_ids = paddle.to_tensor(segment_ids)
        logits = model(input_ids, segment_ids)
        probs = F.softmax(logits, axis=1)
        results.append(probs.numpy())
    return np.vstack(results)
```
Run inference on the test set:

```python
def inference():
    model_paths = [
        f'{CFG.model_name}_fold0.bin',
        f'{CFG.model_name}_fold1.bin',
        f'{CFG.model_name}_fold2.bin',
        f'{CFG.model_name}_fold3.bin',
        f'{CFG.model_name}_fold4.bin',
    ]
    if CFG.model_name == 'ernie-1.0':
        model = ppnlp.transformers.ErnieForSequenceClassification.from_pretrained(
            CFG.model_name, num_classes=CFG.target_size)
    elif CFG.model_name == 'ernie-doc-base-zh':
        model = ppnlp.transformers.ErnieDocForSequenceClassification.from_pretrained(
            'ernie-doc-base-zh', num_classes=CFG.target_size)
    else:
        model = ppnlp.transformers.ErnieGramForSequenceClassification.from_pretrained(
            CFG.model_name, num_classes=CFG.target_size)

    # Soft-voting ensemble: average the predicted probabilities of the five fold models.
    fold_preds = []
    for model_path in model_paths:
        model.load_dict(P.load(model_path))
        pred = predict(model, test.to_dict(orient='records'), tokenizer, 16)
        fold_preds.append(pred)
    preds = np.mean(fold_preds, axis=0)
    np.save("preds.npy", preds)

    labels = np.argmax(preds, axis=1)
    test['label'] = labels
    test[['id', 'label']].to_csv('paddle.csv', sep='\t', index=None)
```
Training function:

```python
def train():
    for fold in range(CFG.n_fold):
        print(f"===============training fold_nth:{fold + 1}======================")
        trn_idx = folds[folds['fold'] != fold].index
        val_idx = folds[folds['fold'] == fold].index

        train_folds = folds.loc[trn_idx].reset_index(drop=True)
        valid_folds = folds.loc[val_idx].reset_index(drop=True)

        train_dataset = CustomDataset(train_folds)
        train_ds = MapDataset(train_dataset)
        dev_dataset = CustomDataset(valid_folds)
        dev_ds = MapDataset(dev_dataset)

        train_data_loader = create_dataloader(
            train_ds,
            mode='train',
            batch_size=CFG.batch_size,
            batchify_fn=batchify_fn,
            trans_fn=trans_func)
        dev_data_loader = create_dataloader(
            dev_ds,
            mode='dev',
            batch_size=CFG.batch_size,
            batchify_fn=batchify_fn,
            trans_fn=trans_func)

        if CFG.model_name == 'ernie-1.0':
            model = ppnlp.transformers.ErnieForSequenceClassification.from_pretrained(
                CFG.model_name, num_classes=CFG.target_size)
        elif CFG.model_name == 'ernie-doc-base-zh':
            model = ppnlp.transformers.ErnieDocForSequenceClassification.from_pretrained(
                'ernie-doc-base-zh', num_classes=CFG.target_size)
        else:
            model = ppnlp.transformers.ErnieGramForSequenceClassification.from_pretrained(
                CFG.model_name, num_classes=CFG.target_size)

        num_training_steps = len(train_data_loader) * CFG.epochs
        lr_scheduler = LinearDecayWithWarmup(CFG.learning_rate, num_training_steps, CFG.warmup_proportion)

        # AdamW with weight decay applied to all parameters except biases and norm layers.
        optimizer = paddle.optimizer.AdamW(
            learning_rate=lr_scheduler,
            parameters=model.parameters(),
            weight_decay=CFG.weight_decay,
            apply_decay_param_fun=lambda x: x in [
                p.name for n, p in model.named_parameters()
                if not any(nd in n for nd in ["bias", "norm"])
            ])

        criterion = paddle.nn.loss.CrossEntropyLoss()
        metric = paddle.metric.Accuracy()

        global_step = 0
        best_val_acc = 0
        best_val_f1 = 0
        fgm = FGM(model)

        for epoch in range(1, CFG.epochs + 1):
            for step, batch in enumerate(train_data_loader, start=1):
                input_ids, segment_ids, labels = batch
                logits = model(input_ids, segment_ids)
                loss = criterion(logits, labels)

                probs = F.softmax(logits, axis=1)
                correct = metric.compute(probs, labels)
                metric.update(correct)
                acc = metric.accumulate()
                # Binary F1 on the current batch (variable name kept to match the log output below).
                f1_macro = f1_score(np.argmax(probs.numpy(), axis=1), labels, average='binary')

                global_step += 1
                if global_step % CFG.print_freq == 0:
                    print("global step %d, epoch: %d, batch: %d, loss: %.5f, acc: %.5f,f1_macro: %.5f" % (
                        global_step, epoch, step, loss, acc, f1_macro))

                # Standard backward pass, followed by the FGM adversarial pass (see the FGM class above).
                loss.backward()
                fgm.attack()
                logits_adv = model(input_ids, segment_ids)
                loss_adv = criterion(logits_adv, labels)
                loss_adv.backward()
                fgm.restore()

                optimizer.step()
                lr_scheduler.step()
                optimizer.clear_grad()

            acc, f1 = evaluate(model, criterion, metric, dev_data_loader)
            if acc > best_val_acc:
                best_val_acc = acc
                P.save(model.state_dict(), f'{CFG.model_name}_fold{fold}.bin')
            print('Best Val acc %.5f' % best_val_acc)

        del model
        break  # NOTE: only the first fold is trained here; remove this break to train all folds.
```
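The training log below was produced by running the two entry points in order. The exact driver cell is not shown in the original notebook, so the following invocation is an assumption about how it was called:

```python
# Assumed driver code (not shown in the original notebook): train, then predict on the test set.
train()
inference()
```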
===============training fold_nth:1======================
[2022-11-23 19:03:33,381] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W1123 19:03:33.386370 628 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1123 19:03:33.390496 628 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
global step 100, epoch: 1, batch: 100, loss: 0.46791, acc: 0.73188,f1_macro: 0.89655
global step 200, epoch: 1, batch: 200, loss: 0.65628, acc: 0.73625,f1_macro: 0.79245
global step 300, epoch: 1, batch: 300, loss: 0.73637, acc: 0.74031,f1_macro: 0.79245
global step 400, epoch: 1, batch: 400, loss: 0.57431, acc: 0.74234,f1_macro: 0.83636
global step 500, epoch: 1, batch: 500, loss: 0.37923, acc: 0.74138,f1_macro: 0.91525
global step 600, epoch: 1, batch: 600, loss: 0.38467, acc: 0.74344,f1_macro: 0.91228
global step 700, epoch: 1, batch: 700, loss: 0.51419, acc: 0.74562,f1_macro: 0.78261
global step 800, epoch: 1, batch: 800, loss: 0.38103, acc: 0.74750,f1_macro: 0.89286
global step 900, epoch: 1, batch: 900, loss: 0.34946, acc: 0.74899,f1_macro: 0.91304
global step 1000, epoch: 1, batch: 1000, loss: 0.53628, acc: 0.75106,f1_macro: 0.92308
global step 1100, epoch: 1, batch: 1100, loss: 0.42920, acc: 0.75298,f1_macro: 0.91525
eval loss: 0.44171, accu: 0.78530
Best Val acc 0.78530
global step 1200, epoch: 2, batch: 68, loss: 0.22114, acc: 0.83042,f1_macro: 0.91667
global step 1300, epoch: 2, batch: 168, loss: 0.29129, acc: 0.83464,f1_macro: 0.90476
global step 1400, epoch: 2, batch: 268, loss: 0.58879, acc: 0.83605,f1_macro: 0.80000
global step 1500, epoch: 2, batch: 368, loss: 0.42207, acc: 0.83993,f1_macro: 0.85714
global step 1600, epoch: 2, batch: 468, loss: 0.18622, acc: 0.83747,f1_macro: 0.95833
global step 1700, epoch: 2, batch: 568, loss: 0.26564, acc: 0.83814,f1_macro: 0.93617
global step 1800, epoch: 2, batch: 668, loss: 0.38719, acc: 0.84108,f1_macro: 0.86364
global step 1900, epoch: 2, batch: 768, loss: 0.35982, acc: 0.84204,f1_macro: 0.88889
global step 2000, epoch: 2, batch: 868, loss: 0.33469, acc: 0.84465,f1_macro: 0.89474
global step 2100, epoch: 2, batch: 968, loss: 0.52275, acc: 0.84475,f1_macro: 0.81818
global step 2200, epoch: 2, batch: 1068, loss: 0.23787, acc: 0.84574,f1_macro: 0.96552
eval loss: 0.45888, accu: 0.79613
Best Val acc 0.79613
global step 2300, epoch: 3, batch: 36, loss: 0.04629, acc: 0.94878,f1_macro: 0.98246
global step 2400, epoch: 3, batch: 136, loss: 0.19579, acc: 0.94003,f1_macro: 0.97778
global step 2500, epoch: 3, batch: 236, loss: 0.03761, acc: 0.93988,f1_macro: 1.00000
global step 2600, epoch: 3, batch: 336, loss: 0.21055, acc: 0.93899,f1_macro: 0.88235
global step 2700, epoch: 3, batch: 436, loss: 0.07569, acc: 0.94030,f1_macro: 0.97674
global step 2800, epoch: 3, batch: 536, loss: 0.17687, acc: 0.94076,f1_macro: 0.97674
global step 2900, epoch: 3, batch: 636, loss: 0.19501, acc: 0.94099,f1_macro: 0.92000
global step 3000, epoch: 3, batch: 736, loss: 0.25105, acc: 0.94153,f1_macro: 0.93617
global step 3100, epoch: 3, batch: 836, loss: 0.09677, acc: 0.94109,f1_macro: 0.95833
global step 3200, epoch: 3, batch: 936, loss: 0.12877, acc: 0.94071,f1_macro: 0.96000
global step 3300, epoch: 3, batch: 1036, loss: 0.12475, acc: 0.94058,f1_macro: 0.96154
eval loss: 0.50610, accu: 0.82055
Best Val acc 0.82055
global step 3400, epoch: 4, batch: 4, loss: 0.03767, acc: 0.98438,f1_macro: 1.00000
global step 3500, epoch: 4, batch: 104, loss: 0.00521, acc: 0.98257,f1_macro: 1.00000
global step 3600, epoch: 4, batch: 204, loss: 0.01782, acc: 0.98100,f1_macro: 1.00000
global step 3700, epoch: 4, batch: 304, loss: 0.03877, acc: 0.97995,f1_macro: 1.00000
global step 3800, epoch: 4, batch: 404, loss: 0.03308, acc: 0.98028,f1_macro: 1.00000
global step 3900, epoch: 4, batch: 504, loss: 0.02905, acc: 0.98065,f1_macro: 1.00000
global step 4000, epoch: 4, batch: 604, loss: 0.00859, acc: 0.97987,f1_macro: 1.00000
global step 4100, epoch: 4, batch: 704, loss: 0.00564, acc: 0.97954,f1_macro: 1.00000
global step 4200, epoch: 4, batch: 804, loss: 0.02505, acc: 0.97905,f1_macro: 1.00000
global step 4300, epoch: 4, batch: 904, loss: 0.01213, acc: 0.97905,f1_macro: 1.00000
global step 4400, epoch: 4, batch: 1004, loss: 0.30969, acc: 0.97883,f1_macro: 0.92683
global step 4500, epoch: 4, batch: 1104, loss: 0.11206, acc: 0.97905,f1_macro: 0.98039
eval loss: 0.65418, accu: 0.82221
Best Val acc 0.82221
global step 4600, epoch: 5, batch: 72, loss: 0.04641, acc: 0.99045,f1_macro: 0.98182
global step 4700, epoch: 5, batch: 172, loss: 0.00841, acc: 0.99001,f1_macro: 1.00000
global step 4800, epoch: 5, batch: 272, loss: 0.00417, acc: 0.99081,f1_macro: 1.00000
global step 4900, epoch: 5, batch: 372, loss: 0.00382, acc: 0.98975,f1_macro: 1.00000
global step 5000, epoch: 5, batch: 472, loss: 0.00420, acc: 0.99060,f1_macro: 1.00000
global step 5100, epoch: 5, batch: 572, loss: 0.01085, acc: 0.99088,f1_macro: 1.00000
global step 5200, epoch: 5, batch: 672, loss: 0.02734, acc: 0.99056,f1_macro: 0.97778
global step 5300, epoch: 5, batch: 772, loss: 0.00837, acc: 0.99053,f1_macro: 1.00000
global step 5400, epoch: 5, batch: 872, loss: 0.05018, acc: 0.99097,f1_macro: 0.97674
global step 5500, epoch: 5, batch: 972, loss: 0.04738, acc: 0.99126,f1_macro: 0.97959
global step 5600, epoch: 5, batch: 1072, loss: 0.01556, acc: 0.99128,f1_macro: 1.00000
eval loss: 0.79161, accu: 0.82597
Best Val acc 0.82597
[2022-11-23 19:47:36,006] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
100%|██████████| 65/65 [00:02<00:00, 31.80it/s]
100%|██████████| 65/65 [00:01<00:00, 33.80it/s]
100%|██████████| 65/65 [00:01<00:00, 32.83it/s]
100%|██████████| 65/65 [00:01<00:00, 33.76it/s]
100%|██████████| 65/65 [00:01<00:00, 33.67it/s]