[add]上传训练benchmark by z00560161

2020-10-19 20:22:23 +08:00
parent 22b83024f5
commit 82522e2f61
1225 changed files with 345421 additions and 0 deletions
@@ -0,0 +1,480 @@
+# Contents
+
+- [ResNet Description](#resnet-description)
+- [Model Architecture](#model-architecture)
+- [Dataset](#dataset)
+- [Features](#features)
+    - [Mixed Precision](#mixed-precision)
+- [Environment Requirements](#environment-requirements)
+- [Quick Start](#quick-start)
+- [Script Description](#script-description)
+    - [Script and Sample Code](#script-and-sample-code)
+    - [Script Parameters](#script-parameters)
+    - [Training Process](#training-process)
+    - [Evaluation Process](#evaluation-process)
+- [Model Description](#model-description)
+    - [Performance](#performance)
+        - [Evaluation Performance](#evaluation-performance)
+- [Description of Random Situation](#description-of-random-situation)
+- [ModelZoo Homepage](#modelzoo-homepage)
+
+
+# [ResNet Description](#contents)
+## Description
+ResNet (residual neural network) was proposed by Kaiming He and other four Chinese of Microsoft Research Institute. Through the use of ResNet unit, it successfully trained 152 layers of neural network, and won the championship in ilsvrc2015. The error rate on top 5 was 3.57%, and the parameter quantity was lower than vggnet, so the effect was very outstanding. Traditional convolution network or full connection network will have more or less information loss. At the same time, it will lead to the disappearance or explosion of gradient, which leads to the failure of deep network training. ResNet solves this problem to a certain extent. By passing the input information to the output, the integrity of the information is protected. The whole network only needs to learn the part of the difference between input and output, which simplifies the learning objectives and difficulties.The structure of ResNet can accelerate the training of neural network very quickly, and the accuracy of the model is also greatly improved. At the same time, ResNet is very popular, even can be directly used in the concept net network.
+
+These are examples of training ResNet50/ResNet101/SE-ResNet50 with CIFAR-10/ImageNet2012 dataset in MindSpore.ResNet50 and ResNet101 can reference [paper 1](https://arxiv.org/pdf/1512.03385.pdf) below, and SE-ResNet50 is a variant of ResNet50 which reference  [paper 2](https://arxiv.org/abs/1709.01507) and [paper 3](https://arxiv.org/abs/1812.01187) below, Training SE-ResNet50 for just 24 epochs using 8 Ascend 910, we can reach top-1 accuracy of 75.9%.(Training ResNet101 with dataset CIFAR-10 and SE-ResNet50 with CIFAR-10 is not supported yet.)
+
+## Paper
+1.[paper](https://arxiv.org/pdf/1512.03385.pdf):Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. "Deep Residual Learning for Image Recognition"
+
+2.[paper](https://arxiv.org/abs/1709.01507):Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu. "Squeeze-and-Excitation Networks"
+
+3.[paper](https://arxiv.org/abs/1812.01187):Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
+
+# [Model Architecture](#contents)
+
+The overall network architecture of ResNet is show below:
+[Link](https://arxiv.org/pdf/1512.03385.pdf)
+
+# [Dataset](#contents)
+
+Dataset used: [CIFAR-10](<http://www.cs.toronto.edu/~kriz/cifar.html>)
+- Dataset size：60,000 32*32 colorful images in 10 classes
+  - Train：50,000 images
+  - Test： 10,000 images
+- Data format：binary files
+  - Note：Data will be processed in dataset.py
+- Download the dataset, the directory structure is as follows:
+
+```
+├─cifar-10-batches-bin
+│
+└─cifar-10-verify-bin
+```
+
+Dataset used: [ImageNet2012](http://www.image-net.org/)
+
+- Dataset size 224*224 colorful images in 1000 classes
+  - Train：1,281,167 images  
+  - Test： 50,000 images   
+- Data format：jpeg
+  - Note：Data will be processed in dataset.py
+- Download the dataset, the directory structure is as follows:
+
+ ```
+└─dataset
+    ├─ilsvrc                # train dataset
+    └─validation_preprocess # evaluate dataset
+```
+
+# [Features](#contents)
+
+## Mixed Precision
+
+The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data types, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware.
+For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching ‘reduce precision’.
+
+# [Environment Requirements](#contents)
+
+- Hardware（Ascend/GPU）
+  - Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend  , please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
+- Framework
+  - [MindSpore](https://www.mindspore.cn/install/en)
+- For more information, please check the resources below：
+  - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
+  - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
+
+
+
+# [Quick Start](#contents)
+
+After installing MindSpore via the official website, you can start training and evaluation as follows:
+
+- Runing on Ascend
+```
+# distributed training
+Usage: sh run_distribute_train.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
+
+# standalone training
+Usage: sh run_standalone_train.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH]
+[PRETRAINED_CKPT_PATH](optional)
+
+# run evaluation example
+Usage: sh run_eval.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
+```
+
+- Runing on GPU
+```
+# distributed training example
+sh run_distribute_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012]  [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
+
+# standalone training example
+sh run_standalone_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
+
+# infer example
+sh run_eval_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
+```
+
+# [Script Description](#contents)
+
+## [Script and Sample Code](#contents)
+
+```shell
+.
+└──resnet
+  ├── README.md
+  ├── script
+    ├── run_distribute_train.sh            # launch ascend distributed training(8 pcs)
+    ├── run_parameter_server_train.sh      # launch ascend parameter server training(8 pcs)
+    ├── run_eval.sh                        # launch ascend evaluation
+    ├── run_standalone_train.sh            # launch ascend standalone training(1 pcs)
+    ├── run_distribute_train_gpu.sh        # launch gpu distributed training(8 pcs)
+    ├── run_parameter_server_train_gpu.sh  # launch gpu parameter server training(8 pcs)
+    ├── run_eval_gpu.sh                    # launch gpu evaluation
+    └── run_standalone_train_gpu.sh        # launch gpu standalone training(1 pcs)
+  ├── src
+    ├── config.py                          # parameter configuration
+    ├── dataset.py                         # data preprocessing
+    ├── crossentropy.py                    # loss definition for ImageNet2012 dataset
+    ├── lr_generator.py                    # generate learning rate for each step
+    └── resnet.py                          # resnet backbone, including resnet50 and resnet101 and se-resnet50
+  ├── eval.py                              # eval net
+  └── train.py                             # train net
+```
+
+## [Script Parameters](#contents)
+
+Parameters for both training and evaluation can be set in config.py.
+
+- Config for ResNet50, CIFAR-10 dataset
+
+```
+"class_num": 10,                  # dataset class num
+"batch_size": 32,                 # batch size of input tensor
+"loss_scale": 1024,               # loss scale
+"momentum": 0.9,                  # momentum
+"weight_decay": 1e-4,             # weight decay 
+"epoch_size": 90,                 # only valid for taining, which is always 1 for inference 
+"pretrain_epoch_size": 0,         # epoch size that model has been trained before loading pretrained checkpoint, actual training epoch size is equal to epoch_size minus pretrain_epoch_size
+"save_checkpoint": True,          # whether save checkpoint or not
+"save_checkpoint_epochs": 5,      # the epoch interval between two checkpoints. By default, the last checkpoint will be saved after the last step
+"keep_checkpoint_max": 10,        # only keep the last keep_checkpoint_max checkpoint
+"save_checkpoint_path": "./",     # path to save checkpoint
+"warmup_epochs": 5,               # number of warmup epoch
+"lr_decay_mode": "poly"           # decay mode can be selected in steps, ploy and default
+"lr_init": 0.01,                  # initial learning rate
+"lr_end": 0.00001,                # final learning rate
+"lr_max": 0.1,                    # maximum learning rate
+```
+
+- Config for ResNet50, ImageNet2012 dataset
+
+```
+"class_num": 1001,                # dataset class number
+"batch_size": 32,                 # batch size of input tensor
+"loss_scale": 1024,               # loss scale
+"momentum": 0.9,                  # momentum optimizer
+"weight_decay": 1e-4,             # weight decay 
+"epoch_size": 90,                 # only valid for taining, which is always 1 for inference 
+"pretrain_epoch_size": 0,         # epoch size that model has been trained before loading pretrained checkpoint, actual training epoch size is equal to epoch_size minus pretrain_epoch_size
+"save_checkpoint": True,          # whether save checkpoint or not
+"save_checkpoint_epochs": 5,      # the epoch interval between two checkpoints. By default, the last checkpoint will be saved after the last epoch
+"keep_checkpoint_max": 10,        # only keep the last keep_checkpoint_max checkpoint
+"save_checkpoint_path": "./",     # path to save checkpoint relative to the executed path
+"warmup_epochs": 0,               # number of warmup epoch
+"lr_decay_mode": "Linear",        # decay mode for generating learning rate
+"label_smooth": True,             # label smooth
+"label_smooth_factor": 0.1,       # label smooth factor
+"lr_init": 0,                     # initial learning rate
+"lr_max": 0.1,                    # maximum learning rate
+"lr_end": 0.0,                    # minimum learning rate
+```
+
+- Config for ResNet101, ImageNet2012 dataset
+
+```
+"class_num": 1001,                # dataset class number
+"batch_size": 32,                 # batch size of input tensor
+"loss_scale": 1024,               # loss scale
+"momentum": 0.9,                  # momentum optimizer
+"weight_decay": 1e-4,             # weight decay
+"epoch_size": 120,                # epoch size for training
+"pretrain_epoch_size": 0,         # epoch size that model has been trained before loading pretrained checkpoint, actual training epoch size is equal to epoch_size minus pretrain_epoch_size
+"save_checkpoint": True,          # whether save checkpoint or not
+"save_checkpoint_epochs": 5,      # the epoch interval between two checkpoints. By default, the last checkpoint will be saved after the last epoch
+"keep_checkpoint_max": 10,        # only keep the last keep_checkpoint_max checkpoint
+"save_checkpoint_path": "./",     # path to save checkpoint relative to the executed path
+"warmup_epochs": 0,               # number of warmup epoch
+"lr_decay_mode": "cosine"         # decay mode for generating learning rate
+"label_smooth": 1,                # label_smooth
+"label_smooth_factor": 0.1,       # label_smooth_factor
+"lr": 0.1                         # base learning rate
+```
+
+- Config for SE-ResNet50, ImageNet2012 dataset
+
+```
+"class_num": 1001,                # dataset class number
+"batch_size": 32,                 # batch size of input tensor
+"loss_scale": 1024,               # loss scale
+"momentum": 0.9,                  # momentum optimizer
+"weight_decay": 1e-4,             # weight decay
+"epoch_size": 28 ,                # epoch size for creating learning rate
+"train_epoch_size": 24            # actual train epoch size
+"pretrain_epoch_size": 0,         # epoch size that model has been trained before loading pretrained checkpoint, actual training epoch size is equal to epoch_size minus pretrain_epoch_size
+"save_checkpoint": True,          # whether save checkpoint or not
+"save_checkpoint_epochs": 4,      # the epoch interval between two checkpoints. By default, the last checkpoint will be saved after the last epoch
+"keep_checkpoint_max": 10,        # only keep the last keep_checkpoint_max checkpoint
+"save_checkpoint_path": "./",     # path to save checkpoint relative to the executed path
+"warmup_epochs": 3,               # number of warmup epoch
+"lr_decay_mode": "cosine"         # decay mode for generating learning rate
+"label_smooth": True,             # label_smooth
+"label_smooth_factor": 0.1,       # label_smooth_factor
+"lr_init": 0.0,                   # initial learning rate
+"lr_max": 0.3,                    # maximum learning rate
+"lr_end": 0.0001,                 # end learning rate
+```
+
+## [Training Process](#contents)
+
+### Usage
+#### Running on Ascend
+```
+# distributed training
+Usage: sh run_distribute_train.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
+
+# standalone training
+Usage: sh run_standalone_train.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH]
+[PRETRAINED_CKPT_PATH](optional)
+
+# run evaluation example
+Usage: sh run_eval.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
+
+```
+For distributed training, a hccl configuration file with JSON format needs to be created in advance.
+
+Please follow the instructions in the link [hccn_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
+
+Training result will be stored in the example path, whose folder name begins with "train" or "train_parallel". Under this, you can find checkpoint file together with result like the followings in log.
+
+#### Running on GPU
+
+```
+# distributed training example
+sh run_distribute_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012]  [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
+
+# standalone training example
+sh run_standalone_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
+
+# infer example
+sh run_eval_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
+```
+
+#### Running parameter server mode training
+
+- Parameter server training Ascend example
+
+```
+sh run_parameter_server_train.sh [resnet50|resnet101] [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
+```
+
+- Parameter server training GPU example
+```
+sh run_parameter_server_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
+```
+
+### Result
+
+- Training ResNet50 with CIFAR-10 dataset
+
+```
+# distribute training result(8 pcs)
+epoch: 1 step: 195, loss is 1.9601055
+epoch: 2 step: 195, loss is 1.8555021
+epoch: 3 step: 195, loss is 1.6707983
+epoch: 4 step: 195, loss is 1.8162166
+epoch: 5 step: 195, loss is 1.393667
+...
+```
+
+- Training ResNet50 with ImageNet2012 dataset
+
+```
+# distribute training result(8 pcs)
+epoch: 1 step: 5004, loss is 4.8995576
+epoch: 2 step: 5004, loss is 3.9235563
+epoch: 3 step: 5004, loss is 3.833077
+epoch: 4 step: 5004, loss is 3.2795618
+epoch: 5 step: 5004, loss is 3.1978393
+...
+```
+
+- Training ResNet101 with ImageNet2012 dataset
+
+```
+# distribute training result(8p)
+epoch: 1 step: 5004, loss is 4.805483
+epoch: 2 step: 5004, loss is 3.2121816
+epoch: 3 step: 5004, loss is 3.429647
+epoch: 4 step: 5004, loss is 3.3667371
+epoch: 5 step: 5004, loss is 3.1718972
+...
+epoch: 67 step: 5004, loss is 2.2768745
+epoch: 68 step: 5004, loss is 1.7223864
+epoch: 69 step: 5004, loss is 2.0665488
+epoch: 70 step: 5004, loss is 1.8717369
+...
+```
+- Training SE-ResNet50 with ImageNet2012 dataset
+
+```
+# distribute training result(8 pcs)
+epoch: 1 step: 5004, loss is 5.1779146
+epoch: 2 step: 5004, loss is 4.139395
+epoch: 3 step: 5004, loss is 3.9240637
+epoch: 4 step: 5004, loss is 3.5011306
+epoch: 5 step: 5004, loss is 3.3501816
+...
+```
+
+## [Evaluation Process](#contents)
+
+### Usage
+
+#### Running on Ascend
+```
+# evaluation
+Usage: sh run_eval.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
+```
+
+```
+# evaluation example
+sh run_eval.sh resnet50 cifar10 ~/cifar10-10-verify-bin ~/resnet50_cifar10/train_parallel0/resnet-90_195.ckpt
+```
+
+> checkpoint can be produced in training process.
+
+#### Running on GPU
+```
+sh run_eval_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
+```
+
+### Result
+
+Evaluation result will be stored in the example path, whose folder name is "eval". Under this, you can find result like the followings in log.
+
+- Evaluating ResNet50 with CIFAR-10 dataset
+
+```
+result: {'acc': 0.91446314102564111} ckpt=~/resnet50_cifar10/train_parallel0/resnet-90_195.ckpt
+```
+
+- Evaluating ResNet50 with ImageNet2012 dataset
+
+```
+result: {'acc': 0.7671054737516005} ckpt=train_parallel0/resnet-90_5004.ckpt
+```
+
+- Evaluating ResNet101 with ImageNet2012 dataset
+
+```
+result: {'top_5_accuracy': 0.9429417413572343, 'top_1_accuracy': 0.7853513124199744} ckpt=train_parallel0/resnet-120_5004.ckpt
+```
+
+- Evaluating SE-ResNet50 with ImageNet2012 dataset
+
+```
+result: {'top_5_accuracy': 0.9342589628681178, 'top_1_accuracy': 0.768065781049936} ckpt=train_parallel0/resnet-24_5004.ckpt
+
+```
+
+# [Model Description](#contents)
+## [Performance](#contents)
+
+### Evaluation Performance 
+
+#### ResNet50 on CIFAR-10
+| Parameters                 | Ascend 910                                                   |   GPU |
+| -------------------------- | -------------------------------------- |---------------------------------- |
+| Model Version              | ResNet50-v1.5                                                |ResNet50-v1.5|
+| Resource                   | Ascend 910，CPU 2.60GHz 56cores，Memory 314G  | GPU(Tesla V100 SXM2)，CPU 2.1GHz 24cores，Memory 128G
+| uploaded Date              | 04/01/2020 (month/day/year)                          | 08/01/2020 (month/day/year)
+| MindSpore Version          | 0.1.0-alpha                                                       |0.6.0-alpha   |
+| Dataset                    | CIFAR-10                                                    | CIFAR-10
+| Training Parameters        | epoch=90, steps per epoch=195, batch_size = 32             |epoch=90, steps per epoch=195, batch_size = 32  |
+| Optimizer                  | Momentum                                                         |Momentum|
+| Loss Function              | Softmax Cross Entropy                                       |Softmax Cross Entropy           |
+| outputs                    | probability                                                 |  probability          |
+| Loss                       | 0.000356                                                    | 0.000716  |
+| Speed                      | 18.4ms/step（8pcs）                     |69ms/step（8pcs）|
+| Total time                 | 6 mins                          | 20.2 mins|
+| Parameters (M)             | 25.5                                                         | 25.5 |
+| Checkpoint for Fine tuning | 179.7M (.ckpt file)                                         |179.7M (.ckpt file)|
+| Scripts                    | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) |
+
+#### ResNet50 on ImageNet2012
+| Parameters                 | Ascend 910                                                   |   GPU |
+| -------------------------- | -------------------------------------- |---------------------------------- |
+| Model Version              | ResNet50-v1.5                                                |ResNet50-v1.5|
+| Resource                   | Ascend 910，CPU 2.60GHz 56cores，Memory 314G  |  GPU(Tesla V100 SXM2)，CPU 2.1GHz 24cores，Memory 128G
+| uploaded Date              | 04/01/2020 (month/day/year)  ；                        | 08/01/2020 (month/day/year)
+| MindSpore Version          | 0.1.0-alpha                                                       |0.6.0-alpha   |
+| Dataset                    | ImageNet2012                                                    | ImageNet2012|
+| Training Parameters        | epoch=90, steps per epoch=5004, batch_size = 32             |epoch=90, steps per epoch=5004, batch_size = 32  |
+| Optimizer                  | Momentum                                                         |Momentum|
+| Loss Function              | Softmax Cross Entropy                                       |Softmax Cross Entropy           |
+| outputs                    | probability                                                 |  probability          |
+| Loss                       | 1.8464266                                                    | 1.9023  |
+| Speed                      | 18.4ms/step（8pcs）                     |67.1ms/step（8pcs）|
+| Total time                 | 139 mins                          | 500 mins|
+| Parameters (M)             | 25.5                                                         | 25.5 |
+| Checkpoint for Fine tuning | 197M (.ckpt file)                                         |197M (.ckpt file)     |
+| Scripts                    | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) |
+
+#### ResNet101 on ImageNet2012
+| Parameters                 | Ascend 910                                                   |   GPU |
+| -------------------------- | -------------------------------------- |---------------------------------- |
+| Model Version              | ResNet101                                                |ResNet101|
+| Resource                   | Ascend 910，CPU 2.60GHz 56cores，Memory 314G  |  GPU(Tesla V100 SXM2)，CPU 2.1GHz 24cores，Memory 128G
+| uploaded Date              | 04/01/2020 (month/day/year)                          | 08/01/2020 (month/day/year)
+| MindSpore Version          | 0.1.0-alpha                                                       |0.6.0-alpha   |
+| Dataset                    | ImageNet2012                                                    | ImageNet2012|
+| Training Parameters        | epoch=120, steps per epoch=5004, batch_size = 32             |epoch=120, steps per epoch=5004, batch_size = 32  |
+| Optimizer                  | Momentum                                                         |Momentum|
+| Loss Function              | Softmax Cross Entropy                                       |Softmax Cross Entropy           |
+| outputs                    | probability                                                 |  probability          |
+| Loss                       | 1.6453942                                                    | 1.7023412  |
+| Speed                      | 30.3ms/step（8pcs）                     |108.6ms/step（8pcs）|
+| Total time                 | 301 mins                          | 1100 mins|
+| Parameters (M)             | 44.6                                                        | 44.6 |
+| Checkpoint for Fine tuning | 343M (.ckpt file)                                         |343M (.ckpt file)     |
+| Scripts                    | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) |
+
+#### SE-ResNet50 on ImageNet2012
+
+| Parameters                 | Ascend 910
+| -------------------------- | ------------------------------------------------------------------------ |
+| Model Version              | SE-ResNet50                                               |
+| Resource                   | Ascend 910，CPU 2.60GHz 56cores，Memory 314G  |
+| uploaded Date              | 08/16/2020 (month/day/year)  ；                        |
+| MindSpore Version          | 0.7.0-alpha                                                 |
+| Dataset                    | ImageNet2012                                                |
+| Training Parameters        | epoch=24, steps per epoch=5004, batch_size = 32             |
+| Optimizer                  | Momentum                                                    |
+| Loss Function              | Softmax Cross Entropy                                       |
+| outputs                    | probability                                                 |
+| Loss                       | 1.754404                                                    |
+| Speed                      | 24.6ms/step（8pcs）                     |
+| Total time                 | 49.3 mins                                                  |
+| Parameters (M)             | 25.5                                                         |
+| Checkpoint for Fine tuning | 215.9M (.ckpt file)                                         |
+| Scripts                    | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) |
+
+# [Description of Random Situation](#contents)
+
+In dataset.py, we set the seed inside “create_dataset" function. We also use random seed in train.py.
+
+
+# [ModelZoo Homepage](#contents)
+ Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
@@ -0,0 +1,89 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""train resnet."""
+import os
+import argparse
+from mindspore import context
+from mindspore.common import set_seed
+from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
+from mindspore.train.model import Model
+from mindspore.train.serialization import load_checkpoint, load_param_into_net
+from src.CrossEntropySmooth import CrossEntropySmooth
+
+parser = argparse.ArgumentParser(description='Image classification')
+parser.add_argument('--net', type=str, default=None, help='Resnet Model, either resnet50 or resnet101')
+parser.add_argument('--dataset', type=str, default=None, help='Dataset, either cifar10 or imagenet2012')
+
+parser.add_argument('--checkpoint_path', type=str, default=None, help='Checkpoint file path')
+parser.add_argument('--dataset_path', type=str, default=None, help='Dataset path')
+parser.add_argument('--device_target', type=str, default='Ascend', help='Device target')
+args_opt = parser.parse_args()
+
+set_seed(1)
+
+if args_opt.net == "resnet50":
+    from src.resnet import resnet50 as resnet
+    if args_opt.dataset == "cifar10":
+        from src.config import config1 as config
+        from src.dataset import create_dataset1 as create_dataset
+    else:
+        from src.config import config2 as config
+        from src.dataset import create_dataset2 as create_dataset
+elif args_opt.net == "resnet101":
+    from src.resnet import resnet101 as resnet
+    from src.config import config3 as config
+    from src.dataset import create_dataset3 as create_dataset
+else:
+    from src.resnet import se_resnet50 as resnet
+    from src.config import config4 as config
+    from src.dataset import create_dataset4 as create_dataset
+
+if __name__ == '__main__':
+    target = args_opt.device_target
+
+    # init context
+    context.set_context(mode=context.GRAPH_MODE, device_target=target, save_graphs=False)
+    if target != "GPU":
+        device_id = int(os.getenv('DEVICE_ID'))
+        context.set_context(device_id=device_id)
+
+    # create dataset
+    dataset = create_dataset(dataset_path=args_opt.dataset_path, do_train=False, batch_size=config.batch_size,
+                             target=target)
+    step_size = dataset.get_dataset_size()
+
+    # define net
+    net = resnet(class_num=config.class_num)
+
+    # load checkpoint
+    param_dict = load_checkpoint(args_opt.checkpoint_path)
+    load_param_into_net(net, param_dict)
+    net.set_train(False)
+
+    # define loss, model
+    if args_opt.dataset == "imagenet2012":
+        if not config.use_label_smooth:
+            config.label_smooth_factor = 0.0
+        loss = CrossEntropySmooth(sparse=True, reduction='mean',
+                                  smooth_factor=config.label_smooth_factor, num_classes=config.class_num)
+    else:
+        loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
+
+    # define model
+    model = Model(net, loss_fn=loss, metrics={'top_1_accuracy', 'top_5_accuracy'})
+
+    # eval model
+    res = model.eval(dataset)
+    print("result:", res, "ckpt=", args_opt.checkpoint_path)
@@ -0,0 +1,25 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""hub config."""
+from src.resnet import resnet50, resnet101, se_resnet50
+
+def create_network(name, *args, **kwargs):
+    if name == 'resnet50':
+        return resnet50(*args, **kwargs)
+    if name == 'resnet101':
+        return resnet101(*args, **kwargs)
+    if name == 'se_resnet50':
+        return se_resnet50(*args, **kwargs)
+    raise NotImplementedError(f"{name} is not implemented in the repo")
@@ -0,0 +1,38 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""define loss function for network"""
+import mindspore.nn as nn
+from mindspore import Tensor
+from mindspore.common import dtype as mstype
+from mindspore.nn.loss.loss import _Loss
+from mindspore.ops import functional as F
+from mindspore.ops import operations as P
+
+
+class CrossEntropySmooth(_Loss):
+    """CrossEntropy"""
+    def __init__(self, sparse=True, reduction='mean', smooth_factor=0., num_classes=1000):
+        super(CrossEntropySmooth, self).__init__()
+        self.onehot = P.OneHot()
+        self.sparse = sparse
+        self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
+        self.off_value = Tensor(1.0 * smooth_factor / (num_classes - 1), mstype.float32)
+        self.ce = nn.SoftmaxCrossEntropyWithLogits(reduction=reduction)
+
+    def construct(self, logit, label):
+        if self.sparse:
+            label = self.onehot(label, F.shape(logit)[1], self.on_value, self.off_value)
+        loss = self.ce(logit, label)
+        return loss
@@ -0,0 +1,106 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+network config setting, will be used in train.py and eval.py
+"""
+from easydict import EasyDict as ed
+
+
+# config for resnet50, imagenet2012
+config2 = ed({
+    'class_num': 1001,
+    'batch_size': 256,
+    'loss_scale': 1024,
+    'momentum': 0.9,
+    'weight_decay': 1e-4,
+    'epoch_size': 5,
+    'pretrain_epoch_size': 0,
+    'save_checkpoint': True,
+    'save_checkpoint_epochs': 5,
+    'keep_checkpoint_max': 10,
+    'save_checkpoint_path': './',
+    'warmup_epochs': 0,
+    'lr_decay_mode': 'linear',
+    'use_label_smooth': True,
+    'label_smooth_factor': 0.1,
+    'lr_init': 0,
+    'lr_max': 0.8,
+    'lr_end': 0.0
+})
+
+
+# config for resent50, cifar10
+config1 = ed({
+    'class_num': 10,
+    'batch_size': 32,
+    'loss_scale': 1024,
+    'momentum': 0.9,
+    'weight_decay': 1e-4,
+    'epoch_size': 90,
+    'pretrain_epoch_size': 0,
+    'save_checkpoint': True,
+    'save_checkpoint_epochs': 5,
+    'keep_checkpoint_max': 10,
+    'save_checkpoint_path': './',
+    'warmup_epochs': 5,
+    'lr_decay_mode': 'poly',
+    'lr_init': 0.01,
+    'lr_end': 0.00001,
+    'lr_max': 0.1
+})
+
+
+# config for resent101, imagenet2012
+config3 = ed({
+    'class_num': 1001,
+    'batch_size': 32,
+    'loss_scale': 1024,
+    'momentum': 0.9,
+    'weight_decay': 1e-4,
+    'epoch_size': 120,
+    'pretrain_epoch_size': 0,
+    'save_checkpoint': True,
+    'save_checkpoint_epochs': 5,
+    'keep_checkpoint_max': 10,
+    'save_checkpoint_path': './',
+    'warmup_epochs': 0,
+    'lr_decay_mode': 'cosine',
+    'use_label_smooth': True,
+    'label_smooth_factor': 0.1,
+    'lr': 0.1
+})
+
+# config for se-resnet50, imagenet2012
+config4 = ed({
+    'class_num': 1001,
+    'batch_size': 32,
+    'loss_scale': 1024,
+    'momentum': 0.9,
+    'weight_decay': 1e-4,
+    'epoch_size': 28,
+    'train_epoch_size': 24,
+    'pretrain_epoch_size': 0,
+    'save_checkpoint': True,
+    'save_checkpoint_epochs': 4,
+    'keep_checkpoint_max': 10,
+    'save_checkpoint_path': './',
+    'warmup_epochs': 3,
+    'lr_decay_mode': 'cosine',
+    'use_label_smooth': True,
+    'label_smooth_factor': 0.1,
+    'lr_init': 0.0,
+    'lr_max': 0.3,
+    'lr_end': 0.0001
+})
@@ -0,0 +1,263 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+create train or eval dataset.
+"""
+import os
+import mindspore.common.dtype as mstype
+import mindspore.dataset.engine as de
+import mindspore.dataset.vision.c_transforms as C
+import mindspore.dataset.transforms.c_transforms as C2
+from mindspore.communication.management import init, get_rank, get_group_size
+
+
+def create_dataset1(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend"):
+    """
+    create a train or evaluate cifar10 dataset for resnet50
+    Args:
+        dataset_path(string): the path of dataset.
+        do_train(bool): whether dataset is used for train or eval.
+        repeat_num(int): the repeat times of dataset. Default: 1
+        batch_size(int): the batch size of dataset. Default: 32
+        target(str): the device target. Default: Ascend
+
+    Returns:
+        dataset
+    """
+    if target == "Ascend":
+        device_num, rank_id = _get_rank_info()
+    else:
+        init()
+        rank_id = get_rank()
+        device_num = get_group_size()
+
+    if device_num == 1:
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
+    else:
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
+                               num_shards=device_num, shard_id=rank_id)
+
+    # define map operations
+    trans = []
+    if do_train:
+        trans += [
+            C.RandomCrop((32, 32), (4, 4, 4, 4)),
+            C.RandomHorizontalFlip(prob=0.5)
+        ]
+
+    trans += [
+        C.Resize((224, 224)),
+        C.Rescale(1.0 / 255.0, 0.0),
+        C.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
+        C.HWC2CHW()
+    ]
+
+    type_cast_op = C2.TypeCast(mstype.int32)
+
+    ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
+    ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=8)
+
+    # apply batch operations
+    ds = ds.batch(batch_size, drop_remainder=True)
+    # apply dataset repeat operation
+    ds = ds.repeat(repeat_num)
+
+    return ds
+
+
+def create_dataset2(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend"):
+    """
+    create a train or eval imagenet2012 dataset for resnet50
+
+    Args:
+        dataset_path(string): the path of dataset.
+        do_train(bool): whether dataset is used for train or eval.
+        repeat_num(int): the repeat times of dataset. Default: 1
+        batch_size(int): the batch size of dataset. Default: 32
+        target(str): the device target. Default: Ascend
+
+    Returns:
+        dataset
+    """
+    if target == "Ascend":
+        device_num, rank_id = _get_rank_info()
+    else:
+        init()
+        rank_id = get_rank()
+        device_num = get_group_size()
+
+    if device_num == 1:
+        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True)
+    else:
+        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True,
+                                   num_shards=device_num, shard_id=rank_id)
+
+    image_size = 224
+    mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
+    std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
+
+    # define map operations
+    if do_train:
+        trans = [
+            C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
+            C.RandomHorizontalFlip(prob=0.5),
+            C.Normalize(mean=mean, std=std),
+            C.HWC2CHW()
+        ]
+    else:
+        trans = [
+            C.Decode(),
+            C.Resize(256),
+            C.CenterCrop(image_size),
+            C.Normalize(mean=mean, std=std),
+            C.HWC2CHW()
+        ]
+
+    type_cast_op = C2.TypeCast(mstype.int32)
+
+    ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=8)
+    ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
+
+    # apply batch operations
+    ds = ds.batch(batch_size, drop_remainder=True)
+
+    # apply dataset repeat operation
+    ds = ds.repeat(repeat_num)
+
+    return ds
+
+
+def create_dataset3(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend"):
+    """
+    create a train or eval imagenet2012 dataset for resnet101
+    Args:
+        dataset_path(string): the path of dataset.
+        do_train(bool): whether dataset is used for train or eval.
+        repeat_num(int): the repeat times of dataset. Default: 1
+        batch_size(int): the batch size of dataset. Default: 32
+
+    Returns:
+        dataset
+    """
+    device_num, rank_id = _get_rank_info()
+
+    if device_num == 1:
+        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True)
+    else:
+        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True,
+                                   num_shards=device_num, shard_id=rank_id)
+    image_size = 224
+    mean = [0.475 * 255, 0.451 * 255, 0.392 * 255]
+    std = [0.275 * 255, 0.267 * 255, 0.278 * 255]
+
+    # define map operations
+    if do_train:
+        trans = [
+            C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
+            C.RandomHorizontalFlip(rank_id / (rank_id + 1)),
+            C.Normalize(mean=mean, std=std),
+            C.HWC2CHW()
+        ]
+    else:
+        trans = [
+            C.Decode(),
+            C.Resize(256),
+            C.CenterCrop(image_size),
+            C.Normalize(mean=mean, std=std),
+            C.HWC2CHW()
+        ]
+
+    type_cast_op = C2.TypeCast(mstype.int32)
+
+    ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=8)
+    ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
+
+    # apply batch operations
+    ds = ds.batch(batch_size, drop_remainder=True)
+    # apply dataset repeat operation
+    ds = ds.repeat(repeat_num)
+
+    return ds
+
+
+def create_dataset4(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend"):
+    """
+    create a train or eval imagenet2012 dataset for se-resnet50
+
+    Args:
+        dataset_path(string): the path of dataset.
+        do_train(bool): whether dataset is used for train or eval.
+        repeat_num(int): the repeat times of dataset. Default: 1
+        batch_size(int): the batch size of dataset. Default: 32
+        target(str): the device target. Default: Ascend
+
+    Returns:
+        dataset
+    """
+    if target == "Ascend":
+        device_num, rank_id = _get_rank_info()
+    if device_num == 1:
+        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=12, shuffle=True)
+    else:
+        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=12, shuffle=True,
+                                   num_shards=device_num, shard_id=rank_id)
+    image_size = 224
+    mean = [123.68, 116.78, 103.94]
+    std = [1.0, 1.0, 1.0]
+
+    # define map operations
+    if do_train:
+        trans = [
+            C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
+            C.RandomHorizontalFlip(prob=0.5),
+            C.Normalize(mean=mean, std=std),
+            C.HWC2CHW()
+        ]
+    else:
+        trans = [
+            C.Decode(),
+            C.Resize(292),
+            C.CenterCrop(256),
+            C.Normalize(mean=mean, std=std),
+            C.HWC2CHW()
+        ]
+
+    type_cast_op = C2.TypeCast(mstype.int32)
+    ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=12)
+    ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=12)
+
+    # apply batch operations
+    ds = ds.batch(batch_size, drop_remainder=True)
+
+    # apply dataset repeat operation
+    ds = ds.repeat(repeat_num)
+
+    return ds
+
+
+def _get_rank_info():
+    """
+    get rank size and rank id
+    """
+    rank_size = int(os.environ.get("RANK_SIZE", 1))
+
+    if rank_size > 1:
+        rank_size = get_group_size()
+        rank_id = get_rank()
+    else:
+        rank_size = 1
+        rank_id = 0
+
+    return rank_size, rank_id
@@ -0,0 +1,205 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""learning rate generator"""
+import math
+import numpy as np
+
+
+def _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps):
+    """
+    Applies three steps decay to generate learning rate array.
+
+    Args:
+       lr_init(float): init learning rate.
+       lr_max(float): max learning rate.
+       total_steps(int): all steps in training.
+       warmup_steps(int): all steps in warmup epochs.
+
+    Returns:
+       np.array, learning rate array.
+    """
+    decay_epoch_index = [0.3 * total_steps, 0.6 * total_steps, 0.8 * total_steps]
+    lr_each_step = []
+    for i in range(total_steps):
+        if i < warmup_steps:
+            lr = lr_init + (lr_max - lr_init) * i / warmup_steps
+        else:
+            if i < decay_epoch_index[0]:
+                lr = lr_max
+            elif i < decay_epoch_index[1]:
+                lr = lr_max * 0.1
+            elif i < decay_epoch_index[2]:
+                lr = lr_max * 0.01
+            else:
+                lr = lr_max * 0.001
+        lr_each_step.append(lr)
+    return lr_each_step
+
+
+def _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
+    """
+    Applies polynomial decay to generate learning rate array.
+
+    Args:
+       lr_init(float): init learning rate.
+       lr_end(float): end learning rate
+       lr_max(float): max learning rate.
+       total_steps(int): all steps in training.
+       warmup_steps(int): all steps in warmup epochs.
+
+    Returns:
+       np.array, learning rate array.
+    """
+    lr_each_step = []
+    if warmup_steps != 0:
+        inc_each_step = (float(lr_max) - float(lr_init)) / float(warmup_steps)
+    else:
+        inc_each_step = 0
+    for i in range(total_steps):
+        if i < warmup_steps:
+            lr = float(lr_init) + inc_each_step * float(i)
+        else:
+            base = (1.0 - (float(i) - float(warmup_steps)) / (float(total_steps) - float(warmup_steps)))
+            lr = float(lr_max) * base * base
+            if lr < 0.0:
+                lr = 0.0
+        lr_each_step.append(lr)
+    return lr_each_step
+
+
+def _generate_cosine_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
+    """
+    Applies cosine decay to generate learning rate array.
+
+    Args:
+       lr_init(float): init learning rate.
+       lr_end(float): end learning rate
+       lr_max(float): max learning rate.
+       total_steps(int): all steps in training.
+       warmup_steps(int): all steps in warmup epochs.
+
+    Returns:
+       np.array, learning rate array.
+    """
+    decay_steps = total_steps - warmup_steps
+    lr_each_step = []
+    for i in range(total_steps):
+        if i < warmup_steps:
+            lr_inc = (float(lr_max) - float(lr_init)) / float(warmup_steps)
+            lr = float(lr_init) + lr_inc * (i + 1)
+        else:
+            cosine_decay = 0.5 * (1 + math.cos(math.pi * (i-warmup_steps) / decay_steps))
+            lr = (lr_max-lr_end)*cosine_decay + lr_end
+        lr_each_step.append(lr)
+    return lr_each_step
+
+
+def _generate_liner_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
+    """
+    Applies liner decay to generate learning rate array.
+
+    Args:
+       lr_init(float): init learning rate.
+       lr_end(float): end learning rate
+       lr_max(float): max learning rate.
+       total_steps(int): all steps in training.
+       warmup_steps(int): all steps in warmup epochs.
+
+    Returns:
+       np.array, learning rate array.
+    """
+    lr_each_step = []
+    for i in range(total_steps):
+        if i < warmup_steps:
+            lr = lr_init + (lr_max - lr_init) * i / warmup_steps
+        else:
+            lr = lr_max - (lr_max - lr_end) * (i - warmup_steps) / (total_steps - warmup_steps)
+        lr_each_step.append(lr)
+    return lr_each_step
+
+
+
+def get_lr(lr_init, lr_end, lr_max, warmup_epochs, total_epochs, steps_per_epoch, lr_decay_mode):
+    """
+    generate learning rate array
+
+    Args:
+       lr_init(float): init learning rate
+       lr_end(float): end learning rate
+       lr_max(float): max learning rate
+       warmup_epochs(int): number of warmup epochs
+       total_epochs(int): total epoch of training
+       steps_per_epoch(int): steps of one epoch
+       lr_decay_mode(string): learning rate decay mode, including steps, poly, cosine or liner(default)
+
+    Returns:
+       np.array, learning rate array
+    """
+    lr_each_step = []
+    total_steps = steps_per_epoch * total_epochs
+    warmup_steps = steps_per_epoch * warmup_epochs
+
+    if lr_decay_mode == 'steps':
+        lr_each_step = _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps)
+    elif lr_decay_mode == 'poly':
+        lr_each_step = _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
+    elif lr_decay_mode == 'cosine':
+        lr_each_step = _generate_cosine_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
+    else:
+        lr_each_step = _generate_liner_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
+
+    lr_each_step = np.array(lr_each_step).astype(np.float32)
+    return lr_each_step
+
+
+def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):
+    lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
+    lr = float(init_lr) + lr_inc * current_step
+    return lr
+
+
+def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch=120, global_step=0):
+    """
+    generate learning rate array with cosine
+
+    Args:
+       lr(float): base learning rate
+       steps_per_epoch(int): steps size of one epoch
+       warmup_epochs(int): number of warmup epochs
+       max_epoch(int): total epochs of training
+       global_step(int): the current start index of lr array
+    Returns:
+       np.array, learning rate array
+    """
+    base_lr = lr
+    warmup_init_lr = 0
+    total_steps = int(max_epoch * steps_per_epoch)
+    warmup_steps = int(warmup_epochs * steps_per_epoch)
+    decay_steps = total_steps - warmup_steps
+
+    lr_each_step = []
+    for i in range(total_steps):
+        if i < warmup_steps:
+            lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
+        else:
+            linear_decay = (total_steps - i) / decay_steps
+            cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
+            decayed = linear_decay * cosine_decay + 0.00001
+            lr = base_lr * decayed
+        lr_each_step.append(lr)
+
+    lr_each_step = np.array(lr_each_step).astype(np.float32)
+    learning_rate = lr_each_step[global_step:]
+    return learning_rate
@@ -0,0 +1,393 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""ResNet."""
+import numpy as np
+import mindspore.nn as nn
+import mindspore.common.dtype as mstype
+from mindspore.ops import operations as P
+from mindspore.ops import functional as F
+from mindspore.common.tensor import Tensor
+from scipy.stats import truncnorm
+
+def _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size):
+    fan_in = in_channel * kernel_size * kernel_size
+    scale = 1.0
+    scale /= max(1., fan_in)
+    stddev = (scale ** 0.5) / .87962566103423978
+    mu, sigma = 0, stddev
+    weight = truncnorm(-2, 2, loc=mu, scale=sigma).rvs(out_channel * in_channel * kernel_size * kernel_size)
+    weight = np.reshape(weight, (out_channel, in_channel, kernel_size, kernel_size))
+    return Tensor(weight, dtype=mstype.float32)
+
+def _weight_variable(shape, factor=0.01):
+    init_value = np.random.randn(*shape).astype(np.float32) * factor
+    return Tensor(init_value)
+
+
+def _conv3x3(in_channel, out_channel, stride=1, use_se=False):
+    if use_se:
+        weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=3)
+    else:
+        weight_shape = (out_channel, in_channel, 3, 3)
+        weight = _weight_variable(weight_shape)
+    return nn.Conv2d(in_channel, out_channel,
+                     kernel_size=3, stride=stride, padding=0, pad_mode='same', weight_init=weight)
+
+
+def _conv1x1(in_channel, out_channel, stride=1, use_se=False):
+    if use_se:
+        weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=1)
+    else:
+        weight_shape = (out_channel, in_channel, 1, 1)
+        weight = _weight_variable(weight_shape)
+    return nn.Conv2d(in_channel, out_channel,
+                     kernel_size=1, stride=stride, padding=0, pad_mode='same', weight_init=weight)
+
+
+def _conv7x7(in_channel, out_channel, stride=1, use_se=False):
+    if use_se:
+        weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=7)
+    else:
+        weight_shape = (out_channel, in_channel, 7, 7)
+        weight = _weight_variable(weight_shape)
+    return nn.Conv2d(in_channel, out_channel,
+                     kernel_size=7, stride=stride, padding=0, pad_mode='same', weight_init=weight)
+
+
+def _bn(channel):
+    return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9,
+                          gamma_init=1, beta_init=0, moving_mean_init=0, moving_var_init=1)
+
+
+def _bn_last(channel):
+    return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9,
+                          gamma_init=0, beta_init=0, moving_mean_init=0, moving_var_init=1)
+
+
+def _fc(in_channel, out_channel, use_se=False):
+    if use_se:
+        weight = np.random.normal(loc=0, scale=0.01, size=out_channel*in_channel)
+        weight = Tensor(np.reshape(weight, (out_channel, in_channel)), dtype=mstype.float32)
+    else:
+        weight_shape = (out_channel, in_channel)
+        weight = _weight_variable(weight_shape)
+    return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
+
+
+class ResidualBlock(nn.Cell):
+    """
+    ResNet V1 residual block definition.
+
+    Args:
+        in_channel (int): Input channel.
+        out_channel (int): Output channel.
+        stride (int): Stride size for the first convolutional layer. Default: 1.
+        use_se (bool): enable SE-ResNet50 net. Default: False.
+        se_block(bool): use se block in SE-ResNet50 net. Default: False.
+
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> ResidualBlock(3, 256, stride=2)
+    """
+    expansion = 4
+
+    def __init__(self,
+                 in_channel,
+                 out_channel,
+                 stride=1,
+                 use_se=False, se_block=False):
+        super(ResidualBlock, self).__init__()
+        self.stride = stride
+        self.use_se = use_se
+        self.se_block = se_block
+        channel = out_channel // self.expansion
+        self.conv1 = _conv1x1(in_channel, channel, stride=1, use_se=self.use_se)
+        self.bn1 = _bn(channel)
+        if self.use_se and self.stride != 1:
+            self.e2 = nn.SequentialCell([_conv3x3(channel, channel, stride=1, use_se=True), _bn(channel),
+                                         nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='same')])
+        else:
+            self.conv2 = _conv3x3(channel, channel, stride=stride, use_se=self.use_se)
+            self.bn2 = _bn(channel)
+
+        self.conv3 = _conv1x1(channel, out_channel, stride=1, use_se=self.use_se)
+        self.bn3 = _bn_last(out_channel)
+        if self.se_block:
+            self.se_global_pool = P.ReduceMean(keep_dims=False)
+            self.se_dense_0 = _fc(out_channel, int(out_channel/4), use_se=self.use_se)
+            self.se_dense_1 = _fc(int(out_channel/4), out_channel, use_se=self.use_se)
+            self.se_sigmoid = nn.Sigmoid()
+            self.se_mul = P.Mul()
+        self.relu = nn.ReLU()
+
+        self.down_sample = False
+
+        if stride != 1 or in_channel != out_channel:
+            self.down_sample = True
+        self.down_sample_layer = None
+
+        if self.down_sample:
+            if self.use_se:
+                if stride == 1:
+                    self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel,
+                                                                         stride, use_se=self.use_se), _bn(out_channel)])
+                else:
+                    self.down_sample_layer = nn.SequentialCell([nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='same'),
+                                                                _conv1x1(in_channel, out_channel, 1,
+                                                                         use_se=self.use_se), _bn(out_channel)])
+            else:
+                self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride,
+                                                                     use_se=self.use_se), _bn(out_channel)])
+        self.add = P.TensorAdd()
+
+    def construct(self, x):
+        identity = x
+
+        out = self.conv1(x)
+        out = self.bn1(out)
+        out = self.relu(out)
+        if self.use_se and self.stride != 1:
+            out = self.e2(out)
+        else:
+            out = self.conv2(out)
+            out = self.bn2(out)
+            out = self.relu(out)
+        out = self.conv3(out)
+        out = self.bn3(out)
+        if self.se_block:
+            out_se = out
+            out = self.se_global_pool(out, (2, 3))
+            out = self.se_dense_0(out)
+            out = self.relu(out)
+            out = self.se_dense_1(out)
+            out = self.se_sigmoid(out)
+            out = F.reshape(out, F.shape(out) + (1, 1))
+            out = self.se_mul(out, out_se)
+
+        if self.down_sample:
+            identity = self.down_sample_layer(identity)
+
+        out = self.add(out, identity)
+        out = self.relu(out)
+
+        return out
+
+
+class ResNet(nn.Cell):
+    """
+    ResNet architecture.
+
+    Args:
+        block (Cell): Block for network.
+        layer_nums (list): Numbers of block in different layers.
+        in_channels (list): Input channel in each layer.
+        out_channels (list): Output channel in each layer.
+        strides (list):  Stride size in each layer.
+        num_classes (int): The number of classes that the training images are belonging to.
+        use_se (bool): enable SE-ResNet50 net. Default: False.
+        se_block(bool): use se block in SE-ResNet50 net in layer 3 and layer 4. Default: False.
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> ResNet(ResidualBlock,
+        >>>        [3, 4, 6, 3],
+        >>>        [64, 256, 512, 1024],
+        >>>        [256, 512, 1024, 2048],
+        >>>        [1, 2, 2, 2],
+        >>>        10)
+    """
+
+    def __init__(self,
+                 block,
+                 layer_nums,
+                 in_channels,
+                 out_channels,
+                 strides,
+                 num_classes,
+                 use_se=False):
+        super(ResNet, self).__init__()
+
+        if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
+            raise ValueError("the length of layer_num, in_channels, out_channels list must be 4!")
+        self.use_se = use_se
+        self.se_block = False
+        if self.use_se:
+            self.se_block = True
+
+        if self.use_se:
+            self.conv1_0 = _conv3x3(3, 32, stride=2, use_se=self.use_se)
+            self.bn1_0 = _bn(32)
+            self.conv1_1 = _conv3x3(32, 32, stride=1, use_se=self.use_se)
+            self.bn1_1 = _bn(32)
+            self.conv1_2 = _conv3x3(32, 64, stride=1, use_se=self.use_se)
+        else:
+            self.conv1 = _conv7x7(3, 64, stride=2)
+        self.bn1 = _bn(64)
+        self.relu = P.ReLU()
+        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")
+        self.layer1 = self._make_layer(block,
+                                       layer_nums[0],
+                                       in_channel=in_channels[0],
+                                       out_channel=out_channels[0],
+                                       stride=strides[0],
+                                       use_se=self.use_se)
+        self.layer2 = self._make_layer(block,
+                                       layer_nums[1],
+                                       in_channel=in_channels[1],
+                                       out_channel=out_channels[1],
+                                       stride=strides[1],
+                                       use_se=self.use_se)
+        self.layer3 = self._make_layer(block,
+                                       layer_nums[2],
+                                       in_channel=in_channels[2],
+                                       out_channel=out_channels[2],
+                                       stride=strides[2],
+                                       use_se=self.use_se,
+                                       se_block=self.se_block)
+        self.layer4 = self._make_layer(block,
+                                       layer_nums[3],
+                                       in_channel=in_channels[3],
+                                       out_channel=out_channels[3],
+                                       stride=strides[3],
+                                       use_se=self.use_se,
+                                       se_block=self.se_block)
+
+        self.mean = P.ReduceMean(keep_dims=True)
+        self.flatten = nn.Flatten()
+        self.end_point = _fc(out_channels[3], num_classes, use_se=self.use_se)
+
+    def _make_layer(self, block, layer_num, in_channel, out_channel, stride, use_se=False, se_block=False):
+        """
+        Make stage network of ResNet.
+
+        Args:
+            block (Cell): Resnet block.
+            layer_num (int): Layer number.
+            in_channel (int): Input channel.
+            out_channel (int): Output channel.
+            stride (int): Stride size for the first convolutional layer.
+            se_block(bool): use se block in SE-ResNet50 net. Default: False.
+        Returns:
+            SequentialCell, the output layer.
+
+        Examples:
+            >>> _make_layer(ResidualBlock, 3, 128, 256, 2)
+        """
+        layers = []
+
+        resnet_block = block(in_channel, out_channel, stride=stride, use_se=use_se)
+        layers.append(resnet_block)
+        if se_block:
+            for _ in range(1, layer_num - 1):
+                resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se)
+                layers.append(resnet_block)
+            resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se, se_block=se_block)
+            layers.append(resnet_block)
+        else:
+            for _ in range(1, layer_num):
+                resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se)
+                layers.append(resnet_block)
+        return nn.SequentialCell(layers)
+
+    def construct(self, x):
+        if self.use_se:
+            x = self.conv1_0(x)
+            x = self.bn1_0(x)
+            x = self.relu(x)
+            x = self.conv1_1(x)
+            x = self.bn1_1(x)
+            x = self.relu(x)
+            x = self.conv1_2(x)
+        else:
+            x = self.conv1(x)
+        x = self.bn1(x)
+        x = self.relu(x)
+        c1 = self.maxpool(x)
+
+        c2 = self.layer1(c1)
+        c3 = self.layer2(c2)
+        c4 = self.layer3(c3)
+        c5 = self.layer4(c4)
+
+        out = self.mean(c5, (2, 3))
+        out = self.flatten(out)
+        out = self.end_point(out)
+
+        return out
+
+
+def resnet50(class_num=10):
+    """
+    Get ResNet50 neural network.
+
+    Args:
+        class_num (int): Class number.
+
+    Returns:
+        Cell, cell instance of ResNet50 neural network.
+
+    Examples:
+        >>> net = resnet50(10)
+    """
+    return ResNet(ResidualBlock,
+                  [3, 4, 6, 3],
+                  [64, 256, 512, 1024],
+                  [256, 512, 1024, 2048],
+                  [1, 2, 2, 2],
+                  class_num)
+
+def se_resnet50(class_num=1001):
+    """
+    Get SE-ResNet50 neural network.
+
+    Args:
+        class_num (int): Class number.
+
+    Returns:
+        Cell, cell instance of SE-ResNet50 neural network.
+
+    Examples:
+        >>> net = se-resnet50(1001)
+    """
+    return ResNet(ResidualBlock,
+                  [3, 4, 6, 3],
+                  [64, 256, 512, 1024],
+                  [256, 512, 1024, 2048],
+                  [1, 2, 2, 2],
+                  class_num,
+                  use_se=True)
+
+def resnet101(class_num=1001):
+    """
+    Get ResNet101 neural network.
+
+    Args:
+        class_num (int): Class number.
+
+    Returns:
+        Cell, cell instance of ResNet101 neural network.
+
+    Examples:
+        >>> net = resnet101(1001)
+    """
+    return ResNet(ResidualBlock,
+                  [3, 4, 23, 3],
+                  [64, 256, 512, 1024],
+                  [256, 512, 1024, 2048],
+                  [1, 2, 2, 2],
+                  class_num)
@@ -0,0 +1,191 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""train resnet."""
+import os
+import argparse
+import ast
+from mindspore import context
+from mindspore import Tensor
+from mindspore.nn.optim.momentum import Momentum
+from mindspore.train.model import Model
+from mindspore.context import ParallelMode
+from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
+from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
+from mindspore.train.loss_scale_manager import FixedLossScaleManager
+from mindspore.train.serialization import load_checkpoint, load_param_into_net
+from mindspore.communication.management import init, get_rank, get_group_size
+from mindspore.common import set_seed
+import mindspore.nn as nn
+import mindspore.common.initializer as weight_init
+from src.lr_generator import get_lr, warmup_cosine_annealing_lr
+from src.CrossEntropySmooth import CrossEntropySmooth
+
+parser = argparse.ArgumentParser(description='Image classification')
+parser.add_argument('--net', type=str, default=None, help='Resnet Model, either resnet50 or resnet101')
+parser.add_argument('--dataset', type=str, default=None, help='Dataset, either cifar10 or imagenet2012')
+parser.add_argument('--run_distribute', type=ast.literal_eval, default=False, help='Run distribute')
+parser.add_argument('--device_num', type=int, default=1, help='Device num.')
+
+parser.add_argument('--dataset_path', type=str, default=None, help='Dataset path')
+parser.add_argument('--device_target', type=str, default='Ascend', help='Device target')
+parser.add_argument('--pre_trained', type=str, default=None, help='Pretrained checkpoint path')
+parser.add_argument('--parameter_server', type=ast.literal_eval, default=False, help='Run parameter server train')
+args_opt = parser.parse_args()
+
+set_seed(1)
+
+if args_opt.net == "resnet50":
+    from src.resnet import resnet50 as resnet
+    if args_opt.dataset == "cifar10":
+        from src.config import config1 as config
+        from src.dataset import create_dataset1 as create_dataset
+    else:
+        from src.config import config2 as config
+        from src.dataset import create_dataset2 as create_dataset
+elif args_opt.net == "resnet101":
+    from src.resnet import resnet101 as resnet
+    from src.config import config3 as config
+    from src.dataset import create_dataset3 as create_dataset
+else:
+    from src.resnet import se_resnet50 as resnet
+    from src.config import config4 as config
+    from src.dataset import create_dataset4 as create_dataset
+
+
+if __name__ == '__main__':
+    target = args_opt.device_target
+    ckpt_save_dir = config.save_checkpoint_path
+
+    # init context
+    context.set_context(mode=context.GRAPH_MODE, device_target=target, save_graphs=False)
+    if args_opt.parameter_server:
+        context.set_ps_context(enable_ps=True)
+    if args_opt.run_distribute:
+        if target == "Ascend":
+            device_id = int(os.getenv('DEVICE_ID'))
+            context.set_context(device_id=device_id, enable_auto_mixed_precision=True)
+            context.set_auto_parallel_context(device_num=args_opt.device_num, parallel_mode=ParallelMode.DATA_PARALLEL,
+                                              gradients_mean=True)
+            if args_opt.net == "resnet50" or args_opt.net == "se-resnet50":
+                context.set_auto_parallel_context(all_reduce_fusion_config=[85, 160])
+            else:
+                context.set_auto_parallel_context(all_reduce_fusion_config=[180, 313])
+            init()
+        # GPU target
+        else:
+            init()
+            context.set_auto_parallel_context(device_num=get_group_size(), parallel_mode=ParallelMode.DATA_PARALLEL,
+                                              gradients_mean=True)
+            if args_opt.net == "resnet50":
+                context.set_auto_parallel_context(all_reduce_fusion_config=[85, 160])
+        ckpt_save_dir = config.save_checkpoint_path + "ckpt_" + str(get_rank()) + "/"
+
+    # create dataset
+    dataset = create_dataset(dataset_path=args_opt.dataset_path, do_train=True, repeat_num=1,
+                             batch_size=config.batch_size, target=target)
+    step_size = dataset.get_dataset_size()
+
+    # define net
+    net = resnet(class_num=config.class_num)
+    if args_opt.parameter_server:
+        net.set_param_ps()
+
+    # init weight
+    if args_opt.pre_trained:
+        param_dict = load_checkpoint(args_opt.pre_trained)
+        load_param_into_net(net, param_dict)
+    else:
+        for _, cell in net.cells_and_names():
+            if isinstance(cell, nn.Conv2d):
+                cell.weight.set_data(weight_init.initializer(weight_init.XavierUniform(),
+                                                             cell.weight.shape,
+                                                             cell.weight.dtype))
+            if isinstance(cell, nn.Dense):
+                cell.weight.set_data(weight_init.initializer(weight_init.TruncatedNormal(),
+                                                             cell.weight.shape,
+                                                             cell.weight.dtype))
+
+    # init lr
+    if args_opt.net == "resnet50" or args_opt.net == "se-resnet50":
+        lr = get_lr(lr_init=config.lr_init, lr_end=config.lr_end, lr_max=config.lr_max,
+                    warmup_epochs=config.warmup_epochs, total_epochs=config.epoch_size, steps_per_epoch=step_size,
+                    lr_decay_mode=config.lr_decay_mode)
+    else:
+        lr = warmup_cosine_annealing_lr(config.lr, step_size, config.warmup_epochs, config.epoch_size,
+                                        config.pretrain_epoch_size * step_size)
+    lr = Tensor(lr)
+
+    # define opt
+    decayed_params = []
+    no_decayed_params = []
+    for param in net.trainable_params():
+        if 'beta' not in param.name and 'gamma' not in param.name and 'bias' not in param.name:
+            decayed_params.append(param)
+        else:
+            no_decayed_params.append(param)
+
+    group_params = [{'params': decayed_params, 'weight_decay': config.weight_decay},
+                    {'params': no_decayed_params},
+                    {'order_params': net.trainable_params()}]
+    opt = Momentum(group_params, lr, config.momentum, loss_scale=config.loss_scale)
+    # define loss, model
+    if target == "Ascend":
+        if args_opt.dataset == "imagenet2012":
+            if not config.use_label_smooth:
+                config.label_smooth_factor = 0.0
+            loss = CrossEntropySmooth(sparse=True, reduction="mean",
+                                      smooth_factor=config.label_smooth_factor, num_classes=config.class_num)
+        else:
+            loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
+        loss_scale = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False)
+        model = Model(net, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale, metrics={'acc'},
+                      amp_level="O2", keep_batchnorm_fp32=False)
+    else:
+        # GPU target
+        if args_opt.dataset == "imagenet2012":
+            if not config.use_label_smooth:
+                config.label_smooth_factor = 0.0
+            loss = CrossEntropySmooth(sparse=True, reduction="mean",
+                                      smooth_factor=config.label_smooth_factor, num_classes=config.class_num)
+        else:
+            loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
+
+        if (args_opt.net == "resnet101" or args_opt.net == "resnet50") and not args_opt.parameter_server:
+            opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), lr, config.momentum, config.weight_decay,
+                           config.loss_scale)
+            loss_scale = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False)
+            # Mixed precision
+            model = Model(net, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale, metrics={'acc'},
+                          amp_level="O2", keep_batchnorm_fp32=True)
+        else:
+            ## fp32 training
+            opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), lr, config.momentum, config.weight_decay)
+            model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})
+
+    # define callbacks
+    time_cb = TimeMonitor(data_size=step_size)
+    loss_cb = LossMonitor()
+    cb = [time_cb, loss_cb]
+    if config.save_checkpoint:
+        config_ck = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_epochs * step_size,
+                                     keep_checkpoint_max=config.keep_checkpoint_max)
+        ckpt_cb = ModelCheckpoint(prefix="resnet", directory=ckpt_save_dir, config=config_ck)
+        cb += [ckpt_cb]
+
+    # train model
+    if args_opt.net == "se-resnet50":
+        config.epoch_size = config.train_epoch_size
+    model.train(config.epoch_size - config.pretrain_epoch_size, dataset, callbacks=cb,
+                dataset_sink_mode=(not args_opt.parameter_server))
@@ -0,0 +1,18 @@
+#!/bin/bash
+
+rm -rf /var/log/npu/slog/host-0/*
+if [ -d /usr/local/Ascend/nnae/latest ];then
+
+	export LD_LIBRARY_PATH=/usr/local/:/usr/local/lib/:/usr/lib/:/usr/local/Ascend/nnae/latest/fwkacllib/lib64:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/:/usr/local/Ascend/driver/tools/hccn_tool/:/usr/local/mpirun4.0/lib
+	export PYTHONPATH=$PYTHONPATH:/usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages:/usr/local/Ascend/nnae/latest/opp/op_impl/built-in/ai_core/tbe:/usr/local/Ascend/nnae/latest/fwkacllib/python/site-packages/:/usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages
+	export PATH=$PATH:/usr/local/Ascend/nnae/latest/fwkacllib/ccec_compiler/bin:/usr/local/mpirun4.0/bin
+	export ASCEND_OPP_PATH=/usr/local/Ascend/nnae/latest/opp
+        export TBE_IMPL_PATH=/usr/local/Ascend/nnae/latest/opp/op_impl/built-in/ai_core
+else
+	export LD_LIBRARY_PATH=/usr/local/lib/:/usr/lib/:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/:/usr/local/mpirun4.0/lib
+	export PYTHONPATH=$PYTHONPATH:/usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages:/usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe:/usr/local/Ascend/ascend-toolkit/latest//fwkacllib/python/site-packages/:/usr/local/Ascend/ascend-toolkit/latest/tfplugin/python/site-packages:$projectDir
+	export PATH=$PATH:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin:/usr/local/mpirun4.0/bin
+	export ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/
+	export TBE_IMPL_PATH=TBE_IMPL_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core
+fi
+
@@ -0,0 +1,30 @@
+#!/usr/bin/env bash
+
+yamlPath=$1
+toolsPath=$2
+currtime=$3
+currentDir=$(cd "$(dirname "$0")/.."; pwd)
+train_job_dir=${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}
+mkdir -p ${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}/
+
+source ${currentDir}/config/npu_set_env.sh
+
+export REMARK_LOG_FILE=hw_resnet50.log
+benchmark_log_path=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils
+export PYTHONPATH=$PYTHONPATH:${benchmark_log_path}
+
+eval $(${toolsPath}/get_params_for_yaml.sh ${yamlPath} "mindspore_config")
+
+device_id=${eval_device_id}
+export DEVICE_NUM=1
+export DEVICE_ID=${device_id}
+export RANK_SIZE=${DEVICE_NUM}
+export RANK_ID=${device_id}
+
+
+
+python3.7 ${currentDir}/code/eval.py \
+    --net=resnet50 \
+    --dataset=imagenet2012 \
+    --dataset_path=${data_url} \
+    --checkpoint_path=${checkpoint_path}  > ${train_job_dir}/eval.out 2>&1
@@ -0,0 +1,95 @@
+#!/bin/bash
+
+rank_size=$1
+yamlPath=$2
+toolsPath=$3
+if [ -f /.dockerenv ];then
+        CLUSTER=$4
+MPIRUN_ALL_IP="$5"
+        export CLUSTER=${CLUSTER}
+fi
+currentDir=$(cd "$(dirname "$0")/.."; pwd)
+# 配置环境变量并调用 train 方法
+currtime=`date +%Y%m%d%H%M%S`
+mkdir -p ${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}/
+train_job_dir=${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}/
+echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] ${train_job_dir} &"
+# user env
+export HCCL_CONNECT_TIMEOUT=600
+export JOB_ID=9999001
+export SLOG_PRINT_TO_STDOUT=0
+export RANK_TABLE_FILE=${currentDir}/config/${rank_size}p.json
+
+# 从 yaml 获取配置
+eval $(${toolsPath}/get_params_for_yaml.sh ${yamlPath} "mindspore_config")
+
+
+
+data_url_new=`echo ${data_url//\//\\\\/}`
+
+jsonFilePath=${currentDir}/code/src/config.py
+echo "start to modify inner config file"
+#echo "jsonfilepath is "${jsonFilePath}
+
+sed -i "0,/epoch_size.*$/s//epoch_size\': ${epoches},/" ${jsonFilePath}
+sed -i "0,/batch_size.*$/s//batch_size\': ${batch_size},/" ${jsonFilePath}
+sed -i "0,/save_checkpoint_epochs.*$/s//save_checkpoint_epochs\': ${save_checkpoint_epochs},/" ${jsonFilePath}
+sed -i "0,/loss_scale.*$/s//loss_scale\': ${loss_scale},/" ${jsonFilePath}
+sed -i 's/\r//g' ${jsonFilePath}
+if [ $? -eq 0 ] ;
+then
+    echo "modify inner config file success"
+else
+    echo "modify inner config file fail"
+        exit
+fi
+
+
+
+# device 列表, 若无指定 device 根据 rank_size 顺序选择
+eval device_group=\$device_group_${rank_size}p
+if [ x"${device_group}" == x"" ] || [ ${rank_size} -ge 8 ];then
+    device_group="$(seq 0 "$(expr $rank_size - 1)")"
+fi
+
+# get last device id in device_group, hw log in performance from the dir named last_device_id  
+device_group_str=`echo ${device_group} | sed 's/ //g'`
+first_device_id=`echo ${device_group_str: 0:1}`
+
+rank_id=0
+
+if [ x"${CLUSTER}" == x"True" ];then
+    # ln hw log
+    ln -snf ${train_job_dir}/0/hw_resnet50.log ${train_job_dir}
+    this_ip=$(hostname -I |awk '{print $1}')
+    for ip in $MPIRUN_ALL_IP;do
+        if [ x"$ip" != x"$this_ip" ];then
+            scp $yamlPath root@$ip:$yamlPath
+            scp $jsonFilePath root@$ip:$jsonFilePath
+        fi
+    done
+    export PATH=$PATH:/usr/local/mpirun4.0/bin
+    mpirun -H ${mpirun_ip} \
+    --bind-to none -map-by slot\
+    --allow-run-as-root \
+    --mca btl_tcp_if_exclude lo,docker0,endvnic,virbr0,vethf40501b,docker_gwbridge,br-f42ac38052b4\
+    --prefix /usr/local/mpirun4.0/ \
+    ${currentDir}/scripts/train.sh 0 $rank_size $yamlPath $currtime ${toolsPath} ${CLUSTER}
+elif [ x"$mode" == x"train" ];then
+    # ln hw log
+    ln -snf ${train_job_dir}/${first_device_id}/hw_resnet50.log ${train_job_dir}
+    for device_id in $device_group;do
+      ${currentDir}/scripts/train.sh $device_id $rank_size $yamlPath $currtime ${toolsPath} $rank_id &
+      let rank_id++
+    done
+else
+    echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] ${ckpt_path%results*}/results &"
+    ln -snf ${train_job_dir}/${first_device_id}/hw_resnet50.log ${train_job_dir}
+    bash ${currentDir}/scripts/eval.sh ${yamlPath} ${toolsPath} $currtime
+
+fi
+wait
+
+
+echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] ${train_job_dir} &"
+
@@ -0,0 +1,134 @@
+#!/usr/bin/env bash
+
+device_id=$1
+rank_size=$2
+yamlPath=$3
+currtime=$4
+toolsPath=$5
+
+currentDir=$(cd "$(dirname "$0")/.."; pwd)
+
+export REMARK_LOG_FILE=hw_resnet50.log
+
+# ${mainDir%train*}/train/result/tensorflow/Bert/TrainingJob-${currtime}  # 
+mkdir -p ${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}/
+export train_job_dir=${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}/
+
+source ${currentDir}/config/npu_set_env.sh
+
+benchmark_log_path=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils
+#atlasboost_path=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils/atlasboost
+code_dir_path=${currentDir}/code
+export PYTHONPATH=$PYTHONPATH:${benchmark_log_path}:${code_dir_path}
+
+# 从 yaml 获取配置
+eval $(${toolsPath}/get_params_for_yaml.sh ${yamlPath} "mindspore_config")
+
+# user env
+export YAML_PATH=$3
+export HCCL_CONNECT_TIMEOUT=600
+export JOB_ID=9999001
+export RANK_TABLE_FILE=${currentDir}/config/${rank_size}p.json
+export RANK_SIZE=${rank_size}
+export RANK_INDEX=0
+export SLOG_PRINT_TO_STDOUT=0
+export DEVICE_ID=$1
+DEVICE_INDEX=$(( DEVICE_ID + RANK_INDEX * 8 ))
+export DEVICE_INDEX=${DEVICE_INDEX}
+export MODEL_CKPT_PATH=${train_job_dir}/${device_id}/ckpt${device_id}
+export SERVER_ID=0
+
+cd ${train_job_dir}
+curd_dir=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils/atlasboost
+export PYTHONPATH=$PYTHONPATH:${curd_dir}
+
+if [ x"$6" != x"True" ];then
+        rank_id=$6
+        export RANK_ID=$6
+else
+        device_id_mo=$(python3.7 -c "import src.tensorflow.mpi_ops as atlasboost;atlasboost.init(); \
+                device_id = atlasboost.local_rank();cluster_device_id = str(device_id); \
+                atlasboost.set_device_id(device_id);print(atlasboost.rank())")
+        device_id_mo=`echo $device_id_mo`
+        rank_id=${device_id_mo##* }
+        export RANK_ID=${rank_id}
+        device=${device_id_mo##*deviceid = }
+        device_id=${device%% phyid=*}
+        export DEVICE_ID=${device_id}
+        hccljson=${train_job_dir}/*.json
+        cp ${hccljson} ${currentDir}/config/${rank_size}p.json
+fi
+
+#mkdir exec path
+mkdir -p ${train_job_dir}/${device_id}
+cd ${train_job_dir}/${device_id}
+
+startTime=`date +%Y%m%d-%H:%M:%S`
+startTime_s=`date +%s`
+
+
+
+# 根据单卡/多卡区分调用参数
+
+
+if [ x"$6" == x"True" ];then
+	export CLUSTER=True
+	echo "run cluster"
+		--net=resnet50 \
+		--dataset=imagenet2012 \
+		--run_distribute=${run_distribute} \
+		--pre_trained=${pre_trained} \
+		--dataset_path=${data_url} > ${train_job_dir}/train_${device_id}.log 2>&1
+elif [ ${rank_size} -le 8 ];then
+	# 多卡单机
+	if [ ${rank_size} -eq 1 ]; then
+		run_distribute=False
+		device_num=1
+	else
+		run_distribute=True
+		device_num=${rank_size}
+	fi
+	export DEVICE_NUM=${device_num}
+	if [ x"${pre_trained}" == x"None" ];then
+	python3.7 ${currentDir}/code/train.py \
+			 --net=resnet50 \
+			 --dataset=imagenet2012 \
+			 --run_distribute=${run_distribute} \
+			 --device_num=${device_num} \
+			 --dataset_path=${data_url} > ${train_job_dir}/train_${device_id}.log 2>&1
+
+	else
+		python3.7 ${currentDir}/code/train.py \
+			--net=resnet50 \
+			--dataset=${data_url} \
+			--run_distribute=${run_distribute} \
+			--device_num=${device_num} \
+			--pre_trained=${pre_trained} \
+			--dataset_path=${data_url} > ${train_job_dir}/train_${device_id}.log 2>&1
+	fi
+fi
+
+
+if [ $? -eq 0 ] ;
+then
+    echo ":::ABK 1.0.0 resnet50 train success"
+    echo ":::ABK 1.0.0 resnet50 train success" >> ${train_job_dir}/train_${device_id}.log
+    echo ":::ABK 1.0.0 resnet50 train success" >> ${train_job_dir}/${device_id}/hw_resnet50.log
+else
+    echo ":::ABK 1.0.0 resnet50 train failed"
+    echo ":::ABK 1.0.0 resnet50 train failed" >> ${train_job_dir}/train_${device_id}.log
+    echo ":::ABK 1.0.0 resnet50 train failed" >> ${train_job_dir}/${device_id}/hw_resnet50.log
+fi
+
+endTime=`date +%Y%m%d-%H:%M:%S`
+endTime_s=`date +%s`
+
+sumTime=$[ $endTime_s - $startTime_s ]
+
+hour=$(( $sumTime/3600 ))
+min=$(( ($sumTime-${hour}*3600)/60 ))
+sec=$(( $sumTime-${hour}*3600-${min}*60 ))
+echo ":::ABK 1.0.0 resnet50 train total time：${hour}:${min}:${sec}"
+
+echo ":::ABK 1.0.0 resnet50 train total time：${hour}:${min}:${sec}" >> ${train_job_dir}/${device_id}/hw_resnet50.log
+