[add]上传训练benchmark by z00560161

This commit is contained in:
liang_chaoming@huawei.com
2020-10-19 20:22:23 +08:00
parent 22b83024f5
commit 82522e2f61
1225 changed files with 345421 additions and 0 deletions
@@ -0,0 +1,480 @@
# Contents
- [ResNet Description](#resnet-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
- [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training Process](#training-process)
- [Evaluation Process](#evaluation-process)
- [Model Description](#model-description)
- [Performance](#performance)
- [Evaluation Performance](#evaluation-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [ResNet Description](#contents)
## Description
ResNet (residual neural network) was proposed by Kaiming He and other four Chinese of Microsoft Research Institute. Through the use of ResNet unit, it successfully trained 152 layers of neural network, and won the championship in ilsvrc2015. The error rate on top 5 was 3.57%, and the parameter quantity was lower than vggnet, so the effect was very outstanding. Traditional convolution network or full connection network will have more or less information loss. At the same time, it will lead to the disappearance or explosion of gradient, which leads to the failure of deep network training. ResNet solves this problem to a certain extent. By passing the input information to the output, the integrity of the information is protected. The whole network only needs to learn the part of the difference between input and output, which simplifies the learning objectives and difficulties.The structure of ResNet can accelerate the training of neural network very quickly, and the accuracy of the model is also greatly improved. At the same time, ResNet is very popular, even can be directly used in the concept net network.
These are examples of training ResNet50/ResNet101/SE-ResNet50 with CIFAR-10/ImageNet2012 dataset in MindSpore.ResNet50 and ResNet101 can reference [paper 1](https://arxiv.org/pdf/1512.03385.pdf) below, and SE-ResNet50 is a variant of ResNet50 which reference [paper 2](https://arxiv.org/abs/1709.01507) and [paper 3](https://arxiv.org/abs/1812.01187) below, Training SE-ResNet50 for just 24 epochs using 8 Ascend 910, we can reach top-1 accuracy of 75.9%.(Training ResNet101 with dataset CIFAR-10 and SE-ResNet50 with CIFAR-10 is not supported yet.)
## Paper
1.[paper](https://arxiv.org/pdf/1512.03385.pdf):Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. "Deep Residual Learning for Image Recognition"
2.[paper](https://arxiv.org/abs/1709.01507):Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu. "Squeeze-and-Excitation Networks"
3.[paper](https://arxiv.org/abs/1812.01187):Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
# [Model Architecture](#contents)
The overall network architecture of ResNet is show below:
[Link](https://arxiv.org/pdf/1512.03385.pdf)
# [Dataset](#contents)
Dataset used: [CIFAR-10](<http://www.cs.toronto.edu/~kriz/cifar.html>)
- Dataset size60,000 32*32 colorful images in 10 classes
- Train50,000 images
- Test 10,000 images
- Data formatbinary files
- NoteData will be processed in dataset.py
- Download the dataset, the directory structure is as follows:
```
├─cifar-10-batches-bin
└─cifar-10-verify-bin
```
Dataset used: [ImageNet2012](http://www.image-net.org/)
- Dataset size 224*224 colorful images in 1000 classes
- Train1,281,167 images
- Test 50,000 images
- Data formatjpeg
- NoteData will be processed in dataset.py
- Download the dataset, the directory structure is as follows:
```
└─dataset
├─ilsvrc # train dataset
└─validation_preprocess # evaluate dataset
```
# [Features](#contents)
## Mixed Precision
The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data types, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware.
For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching reduce precision.
# [Environment Requirements](#contents)
- HardwareAscend/GPU
- Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend , please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Quick Start](#contents)
After installing MindSpore via the official website, you can start training and evaluation as follows:
- Runing on Ascend
```
# distributed training
Usage: sh run_distribute_train.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
# standalone training
Usage: sh run_standalone_train.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH]
[PRETRAINED_CKPT_PATH](optional)
# run evaluation example
Usage: sh run_eval.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
```
- Runing on GPU
```
# distributed training example
sh run_distribute_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
# standalone training example
sh run_standalone_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
# infer example
sh run_eval_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
```
# [Script Description](#contents)
## [Script and Sample Code](#contents)
```shell
.
└──resnet
├── README.md
├── script
├── run_distribute_train.sh # launch ascend distributed training(8 pcs)
├── run_parameter_server_train.sh # launch ascend parameter server training(8 pcs)
├── run_eval.sh # launch ascend evaluation
├── run_standalone_train.sh # launch ascend standalone training(1 pcs)
├── run_distribute_train_gpu.sh # launch gpu distributed training(8 pcs)
├── run_parameter_server_train_gpu.sh # launch gpu parameter server training(8 pcs)
├── run_eval_gpu.sh # launch gpu evaluation
└── run_standalone_train_gpu.sh # launch gpu standalone training(1 pcs)
├── src
├── config.py # parameter configuration
├── dataset.py # data preprocessing
├── crossentropy.py # loss definition for ImageNet2012 dataset
├── lr_generator.py # generate learning rate for each step
└── resnet.py # resnet backbone, including resnet50 and resnet101 and se-resnet50
├── eval.py # eval net
└── train.py # train net
```
## [Script Parameters](#contents)
Parameters for both training and evaluation can be set in config.py.
- Config for ResNet50, CIFAR-10 dataset
```
"class_num": 10, # dataset class num
"batch_size": 32, # batch size of input tensor
"loss_scale": 1024, # loss scale
"momentum": 0.9, # momentum
"weight_decay": 1e-4, # weight decay
"epoch_size": 90, # only valid for taining, which is always 1 for inference
"pretrain_epoch_size": 0, # epoch size that model has been trained before loading pretrained checkpoint, actual training epoch size is equal to epoch_size minus pretrain_epoch_size
"save_checkpoint": True, # whether save checkpoint or not
"save_checkpoint_epochs": 5, # the epoch interval between two checkpoints. By default, the last checkpoint will be saved after the last step
"keep_checkpoint_max": 10, # only keep the last keep_checkpoint_max checkpoint
"save_checkpoint_path": "./", # path to save checkpoint
"warmup_epochs": 5, # number of warmup epoch
"lr_decay_mode": "poly" # decay mode can be selected in steps, ploy and default
"lr_init": 0.01, # initial learning rate
"lr_end": 0.00001, # final learning rate
"lr_max": 0.1, # maximum learning rate
```
- Config for ResNet50, ImageNet2012 dataset
```
"class_num": 1001, # dataset class number
"batch_size": 32, # batch size of input tensor
"loss_scale": 1024, # loss scale
"momentum": 0.9, # momentum optimizer
"weight_decay": 1e-4, # weight decay
"epoch_size": 90, # only valid for taining, which is always 1 for inference
"pretrain_epoch_size": 0, # epoch size that model has been trained before loading pretrained checkpoint, actual training epoch size is equal to epoch_size minus pretrain_epoch_size
"save_checkpoint": True, # whether save checkpoint or not
"save_checkpoint_epochs": 5, # the epoch interval between two checkpoints. By default, the last checkpoint will be saved after the last epoch
"keep_checkpoint_max": 10, # only keep the last keep_checkpoint_max checkpoint
"save_checkpoint_path": "./", # path to save checkpoint relative to the executed path
"warmup_epochs": 0, # number of warmup epoch
"lr_decay_mode": "Linear", # decay mode for generating learning rate
"label_smooth": True, # label smooth
"label_smooth_factor": 0.1, # label smooth factor
"lr_init": 0, # initial learning rate
"lr_max": 0.1, # maximum learning rate
"lr_end": 0.0, # minimum learning rate
```
- Config for ResNet101, ImageNet2012 dataset
```
"class_num": 1001, # dataset class number
"batch_size": 32, # batch size of input tensor
"loss_scale": 1024, # loss scale
"momentum": 0.9, # momentum optimizer
"weight_decay": 1e-4, # weight decay
"epoch_size": 120, # epoch size for training
"pretrain_epoch_size": 0, # epoch size that model has been trained before loading pretrained checkpoint, actual training epoch size is equal to epoch_size minus pretrain_epoch_size
"save_checkpoint": True, # whether save checkpoint or not
"save_checkpoint_epochs": 5, # the epoch interval between two checkpoints. By default, the last checkpoint will be saved after the last epoch
"keep_checkpoint_max": 10, # only keep the last keep_checkpoint_max checkpoint
"save_checkpoint_path": "./", # path to save checkpoint relative to the executed path
"warmup_epochs": 0, # number of warmup epoch
"lr_decay_mode": "cosine" # decay mode for generating learning rate
"label_smooth": 1, # label_smooth
"label_smooth_factor": 0.1, # label_smooth_factor
"lr": 0.1 # base learning rate
```
- Config for SE-ResNet50, ImageNet2012 dataset
```
"class_num": 1001, # dataset class number
"batch_size": 32, # batch size of input tensor
"loss_scale": 1024, # loss scale
"momentum": 0.9, # momentum optimizer
"weight_decay": 1e-4, # weight decay
"epoch_size": 28 , # epoch size for creating learning rate
"train_epoch_size": 24 # actual train epoch size
"pretrain_epoch_size": 0, # epoch size that model has been trained before loading pretrained checkpoint, actual training epoch size is equal to epoch_size minus pretrain_epoch_size
"save_checkpoint": True, # whether save checkpoint or not
"save_checkpoint_epochs": 4, # the epoch interval between two checkpoints. By default, the last checkpoint will be saved after the last epoch
"keep_checkpoint_max": 10, # only keep the last keep_checkpoint_max checkpoint
"save_checkpoint_path": "./", # path to save checkpoint relative to the executed path
"warmup_epochs": 3, # number of warmup epoch
"lr_decay_mode": "cosine" # decay mode for generating learning rate
"label_smooth": True, # label_smooth
"label_smooth_factor": 0.1, # label_smooth_factor
"lr_init": 0.0, # initial learning rate
"lr_max": 0.3, # maximum learning rate
"lr_end": 0.0001, # end learning rate
```
## [Training Process](#contents)
### Usage
#### Running on Ascend
```
# distributed training
Usage: sh run_distribute_train.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
# standalone training
Usage: sh run_standalone_train.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH]
[PRETRAINED_CKPT_PATH](optional)
# run evaluation example
Usage: sh run_eval.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
```
For distributed training, a hccl configuration file with JSON format needs to be created in advance.
Please follow the instructions in the link [hccn_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
Training result will be stored in the example path, whose folder name begins with "train" or "train_parallel". Under this, you can find checkpoint file together with result like the followings in log.
#### Running on GPU
```
# distributed training example
sh run_distribute_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
# standalone training example
sh run_standalone_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
# infer example
sh run_eval_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
```
#### Running parameter server mode training
- Parameter server training Ascend example
```
sh run_parameter_server_train.sh [resnet50|resnet101] [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
```
- Parameter server training GPU example
```
sh run_parameter_server_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
```
### Result
- Training ResNet50 with CIFAR-10 dataset
```
# distribute training result(8 pcs)
epoch: 1 step: 195, loss is 1.9601055
epoch: 2 step: 195, loss is 1.8555021
epoch: 3 step: 195, loss is 1.6707983
epoch: 4 step: 195, loss is 1.8162166
epoch: 5 step: 195, loss is 1.393667
...
```
- Training ResNet50 with ImageNet2012 dataset
```
# distribute training result(8 pcs)
epoch: 1 step: 5004, loss is 4.8995576
epoch: 2 step: 5004, loss is 3.9235563
epoch: 3 step: 5004, loss is 3.833077
epoch: 4 step: 5004, loss is 3.2795618
epoch: 5 step: 5004, loss is 3.1978393
...
```
- Training ResNet101 with ImageNet2012 dataset
```
# distribute training result(8p)
epoch: 1 step: 5004, loss is 4.805483
epoch: 2 step: 5004, loss is 3.2121816
epoch: 3 step: 5004, loss is 3.429647
epoch: 4 step: 5004, loss is 3.3667371
epoch: 5 step: 5004, loss is 3.1718972
...
epoch: 67 step: 5004, loss is 2.2768745
epoch: 68 step: 5004, loss is 1.7223864
epoch: 69 step: 5004, loss is 2.0665488
epoch: 70 step: 5004, loss is 1.8717369
...
```
- Training SE-ResNet50 with ImageNet2012 dataset
```
# distribute training result(8 pcs)
epoch: 1 step: 5004, loss is 5.1779146
epoch: 2 step: 5004, loss is 4.139395
epoch: 3 step: 5004, loss is 3.9240637
epoch: 4 step: 5004, loss is 3.5011306
epoch: 5 step: 5004, loss is 3.3501816
...
```
## [Evaluation Process](#contents)
### Usage
#### Running on Ascend
```
# evaluation
Usage: sh run_eval.sh [resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
```
```
# evaluation example
sh run_eval.sh resnet50 cifar10 ~/cifar10-10-verify-bin ~/resnet50_cifar10/train_parallel0/resnet-90_195.ckpt
```
> checkpoint can be produced in training process.
#### Running on GPU
```
sh run_eval_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
```
### Result
Evaluation result will be stored in the example path, whose folder name is "eval". Under this, you can find result like the followings in log.
- Evaluating ResNet50 with CIFAR-10 dataset
```
result: {'acc': 0.91446314102564111} ckpt=~/resnet50_cifar10/train_parallel0/resnet-90_195.ckpt
```
- Evaluating ResNet50 with ImageNet2012 dataset
```
result: {'acc': 0.7671054737516005} ckpt=train_parallel0/resnet-90_5004.ckpt
```
- Evaluating ResNet101 with ImageNet2012 dataset
```
result: {'top_5_accuracy': 0.9429417413572343, 'top_1_accuracy': 0.7853513124199744} ckpt=train_parallel0/resnet-120_5004.ckpt
```
- Evaluating SE-ResNet50 with ImageNet2012 dataset
```
result: {'top_5_accuracy': 0.9342589628681178, 'top_1_accuracy': 0.768065781049936} ckpt=train_parallel0/resnet-24_5004.ckpt
```
# [Model Description](#contents)
## [Performance](#contents)
### Evaluation Performance
#### ResNet50 on CIFAR-10
| Parameters | Ascend 910 | GPU |
| -------------------------- | -------------------------------------- |---------------------------------- |
| Model Version | ResNet50-v1.5 |ResNet50-v1.5|
| Resource | Ascend 910CPU 2.60GHz 56coresMemory 314G | GPU(Tesla V100 SXM2)CPU 2.1GHz 24coresMemory 128G
| uploaded Date | 04/01/2020 (month/day/year) | 08/01/2020 (month/day/year)
| MindSpore Version | 0.1.0-alpha |0.6.0-alpha |
| Dataset | CIFAR-10 | CIFAR-10
| Training Parameters | epoch=90, steps per epoch=195, batch_size = 32 |epoch=90, steps per epoch=195, batch_size = 32 |
| Optimizer | Momentum |Momentum|
| Loss Function | Softmax Cross Entropy |Softmax Cross Entropy |
| outputs | probability | probability |
| Loss | 0.000356 | 0.000716 |
| Speed | 18.4ms/step8pcs |69ms/step8pcs|
| Total time | 6 mins | 20.2 mins|
| Parameters (M) | 25.5 | 25.5 |
| Checkpoint for Fine tuning | 179.7M (.ckpt file) |179.7M (.ckpt file)|
| Scripts | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) |
#### ResNet50 on ImageNet2012
| Parameters | Ascend 910 | GPU |
| -------------------------- | -------------------------------------- |---------------------------------- |
| Model Version | ResNet50-v1.5 |ResNet50-v1.5|
| Resource | Ascend 910CPU 2.60GHz 56coresMemory 314G | GPU(Tesla V100 SXM2)CPU 2.1GHz 24coresMemory 128G
| uploaded Date | 04/01/2020 (month/day/year) | 08/01/2020 (month/day/year)
| MindSpore Version | 0.1.0-alpha |0.6.0-alpha |
| Dataset | ImageNet2012 | ImageNet2012|
| Training Parameters | epoch=90, steps per epoch=5004, batch_size = 32 |epoch=90, steps per epoch=5004, batch_size = 32 |
| Optimizer | Momentum |Momentum|
| Loss Function | Softmax Cross Entropy |Softmax Cross Entropy |
| outputs | probability | probability |
| Loss | 1.8464266 | 1.9023 |
| Speed | 18.4ms/step8pcs |67.1ms/step8pcs|
| Total time | 139 mins | 500 mins|
| Parameters (M) | 25.5 | 25.5 |
| Checkpoint for Fine tuning | 197M (.ckpt file) |197M (.ckpt file) |
| Scripts | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) |
#### ResNet101 on ImageNet2012
| Parameters | Ascend 910 | GPU |
| -------------------------- | -------------------------------------- |---------------------------------- |
| Model Version | ResNet101 |ResNet101|
| Resource | Ascend 910CPU 2.60GHz 56coresMemory 314G | GPU(Tesla V100 SXM2)CPU 2.1GHz 24coresMemory 128G
| uploaded Date | 04/01/2020 (month/day/year) | 08/01/2020 (month/day/year)
| MindSpore Version | 0.1.0-alpha |0.6.0-alpha |
| Dataset | ImageNet2012 | ImageNet2012|
| Training Parameters | epoch=120, steps per epoch=5004, batch_size = 32 |epoch=120, steps per epoch=5004, batch_size = 32 |
| Optimizer | Momentum |Momentum|
| Loss Function | Softmax Cross Entropy |Softmax Cross Entropy |
| outputs | probability | probability |
| Loss | 1.6453942 | 1.7023412 |
| Speed | 30.3ms/step8pcs |108.6ms/step8pcs|
| Total time | 301 mins | 1100 mins|
| Parameters (M) | 44.6 | 44.6 |
| Checkpoint for Fine tuning | 343M (.ckpt file) |343M (.ckpt file) |
| Scripts | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) |
#### SE-ResNet50 on ImageNet2012
| Parameters | Ascend 910
| -------------------------- | ------------------------------------------------------------------------ |
| Model Version | SE-ResNet50 |
| Resource | Ascend 910CPU 2.60GHz 56coresMemory 314G |
| uploaded Date | 08/16/2020 (month/day/year) |
| MindSpore Version | 0.7.0-alpha |
| Dataset | ImageNet2012 |
| Training Parameters | epoch=24, steps per epoch=5004, batch_size = 32 |
| Optimizer | Momentum |
| Loss Function | Softmax Cross Entropy |
| outputs | probability |
| Loss | 1.754404 |
| Speed | 24.6ms/step8pcs |
| Total time | 49.3 mins |
| Parameters (M) | 25.5 |
| Checkpoint for Fine tuning | 215.9M (.ckpt file) |
| Scripts | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) |
# [Description of Random Situation](#contents)
In dataset.py, we set the seed inside “create_dataset" function. We also use random seed in train.py.
# [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
@@ -0,0 +1,89 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""train resnet."""
import os
import argparse
from mindspore import context
from mindspore.common import set_seed
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
from mindspore.train.model import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.CrossEntropySmooth import CrossEntropySmooth
parser = argparse.ArgumentParser(description='Image classification')
parser.add_argument('--net', type=str, default=None, help='Resnet Model, either resnet50 or resnet101')
parser.add_argument('--dataset', type=str, default=None, help='Dataset, either cifar10 or imagenet2012')
parser.add_argument('--checkpoint_path', type=str, default=None, help='Checkpoint file path')
parser.add_argument('--dataset_path', type=str, default=None, help='Dataset path')
parser.add_argument('--device_target', type=str, default='Ascend', help='Device target')
args_opt = parser.parse_args()
set_seed(1)
if args_opt.net == "resnet50":
from src.resnet import resnet50 as resnet
if args_opt.dataset == "cifar10":
from src.config import config1 as config
from src.dataset import create_dataset1 as create_dataset
else:
from src.config import config2 as config
from src.dataset import create_dataset2 as create_dataset
elif args_opt.net == "resnet101":
from src.resnet import resnet101 as resnet
from src.config import config3 as config
from src.dataset import create_dataset3 as create_dataset
else:
from src.resnet import se_resnet50 as resnet
from src.config import config4 as config
from src.dataset import create_dataset4 as create_dataset
if __name__ == '__main__':
target = args_opt.device_target
# init context
context.set_context(mode=context.GRAPH_MODE, device_target=target, save_graphs=False)
if target != "GPU":
device_id = int(os.getenv('DEVICE_ID'))
context.set_context(device_id=device_id)
# create dataset
dataset = create_dataset(dataset_path=args_opt.dataset_path, do_train=False, batch_size=config.batch_size,
target=target)
step_size = dataset.get_dataset_size()
# define net
net = resnet(class_num=config.class_num)
# load checkpoint
param_dict = load_checkpoint(args_opt.checkpoint_path)
load_param_into_net(net, param_dict)
net.set_train(False)
# define loss, model
if args_opt.dataset == "imagenet2012":
if not config.use_label_smooth:
config.label_smooth_factor = 0.0
loss = CrossEntropySmooth(sparse=True, reduction='mean',
smooth_factor=config.label_smooth_factor, num_classes=config.class_num)
else:
loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
# define model
model = Model(net, loss_fn=loss, metrics={'top_1_accuracy', 'top_5_accuracy'})
# eval model
res = model.eval(dataset)
print("result:", res, "ckpt=", args_opt.checkpoint_path)
@@ -0,0 +1,25 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""hub config."""
from src.resnet import resnet50, resnet101, se_resnet50
def create_network(name, *args, **kwargs):
if name == 'resnet50':
return resnet50(*args, **kwargs)
if name == 'resnet101':
return resnet101(*args, **kwargs)
if name == 'se_resnet50':
return se_resnet50(*args, **kwargs)
raise NotImplementedError(f"{name} is not implemented in the repo")
@@ -0,0 +1,38 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""define loss function for network"""
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.common import dtype as mstype
from mindspore.nn.loss.loss import _Loss
from mindspore.ops import functional as F
from mindspore.ops import operations as P
class CrossEntropySmooth(_Loss):
"""CrossEntropy"""
def __init__(self, sparse=True, reduction='mean', smooth_factor=0., num_classes=1000):
super(CrossEntropySmooth, self).__init__()
self.onehot = P.OneHot()
self.sparse = sparse
self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
self.off_value = Tensor(1.0 * smooth_factor / (num_classes - 1), mstype.float32)
self.ce = nn.SoftmaxCrossEntropyWithLogits(reduction=reduction)
def construct(self, logit, label):
if self.sparse:
label = self.onehot(label, F.shape(logit)[1], self.on_value, self.off_value)
loss = self.ce(logit, label)
return loss
@@ -0,0 +1,106 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
network config setting, will be used in train.py and eval.py
"""
from easydict import EasyDict as ed
# config for resnet50, imagenet2012
config2 = ed({
'class_num': 1001,
'batch_size': 256,
'loss_scale': 1024,
'momentum': 0.9,
'weight_decay': 1e-4,
'epoch_size': 5,
'pretrain_epoch_size': 0,
'save_checkpoint': True,
'save_checkpoint_epochs': 5,
'keep_checkpoint_max': 10,
'save_checkpoint_path': './',
'warmup_epochs': 0,
'lr_decay_mode': 'linear',
'use_label_smooth': True,
'label_smooth_factor': 0.1,
'lr_init': 0,
'lr_max': 0.8,
'lr_end': 0.0
})
# config for resent50, cifar10
config1 = ed({
'class_num': 10,
'batch_size': 32,
'loss_scale': 1024,
'momentum': 0.9,
'weight_decay': 1e-4,
'epoch_size': 90,
'pretrain_epoch_size': 0,
'save_checkpoint': True,
'save_checkpoint_epochs': 5,
'keep_checkpoint_max': 10,
'save_checkpoint_path': './',
'warmup_epochs': 5,
'lr_decay_mode': 'poly',
'lr_init': 0.01,
'lr_end': 0.00001,
'lr_max': 0.1
})
# config for resent101, imagenet2012
config3 = ed({
'class_num': 1001,
'batch_size': 32,
'loss_scale': 1024,
'momentum': 0.9,
'weight_decay': 1e-4,
'epoch_size': 120,
'pretrain_epoch_size': 0,
'save_checkpoint': True,
'save_checkpoint_epochs': 5,
'keep_checkpoint_max': 10,
'save_checkpoint_path': './',
'warmup_epochs': 0,
'lr_decay_mode': 'cosine',
'use_label_smooth': True,
'label_smooth_factor': 0.1,
'lr': 0.1
})
# config for se-resnet50, imagenet2012
config4 = ed({
'class_num': 1001,
'batch_size': 32,
'loss_scale': 1024,
'momentum': 0.9,
'weight_decay': 1e-4,
'epoch_size': 28,
'train_epoch_size': 24,
'pretrain_epoch_size': 0,
'save_checkpoint': True,
'save_checkpoint_epochs': 4,
'keep_checkpoint_max': 10,
'save_checkpoint_path': './',
'warmup_epochs': 3,
'lr_decay_mode': 'cosine',
'use_label_smooth': True,
'label_smooth_factor': 0.1,
'lr_init': 0.0,
'lr_max': 0.3,
'lr_end': 0.0001
})
@@ -0,0 +1,263 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
create train or eval dataset.
"""
import os
import mindspore.common.dtype as mstype
import mindspore.dataset.engine as de
import mindspore.dataset.vision.c_transforms as C
import mindspore.dataset.transforms.c_transforms as C2
from mindspore.communication.management import init, get_rank, get_group_size
def create_dataset1(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend"):
"""
create a train or evaluate cifar10 dataset for resnet50
Args:
dataset_path(string): the path of dataset.
do_train(bool): whether dataset is used for train or eval.
repeat_num(int): the repeat times of dataset. Default: 1
batch_size(int): the batch size of dataset. Default: 32
target(str): the device target. Default: Ascend
Returns:
dataset
"""
if target == "Ascend":
device_num, rank_id = _get_rank_info()
else:
init()
rank_id = get_rank()
device_num = get_group_size()
if device_num == 1:
ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
else:
ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
num_shards=device_num, shard_id=rank_id)
# define map operations
trans = []
if do_train:
trans += [
C.RandomCrop((32, 32), (4, 4, 4, 4)),
C.RandomHorizontalFlip(prob=0.5)
]
trans += [
C.Resize((224, 224)),
C.Rescale(1.0 / 255.0, 0.0),
C.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
C.HWC2CHW()
]
type_cast_op = C2.TypeCast(mstype.int32)
ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=8)
# apply batch operations
ds = ds.batch(batch_size, drop_remainder=True)
# apply dataset repeat operation
ds = ds.repeat(repeat_num)
return ds
def create_dataset2(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend"):
"""
create a train or eval imagenet2012 dataset for resnet50
Args:
dataset_path(string): the path of dataset.
do_train(bool): whether dataset is used for train or eval.
repeat_num(int): the repeat times of dataset. Default: 1
batch_size(int): the batch size of dataset. Default: 32
target(str): the device target. Default: Ascend
Returns:
dataset
"""
if target == "Ascend":
device_num, rank_id = _get_rank_info()
else:
init()
rank_id = get_rank()
device_num = get_group_size()
if device_num == 1:
ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True)
else:
ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True,
num_shards=device_num, shard_id=rank_id)
image_size = 224
mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
# define map operations
if do_train:
trans = [
C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
C.RandomHorizontalFlip(prob=0.5),
C.Normalize(mean=mean, std=std),
C.HWC2CHW()
]
else:
trans = [
C.Decode(),
C.Resize(256),
C.CenterCrop(image_size),
C.Normalize(mean=mean, std=std),
C.HWC2CHW()
]
type_cast_op = C2.TypeCast(mstype.int32)
ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=8)
ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
# apply batch operations
ds = ds.batch(batch_size, drop_remainder=True)
# apply dataset repeat operation
ds = ds.repeat(repeat_num)
return ds
def create_dataset3(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend"):
"""
create a train or eval imagenet2012 dataset for resnet101
Args:
dataset_path(string): the path of dataset.
do_train(bool): whether dataset is used for train or eval.
repeat_num(int): the repeat times of dataset. Default: 1
batch_size(int): the batch size of dataset. Default: 32
Returns:
dataset
"""
device_num, rank_id = _get_rank_info()
if device_num == 1:
ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True)
else:
ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True,
num_shards=device_num, shard_id=rank_id)
image_size = 224
mean = [0.475 * 255, 0.451 * 255, 0.392 * 255]
std = [0.275 * 255, 0.267 * 255, 0.278 * 255]
# define map operations
if do_train:
trans = [
C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
C.RandomHorizontalFlip(rank_id / (rank_id + 1)),
C.Normalize(mean=mean, std=std),
C.HWC2CHW()
]
else:
trans = [
C.Decode(),
C.Resize(256),
C.CenterCrop(image_size),
C.Normalize(mean=mean, std=std),
C.HWC2CHW()
]
type_cast_op = C2.TypeCast(mstype.int32)
ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=8)
ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
# apply batch operations
ds = ds.batch(batch_size, drop_remainder=True)
# apply dataset repeat operation
ds = ds.repeat(repeat_num)
return ds
def create_dataset4(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend"):
"""
create a train or eval imagenet2012 dataset for se-resnet50
Args:
dataset_path(string): the path of dataset.
do_train(bool): whether dataset is used for train or eval.
repeat_num(int): the repeat times of dataset. Default: 1
batch_size(int): the batch size of dataset. Default: 32
target(str): the device target. Default: Ascend
Returns:
dataset
"""
if target == "Ascend":
device_num, rank_id = _get_rank_info()
if device_num == 1:
ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=12, shuffle=True)
else:
ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=12, shuffle=True,
num_shards=device_num, shard_id=rank_id)
image_size = 224
mean = [123.68, 116.78, 103.94]
std = [1.0, 1.0, 1.0]
# define map operations
if do_train:
trans = [
C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
C.RandomHorizontalFlip(prob=0.5),
C.Normalize(mean=mean, std=std),
C.HWC2CHW()
]
else:
trans = [
C.Decode(),
C.Resize(292),
C.CenterCrop(256),
C.Normalize(mean=mean, std=std),
C.HWC2CHW()
]
type_cast_op = C2.TypeCast(mstype.int32)
ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=12)
ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=12)
# apply batch operations
ds = ds.batch(batch_size, drop_remainder=True)
# apply dataset repeat operation
ds = ds.repeat(repeat_num)
return ds
def _get_rank_info():
"""
get rank size and rank id
"""
rank_size = int(os.environ.get("RANK_SIZE", 1))
if rank_size > 1:
rank_size = get_group_size()
rank_id = get_rank()
else:
rank_size = 1
rank_id = 0
return rank_size, rank_id
@@ -0,0 +1,205 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""learning rate generator"""
import math
import numpy as np
def _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps):
"""
Applies three steps decay to generate learning rate array.
Args:
lr_init(float): init learning rate.
lr_max(float): max learning rate.
total_steps(int): all steps in training.
warmup_steps(int): all steps in warmup epochs.
Returns:
np.array, learning rate array.
"""
decay_epoch_index = [0.3 * total_steps, 0.6 * total_steps, 0.8 * total_steps]
lr_each_step = []
for i in range(total_steps):
if i < warmup_steps:
lr = lr_init + (lr_max - lr_init) * i / warmup_steps
else:
if i < decay_epoch_index[0]:
lr = lr_max
elif i < decay_epoch_index[1]:
lr = lr_max * 0.1
elif i < decay_epoch_index[2]:
lr = lr_max * 0.01
else:
lr = lr_max * 0.001
lr_each_step.append(lr)
return lr_each_step
def _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
"""
Applies polynomial decay to generate learning rate array.
Args:
lr_init(float): init learning rate.
lr_end(float): end learning rate
lr_max(float): max learning rate.
total_steps(int): all steps in training.
warmup_steps(int): all steps in warmup epochs.
Returns:
np.array, learning rate array.
"""
lr_each_step = []
if warmup_steps != 0:
inc_each_step = (float(lr_max) - float(lr_init)) / float(warmup_steps)
else:
inc_each_step = 0
for i in range(total_steps):
if i < warmup_steps:
lr = float(lr_init) + inc_each_step * float(i)
else:
base = (1.0 - (float(i) - float(warmup_steps)) / (float(total_steps) - float(warmup_steps)))
lr = float(lr_max) * base * base
if lr < 0.0:
lr = 0.0
lr_each_step.append(lr)
return lr_each_step
def _generate_cosine_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
"""
Applies cosine decay to generate learning rate array.
Args:
lr_init(float): init learning rate.
lr_end(float): end learning rate
lr_max(float): max learning rate.
total_steps(int): all steps in training.
warmup_steps(int): all steps in warmup epochs.
Returns:
np.array, learning rate array.
"""
decay_steps = total_steps - warmup_steps
lr_each_step = []
for i in range(total_steps):
if i < warmup_steps:
lr_inc = (float(lr_max) - float(lr_init)) / float(warmup_steps)
lr = float(lr_init) + lr_inc * (i + 1)
else:
cosine_decay = 0.5 * (1 + math.cos(math.pi * (i-warmup_steps) / decay_steps))
lr = (lr_max-lr_end)*cosine_decay + lr_end
lr_each_step.append(lr)
return lr_each_step
def _generate_liner_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
"""
Applies liner decay to generate learning rate array.
Args:
lr_init(float): init learning rate.
lr_end(float): end learning rate
lr_max(float): max learning rate.
total_steps(int): all steps in training.
warmup_steps(int): all steps in warmup epochs.
Returns:
np.array, learning rate array.
"""
lr_each_step = []
for i in range(total_steps):
if i < warmup_steps:
lr = lr_init + (lr_max - lr_init) * i / warmup_steps
else:
lr = lr_max - (lr_max - lr_end) * (i - warmup_steps) / (total_steps - warmup_steps)
lr_each_step.append(lr)
return lr_each_step
def get_lr(lr_init, lr_end, lr_max, warmup_epochs, total_epochs, steps_per_epoch, lr_decay_mode):
"""
generate learning rate array
Args:
lr_init(float): init learning rate
lr_end(float): end learning rate
lr_max(float): max learning rate
warmup_epochs(int): number of warmup epochs
total_epochs(int): total epoch of training
steps_per_epoch(int): steps of one epoch
lr_decay_mode(string): learning rate decay mode, including steps, poly, cosine or liner(default)
Returns:
np.array, learning rate array
"""
lr_each_step = []
total_steps = steps_per_epoch * total_epochs
warmup_steps = steps_per_epoch * warmup_epochs
if lr_decay_mode == 'steps':
lr_each_step = _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps)
elif lr_decay_mode == 'poly':
lr_each_step = _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
elif lr_decay_mode == 'cosine':
lr_each_step = _generate_cosine_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
else:
lr_each_step = _generate_liner_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
lr_each_step = np.array(lr_each_step).astype(np.float32)
return lr_each_step
def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):
lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
lr = float(init_lr) + lr_inc * current_step
return lr
def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch=120, global_step=0):
"""
generate learning rate array with cosine
Args:
lr(float): base learning rate
steps_per_epoch(int): steps size of one epoch
warmup_epochs(int): number of warmup epochs
max_epoch(int): total epochs of training
global_step(int): the current start index of lr array
Returns:
np.array, learning rate array
"""
base_lr = lr
warmup_init_lr = 0
total_steps = int(max_epoch * steps_per_epoch)
warmup_steps = int(warmup_epochs * steps_per_epoch)
decay_steps = total_steps - warmup_steps
lr_each_step = []
for i in range(total_steps):
if i < warmup_steps:
lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
else:
linear_decay = (total_steps - i) / decay_steps
cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
decayed = linear_decay * cosine_decay + 0.00001
lr = base_lr * decayed
lr_each_step.append(lr)
lr_each_step = np.array(lr_each_step).astype(np.float32)
learning_rate = lr_each_step[global_step:]
return learning_rate
@@ -0,0 +1,393 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""ResNet."""
import numpy as np
import mindspore.nn as nn
import mindspore.common.dtype as mstype
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore.common.tensor import Tensor
from scipy.stats import truncnorm
def _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size):
fan_in = in_channel * kernel_size * kernel_size
scale = 1.0
scale /= max(1., fan_in)
stddev = (scale ** 0.5) / .87962566103423978
mu, sigma = 0, stddev
weight = truncnorm(-2, 2, loc=mu, scale=sigma).rvs(out_channel * in_channel * kernel_size * kernel_size)
weight = np.reshape(weight, (out_channel, in_channel, kernel_size, kernel_size))
return Tensor(weight, dtype=mstype.float32)
def _weight_variable(shape, factor=0.01):
init_value = np.random.randn(*shape).astype(np.float32) * factor
return Tensor(init_value)
def _conv3x3(in_channel, out_channel, stride=1, use_se=False):
if use_se:
weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=3)
else:
weight_shape = (out_channel, in_channel, 3, 3)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=3, stride=stride, padding=0, pad_mode='same', weight_init=weight)
def _conv1x1(in_channel, out_channel, stride=1, use_se=False):
if use_se:
weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=1)
else:
weight_shape = (out_channel, in_channel, 1, 1)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=1, stride=stride, padding=0, pad_mode='same', weight_init=weight)
def _conv7x7(in_channel, out_channel, stride=1, use_se=False):
if use_se:
weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=7)
else:
weight_shape = (out_channel, in_channel, 7, 7)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=7, stride=stride, padding=0, pad_mode='same', weight_init=weight)
def _bn(channel):
return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9,
gamma_init=1, beta_init=0, moving_mean_init=0, moving_var_init=1)
def _bn_last(channel):
return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9,
gamma_init=0, beta_init=0, moving_mean_init=0, moving_var_init=1)
def _fc(in_channel, out_channel, use_se=False):
if use_se:
weight = np.random.normal(loc=0, scale=0.01, size=out_channel*in_channel)
weight = Tensor(np.reshape(weight, (out_channel, in_channel)), dtype=mstype.float32)
else:
weight_shape = (out_channel, in_channel)
weight = _weight_variable(weight_shape)
return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
class ResidualBlock(nn.Cell):
"""
ResNet V1 residual block definition.
Args:
in_channel (int): Input channel.
out_channel (int): Output channel.
stride (int): Stride size for the first convolutional layer. Default: 1.
use_se (bool): enable SE-ResNet50 net. Default: False.
se_block(bool): use se block in SE-ResNet50 net. Default: False.
Returns:
Tensor, output tensor.
Examples:
>>> ResidualBlock(3, 256, stride=2)
"""
expansion = 4
def __init__(self,
in_channel,
out_channel,
stride=1,
use_se=False, se_block=False):
super(ResidualBlock, self).__init__()
self.stride = stride
self.use_se = use_se
self.se_block = se_block
channel = out_channel // self.expansion
self.conv1 = _conv1x1(in_channel, channel, stride=1, use_se=self.use_se)
self.bn1 = _bn(channel)
if self.use_se and self.stride != 1:
self.e2 = nn.SequentialCell([_conv3x3(channel, channel, stride=1, use_se=True), _bn(channel),
nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='same')])
else:
self.conv2 = _conv3x3(channel, channel, stride=stride, use_se=self.use_se)
self.bn2 = _bn(channel)
self.conv3 = _conv1x1(channel, out_channel, stride=1, use_se=self.use_se)
self.bn3 = _bn_last(out_channel)
if self.se_block:
self.se_global_pool = P.ReduceMean(keep_dims=False)
self.se_dense_0 = _fc(out_channel, int(out_channel/4), use_se=self.use_se)
self.se_dense_1 = _fc(int(out_channel/4), out_channel, use_se=self.use_se)
self.se_sigmoid = nn.Sigmoid()
self.se_mul = P.Mul()
self.relu = nn.ReLU()
self.down_sample = False
if stride != 1 or in_channel != out_channel:
self.down_sample = True
self.down_sample_layer = None
if self.down_sample:
if self.use_se:
if stride == 1:
self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel,
stride, use_se=self.use_se), _bn(out_channel)])
else:
self.down_sample_layer = nn.SequentialCell([nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='same'),
_conv1x1(in_channel, out_channel, 1,
use_se=self.use_se), _bn(out_channel)])
else:
self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride,
use_se=self.use_se), _bn(out_channel)])
self.add = P.TensorAdd()
def construct(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
if self.use_se and self.stride != 1:
out = self.e2(out)
else:
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.se_block:
out_se = out
out = self.se_global_pool(out, (2, 3))
out = self.se_dense_0(out)
out = self.relu(out)
out = self.se_dense_1(out)
out = self.se_sigmoid(out)
out = F.reshape(out, F.shape(out) + (1, 1))
out = self.se_mul(out, out_se)
if self.down_sample:
identity = self.down_sample_layer(identity)
out = self.add(out, identity)
out = self.relu(out)
return out
class ResNet(nn.Cell):
"""
ResNet architecture.
Args:
block (Cell): Block for network.
layer_nums (list): Numbers of block in different layers.
in_channels (list): Input channel in each layer.
out_channels (list): Output channel in each layer.
strides (list): Stride size in each layer.
num_classes (int): The number of classes that the training images are belonging to.
use_se (bool): enable SE-ResNet50 net. Default: False.
se_block(bool): use se block in SE-ResNet50 net in layer 3 and layer 4. Default: False.
Returns:
Tensor, output tensor.
Examples:
>>> ResNet(ResidualBlock,
>>> [3, 4, 6, 3],
>>> [64, 256, 512, 1024],
>>> [256, 512, 1024, 2048],
>>> [1, 2, 2, 2],
>>> 10)
"""
def __init__(self,
block,
layer_nums,
in_channels,
out_channels,
strides,
num_classes,
use_se=False):
super(ResNet, self).__init__()
if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
raise ValueError("the length of layer_num, in_channels, out_channels list must be 4!")
self.use_se = use_se
self.se_block = False
if self.use_se:
self.se_block = True
if self.use_se:
self.conv1_0 = _conv3x3(3, 32, stride=2, use_se=self.use_se)
self.bn1_0 = _bn(32)
self.conv1_1 = _conv3x3(32, 32, stride=1, use_se=self.use_se)
self.bn1_1 = _bn(32)
self.conv1_2 = _conv3x3(32, 64, stride=1, use_se=self.use_se)
else:
self.conv1 = _conv7x7(3, 64, stride=2)
self.bn1 = _bn(64)
self.relu = P.ReLU()
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")
self.layer1 = self._make_layer(block,
layer_nums[0],
in_channel=in_channels[0],
out_channel=out_channels[0],
stride=strides[0],
use_se=self.use_se)
self.layer2 = self._make_layer(block,
layer_nums[1],
in_channel=in_channels[1],
out_channel=out_channels[1],
stride=strides[1],
use_se=self.use_se)
self.layer3 = self._make_layer(block,
layer_nums[2],
in_channel=in_channels[2],
out_channel=out_channels[2],
stride=strides[2],
use_se=self.use_se,
se_block=self.se_block)
self.layer4 = self._make_layer(block,
layer_nums[3],
in_channel=in_channels[3],
out_channel=out_channels[3],
stride=strides[3],
use_se=self.use_se,
se_block=self.se_block)
self.mean = P.ReduceMean(keep_dims=True)
self.flatten = nn.Flatten()
self.end_point = _fc(out_channels[3], num_classes, use_se=self.use_se)
def _make_layer(self, block, layer_num, in_channel, out_channel, stride, use_se=False, se_block=False):
"""
Make stage network of ResNet.
Args:
block (Cell): Resnet block.
layer_num (int): Layer number.
in_channel (int): Input channel.
out_channel (int): Output channel.
stride (int): Stride size for the first convolutional layer.
se_block(bool): use se block in SE-ResNet50 net. Default: False.
Returns:
SequentialCell, the output layer.
Examples:
>>> _make_layer(ResidualBlock, 3, 128, 256, 2)
"""
layers = []
resnet_block = block(in_channel, out_channel, stride=stride, use_se=use_se)
layers.append(resnet_block)
if se_block:
for _ in range(1, layer_num - 1):
resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se)
layers.append(resnet_block)
resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se, se_block=se_block)
layers.append(resnet_block)
else:
for _ in range(1, layer_num):
resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se)
layers.append(resnet_block)
return nn.SequentialCell(layers)
def construct(self, x):
if self.use_se:
x = self.conv1_0(x)
x = self.bn1_0(x)
x = self.relu(x)
x = self.conv1_1(x)
x = self.bn1_1(x)
x = self.relu(x)
x = self.conv1_2(x)
else:
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
c1 = self.maxpool(x)
c2 = self.layer1(c1)
c3 = self.layer2(c2)
c4 = self.layer3(c3)
c5 = self.layer4(c4)
out = self.mean(c5, (2, 3))
out = self.flatten(out)
out = self.end_point(out)
return out
def resnet50(class_num=10):
"""
Get ResNet50 neural network.
Args:
class_num (int): Class number.
Returns:
Cell, cell instance of ResNet50 neural network.
Examples:
>>> net = resnet50(10)
"""
return ResNet(ResidualBlock,
[3, 4, 6, 3],
[64, 256, 512, 1024],
[256, 512, 1024, 2048],
[1, 2, 2, 2],
class_num)
def se_resnet50(class_num=1001):
"""
Get SE-ResNet50 neural network.
Args:
class_num (int): Class number.
Returns:
Cell, cell instance of SE-ResNet50 neural network.
Examples:
>>> net = se-resnet50(1001)
"""
return ResNet(ResidualBlock,
[3, 4, 6, 3],
[64, 256, 512, 1024],
[256, 512, 1024, 2048],
[1, 2, 2, 2],
class_num,
use_se=True)
def resnet101(class_num=1001):
"""
Get ResNet101 neural network.
Args:
class_num (int): Class number.
Returns:
Cell, cell instance of ResNet101 neural network.
Examples:
>>> net = resnet101(1001)
"""
return ResNet(ResidualBlock,
[3, 4, 23, 3],
[64, 256, 512, 1024],
[256, 512, 1024, 2048],
[1, 2, 2, 2],
class_num)
@@ -0,0 +1,191 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""train resnet."""
import os
import argparse
import ast
from mindspore import context
from mindspore import Tensor
from mindspore.nn.optim.momentum import Momentum
from mindspore.train.model import Model
from mindspore.context import ParallelMode
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
from mindspore.train.loss_scale_manager import FixedLossScaleManager
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.communication.management import init, get_rank, get_group_size
from mindspore.common import set_seed
import mindspore.nn as nn
import mindspore.common.initializer as weight_init
from src.lr_generator import get_lr, warmup_cosine_annealing_lr
from src.CrossEntropySmooth import CrossEntropySmooth
parser = argparse.ArgumentParser(description='Image classification')
parser.add_argument('--net', type=str, default=None, help='Resnet Model, either resnet50 or resnet101')
parser.add_argument('--dataset', type=str, default=None, help='Dataset, either cifar10 or imagenet2012')
parser.add_argument('--run_distribute', type=ast.literal_eval, default=False, help='Run distribute')
parser.add_argument('--device_num', type=int, default=1, help='Device num.')
parser.add_argument('--dataset_path', type=str, default=None, help='Dataset path')
parser.add_argument('--device_target', type=str, default='Ascend', help='Device target')
parser.add_argument('--pre_trained', type=str, default=None, help='Pretrained checkpoint path')
parser.add_argument('--parameter_server', type=ast.literal_eval, default=False, help='Run parameter server train')
args_opt = parser.parse_args()
set_seed(1)
if args_opt.net == "resnet50":
from src.resnet import resnet50 as resnet
if args_opt.dataset == "cifar10":
from src.config import config1 as config
from src.dataset import create_dataset1 as create_dataset
else:
from src.config import config2 as config
from src.dataset import create_dataset2 as create_dataset
elif args_opt.net == "resnet101":
from src.resnet import resnet101 as resnet
from src.config import config3 as config
from src.dataset import create_dataset3 as create_dataset
else:
from src.resnet import se_resnet50 as resnet
from src.config import config4 as config
from src.dataset import create_dataset4 as create_dataset
if __name__ == '__main__':
target = args_opt.device_target
ckpt_save_dir = config.save_checkpoint_path
# init context
context.set_context(mode=context.GRAPH_MODE, device_target=target, save_graphs=False)
if args_opt.parameter_server:
context.set_ps_context(enable_ps=True)
if args_opt.run_distribute:
if target == "Ascend":
device_id = int(os.getenv('DEVICE_ID'))
context.set_context(device_id=device_id, enable_auto_mixed_precision=True)
context.set_auto_parallel_context(device_num=args_opt.device_num, parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
if args_opt.net == "resnet50" or args_opt.net == "se-resnet50":
context.set_auto_parallel_context(all_reduce_fusion_config=[85, 160])
else:
context.set_auto_parallel_context(all_reduce_fusion_config=[180, 313])
init()
# GPU target
else:
init()
context.set_auto_parallel_context(device_num=get_group_size(), parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
if args_opt.net == "resnet50":
context.set_auto_parallel_context(all_reduce_fusion_config=[85, 160])
ckpt_save_dir = config.save_checkpoint_path + "ckpt_" + str(get_rank()) + "/"
# create dataset
dataset = create_dataset(dataset_path=args_opt.dataset_path, do_train=True, repeat_num=1,
batch_size=config.batch_size, target=target)
step_size = dataset.get_dataset_size()
# define net
net = resnet(class_num=config.class_num)
if args_opt.parameter_server:
net.set_param_ps()
# init weight
if args_opt.pre_trained:
param_dict = load_checkpoint(args_opt.pre_trained)
load_param_into_net(net, param_dict)
else:
for _, cell in net.cells_and_names():
if isinstance(cell, nn.Conv2d):
cell.weight.set_data(weight_init.initializer(weight_init.XavierUniform(),
cell.weight.shape,
cell.weight.dtype))
if isinstance(cell, nn.Dense):
cell.weight.set_data(weight_init.initializer(weight_init.TruncatedNormal(),
cell.weight.shape,
cell.weight.dtype))
# init lr
if args_opt.net == "resnet50" or args_opt.net == "se-resnet50":
lr = get_lr(lr_init=config.lr_init, lr_end=config.lr_end, lr_max=config.lr_max,
warmup_epochs=config.warmup_epochs, total_epochs=config.epoch_size, steps_per_epoch=step_size,
lr_decay_mode=config.lr_decay_mode)
else:
lr = warmup_cosine_annealing_lr(config.lr, step_size, config.warmup_epochs, config.epoch_size,
config.pretrain_epoch_size * step_size)
lr = Tensor(lr)
# define opt
decayed_params = []
no_decayed_params = []
for param in net.trainable_params():
if 'beta' not in param.name and 'gamma' not in param.name and 'bias' not in param.name:
decayed_params.append(param)
else:
no_decayed_params.append(param)
group_params = [{'params': decayed_params, 'weight_decay': config.weight_decay},
{'params': no_decayed_params},
{'order_params': net.trainable_params()}]
opt = Momentum(group_params, lr, config.momentum, loss_scale=config.loss_scale)
# define loss, model
if target == "Ascend":
if args_opt.dataset == "imagenet2012":
if not config.use_label_smooth:
config.label_smooth_factor = 0.0
loss = CrossEntropySmooth(sparse=True, reduction="mean",
smooth_factor=config.label_smooth_factor, num_classes=config.class_num)
else:
loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
loss_scale = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False)
model = Model(net, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale, metrics={'acc'},
amp_level="O2", keep_batchnorm_fp32=False)
else:
# GPU target
if args_opt.dataset == "imagenet2012":
if not config.use_label_smooth:
config.label_smooth_factor = 0.0
loss = CrossEntropySmooth(sparse=True, reduction="mean",
smooth_factor=config.label_smooth_factor, num_classes=config.class_num)
else:
loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
if (args_opt.net == "resnet101" or args_opt.net == "resnet50") and not args_opt.parameter_server:
opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), lr, config.momentum, config.weight_decay,
config.loss_scale)
loss_scale = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False)
# Mixed precision
model = Model(net, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale, metrics={'acc'},
amp_level="O2", keep_batchnorm_fp32=True)
else:
## fp32 training
opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), lr, config.momentum, config.weight_decay)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})
# define callbacks
time_cb = TimeMonitor(data_size=step_size)
loss_cb = LossMonitor()
cb = [time_cb, loss_cb]
if config.save_checkpoint:
config_ck = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_epochs * step_size,
keep_checkpoint_max=config.keep_checkpoint_max)
ckpt_cb = ModelCheckpoint(prefix="resnet", directory=ckpt_save_dir, config=config_ck)
cb += [ckpt_cb]
# train model
if args_opt.net == "se-resnet50":
config.epoch_size = config.train_epoch_size
model.train(config.epoch_size - config.pretrain_epoch_size, dataset, callbacks=cb,
dataset_sink_mode=(not args_opt.parameter_server))
@@ -0,0 +1,18 @@
#!/bin/bash
rm -rf /var/log/npu/slog/host-0/*
if [ -d /usr/local/Ascend/nnae/latest ];then
export LD_LIBRARY_PATH=/usr/local/:/usr/local/lib/:/usr/lib/:/usr/local/Ascend/nnae/latest/fwkacllib/lib64:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/:/usr/local/Ascend/driver/tools/hccn_tool/:/usr/local/mpirun4.0/lib
export PYTHONPATH=$PYTHONPATH:/usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages:/usr/local/Ascend/nnae/latest/opp/op_impl/built-in/ai_core/tbe:/usr/local/Ascend/nnae/latest/fwkacllib/python/site-packages/:/usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages
export PATH=$PATH:/usr/local/Ascend/nnae/latest/fwkacllib/ccec_compiler/bin:/usr/local/mpirun4.0/bin
export ASCEND_OPP_PATH=/usr/local/Ascend/nnae/latest/opp
export TBE_IMPL_PATH=/usr/local/Ascend/nnae/latest/opp/op_impl/built-in/ai_core
else
export LD_LIBRARY_PATH=/usr/local/lib/:/usr/lib/:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/:/usr/local/mpirun4.0/lib
export PYTHONPATH=$PYTHONPATH:/usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages:/usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe:/usr/local/Ascend/ascend-toolkit/latest//fwkacllib/python/site-packages/:/usr/local/Ascend/ascend-toolkit/latest/tfplugin/python/site-packages:$projectDir
export PATH=$PATH:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin:/usr/local/mpirun4.0/bin
export ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/
export TBE_IMPL_PATH=TBE_IMPL_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core
fi
@@ -0,0 +1,30 @@
#!/usr/bin/env bash
yamlPath=$1
toolsPath=$2
currtime=$3
currentDir=$(cd "$(dirname "$0")/.."; pwd)
train_job_dir=${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}
mkdir -p ${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}/
source ${currentDir}/config/npu_set_env.sh
export REMARK_LOG_FILE=hw_resnet50.log
benchmark_log_path=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils
export PYTHONPATH=$PYTHONPATH:${benchmark_log_path}
eval $(${toolsPath}/get_params_for_yaml.sh ${yamlPath} "mindspore_config")
device_id=${eval_device_id}
export DEVICE_NUM=1
export DEVICE_ID=${device_id}
export RANK_SIZE=${DEVICE_NUM}
export RANK_ID=${device_id}
python3.7 ${currentDir}/code/eval.py \
--net=resnet50 \
--dataset=imagenet2012 \
--dataset_path=${data_url} \
--checkpoint_path=${checkpoint_path} > ${train_job_dir}/eval.out 2>&1
@@ -0,0 +1,95 @@
#!/bin/bash
rank_size=$1
yamlPath=$2
toolsPath=$3
if [ -f /.dockerenv ];then
CLUSTER=$4
MPIRUN_ALL_IP="$5"
export CLUSTER=${CLUSTER}
fi
currentDir=$(cd "$(dirname "$0")/.."; pwd)
# 配置环境变量并调用 train 方法
currtime=`date +%Y%m%d%H%M%S`
mkdir -p ${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}/
train_job_dir=${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}/
echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] ${train_job_dir} &"
# user env
export HCCL_CONNECT_TIMEOUT=600
export JOB_ID=9999001
export SLOG_PRINT_TO_STDOUT=0
export RANK_TABLE_FILE=${currentDir}/config/${rank_size}p.json
# 从 yaml 获取配置
eval $(${toolsPath}/get_params_for_yaml.sh ${yamlPath} "mindspore_config")
data_url_new=`echo ${data_url//\//\\\\/}`
jsonFilePath=${currentDir}/code/src/config.py
echo "start to modify inner config file"
#echo "jsonfilepath is "${jsonFilePath}
sed -i "0,/epoch_size.*$/s//epoch_size\': ${epoches},/" ${jsonFilePath}
sed -i "0,/batch_size.*$/s//batch_size\': ${batch_size},/" ${jsonFilePath}
sed -i "0,/save_checkpoint_epochs.*$/s//save_checkpoint_epochs\': ${save_checkpoint_epochs},/" ${jsonFilePath}
sed -i "0,/loss_scale.*$/s//loss_scale\': ${loss_scale},/" ${jsonFilePath}
sed -i 's/\r//g' ${jsonFilePath}
if [ $? -eq 0 ] ;
then
echo "modify inner config file success"
else
echo "modify inner config file fail"
exit
fi
# device 列表, 若无指定 device 根据 rank_size 顺序选择
eval device_group=\$device_group_${rank_size}p
if [ x"${device_group}" == x"" ] || [ ${rank_size} -ge 8 ];then
device_group="$(seq 0 "$(expr $rank_size - 1)")"
fi
# get last device id in device_group, hw log in performance from the dir named last_device_id
device_group_str=`echo ${device_group} | sed 's/ //g'`
first_device_id=`echo ${device_group_str: 0:1}`
rank_id=0
if [ x"${CLUSTER}" == x"True" ];then
# ln hw log
ln -snf ${train_job_dir}/0/hw_resnet50.log ${train_job_dir}
this_ip=$(hostname -I |awk '{print $1}')
for ip in $MPIRUN_ALL_IP;do
if [ x"$ip" != x"$this_ip" ];then
scp $yamlPath root@$ip:$yamlPath
scp $jsonFilePath root@$ip:$jsonFilePath
fi
done
export PATH=$PATH:/usr/local/mpirun4.0/bin
mpirun -H ${mpirun_ip} \
--bind-to none -map-by slot\
--allow-run-as-root \
--mca btl_tcp_if_exclude lo,docker0,endvnic,virbr0,vethf40501b,docker_gwbridge,br-f42ac38052b4\
--prefix /usr/local/mpirun4.0/ \
${currentDir}/scripts/train.sh 0 $rank_size $yamlPath $currtime ${toolsPath} ${CLUSTER}
elif [ x"$mode" == x"train" ];then
# ln hw log
ln -snf ${train_job_dir}/${first_device_id}/hw_resnet50.log ${train_job_dir}
for device_id in $device_group;do
${currentDir}/scripts/train.sh $device_id $rank_size $yamlPath $currtime ${toolsPath} $rank_id &
let rank_id++
done
else
echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] ${ckpt_path%results*}/results &"
ln -snf ${train_job_dir}/${first_device_id}/hw_resnet50.log ${train_job_dir}
bash ${currentDir}/scripts/eval.sh ${yamlPath} ${toolsPath} $currtime
fi
wait
echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] ${train_job_dir} &"
@@ -0,0 +1,134 @@
#!/usr/bin/env bash
device_id=$1
rank_size=$2
yamlPath=$3
currtime=$4
toolsPath=$5
currentDir=$(cd "$(dirname "$0")/.."; pwd)
export REMARK_LOG_FILE=hw_resnet50.log
# ${mainDir%train*}/train/result/tensorflow/Bert/TrainingJob-${currtime} #
mkdir -p ${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}/
export train_job_dir=${currentDir%train*}/train/result/ms_resnet50/training_job_${currtime}/
source ${currentDir}/config/npu_set_env.sh
benchmark_log_path=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils
#atlasboost_path=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils/atlasboost
code_dir_path=${currentDir}/code
export PYTHONPATH=$PYTHONPATH:${benchmark_log_path}:${code_dir_path}
# 从 yaml 获取配置
eval $(${toolsPath}/get_params_for_yaml.sh ${yamlPath} "mindspore_config")
# user env
export YAML_PATH=$3
export HCCL_CONNECT_TIMEOUT=600
export JOB_ID=9999001
export RANK_TABLE_FILE=${currentDir}/config/${rank_size}p.json
export RANK_SIZE=${rank_size}
export RANK_INDEX=0
export SLOG_PRINT_TO_STDOUT=0
export DEVICE_ID=$1
DEVICE_INDEX=$(( DEVICE_ID + RANK_INDEX * 8 ))
export DEVICE_INDEX=${DEVICE_INDEX}
export MODEL_CKPT_PATH=${train_job_dir}/${device_id}/ckpt${device_id}
export SERVER_ID=0
cd ${train_job_dir}
curd_dir=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils/atlasboost
export PYTHONPATH=$PYTHONPATH:${curd_dir}
if [ x"$6" != x"True" ];then
rank_id=$6
export RANK_ID=$6
else
device_id_mo=$(python3.7 -c "import src.tensorflow.mpi_ops as atlasboost;atlasboost.init(); \
device_id = atlasboost.local_rank();cluster_device_id = str(device_id); \
atlasboost.set_device_id(device_id);print(atlasboost.rank())")
device_id_mo=`echo $device_id_mo`
rank_id=${device_id_mo##* }
export RANK_ID=${rank_id}
device=${device_id_mo##*deviceid = }
device_id=${device%% phyid=*}
export DEVICE_ID=${device_id}
hccljson=${train_job_dir}/*.json
cp ${hccljson} ${currentDir}/config/${rank_size}p.json
fi
#mkdir exec path
mkdir -p ${train_job_dir}/${device_id}
cd ${train_job_dir}/${device_id}
startTime=`date +%Y%m%d-%H:%M:%S`
startTime_s=`date +%s`
# 根据单卡/多卡区分调用参数
if [ x"$6" == x"True" ];then
export CLUSTER=True
echo "run cluster"
--net=resnet50 \
--dataset=imagenet2012 \
--run_distribute=${run_distribute} \
--pre_trained=${pre_trained} \
--dataset_path=${data_url} > ${train_job_dir}/train_${device_id}.log 2>&1
elif [ ${rank_size} -le 8 ];then
# 多卡单机
if [ ${rank_size} -eq 1 ]; then
run_distribute=False
device_num=1
else
run_distribute=True
device_num=${rank_size}
fi
export DEVICE_NUM=${device_num}
if [ x"${pre_trained}" == x"None" ];then
python3.7 ${currentDir}/code/train.py \
--net=resnet50 \
--dataset=imagenet2012 \
--run_distribute=${run_distribute} \
--device_num=${device_num} \
--dataset_path=${data_url} > ${train_job_dir}/train_${device_id}.log 2>&1
else
python3.7 ${currentDir}/code/train.py \
--net=resnet50 \
--dataset=${data_url} \
--run_distribute=${run_distribute} \
--device_num=${device_num} \
--pre_trained=${pre_trained} \
--dataset_path=${data_url} > ${train_job_dir}/train_${device_id}.log 2>&1
fi
fi
if [ $? -eq 0 ] ;
then
echo ":::ABK 1.0.0 resnet50 train success"
echo ":::ABK 1.0.0 resnet50 train success" >> ${train_job_dir}/train_${device_id}.log
echo ":::ABK 1.0.0 resnet50 train success" >> ${train_job_dir}/${device_id}/hw_resnet50.log
else
echo ":::ABK 1.0.0 resnet50 train failed"
echo ":::ABK 1.0.0 resnet50 train failed" >> ${train_job_dir}/train_${device_id}.log
echo ":::ABK 1.0.0 resnet50 train failed" >> ${train_job_dir}/${device_id}/hw_resnet50.log
fi
endTime=`date +%Y%m%d-%H:%M:%S`
endTime_s=`date +%s`
sumTime=$[ $endTime_s - $startTime_s ]
hour=$(( $sumTime/3600 ))
min=$(( ($sumTime-${hour}*3600)/60 ))
sec=$(( $sumTime-${hour}*3600-${min}*60 ))
echo ":::ABK 1.0.0 resnet50 train total time${hour}:${min}:${sec}"
echo ":::ABK 1.0.0 resnet50 train total time${hour}:${min}:${sec}" >> ${train_job_dir}/${device_id}/hw_resnet50.log
@@ -0,0 +1,26 @@
# ResNet50_pytorch训练说明
### 1. 模型训练参数配置
在train/yaml/ResNet50.yaml中修改相应配置, 配置项含义:
```
pytorch_config:
data_url: 数据集路径
batch_size: 跑1p时batch_size为512;跑8p时batch_size为4096
epoches: 跑多少个epoch
mode: train_and_evaluate、evaluate两种模式
ckpt_path: /home/train/result/pt_resnet50/training_job_20200916042624/7/checkpoint_npu7model_best.pth.tar
docker_image: docker 镜像名称:版本号
lr: 默认参数1p 0.28p 2.048
```
------
@@ -0,0 +1,20 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the BSD 3-Clause License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://opensource.org/licenses/BSD-3-Clause
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#from . import logger
#from . import dataloaders
#from . import training
#from . import utils
#from . import mixup
#from . import resnet
#from . import smoothing
@@ -0,0 +1,369 @@
# Copyright (c) 2018-2019, NVIDIA CORPORATION
# Copyright (c) 2017- Facebook, Inc
#
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import os
import torch
import numpy as np
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from PIL import Image
DATA_BACKEND_CHOICES = ['pytorch', 'syntetic']
# try:
# from nvidia.dali.plugin.pytorch import DALIClassificationIterator
# from nvidia.dali.pipeline import Pipeline
# import nvidia.dali.ops as ops
# import nvidia.dali.types as types
# DATA_BACKEND_CHOICES.append('dali-gpu')
# DATA_BACKEND_CHOICES.append('dali-cpu')
# except ImportError:
# print("Please install DALI from https://www.github.com/NVIDIA/DALI to run this example.")
def load_jpeg_from_file(path, cuda=True, fp16=False):
img_transforms = transforms.Compose(
[transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor()]
)
img = img_transforms(Image.open(path))
with torch.no_grad():
# mean and std are not multiplied by 255 as they are in training script
# torch dataloader reads data into bytes whereas loading directly
# through PIL creates a tensor with floats in [0,1] range
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
if cuda:
mean = mean.cuda()
std = std.cuda()
img = img.cuda()
if fp16:
mean = mean.half()
std = std.half()
img = img.half()
else:
img = img.float()
input = img.unsqueeze(0).sub_(mean).div_(std)
return input
# class HybridTrainPipe(Pipeline):
# def __init__(self, batch_size, num_threads, device_id, data_dir, crop, dali_cpu=False):
# super(HybridTrainPipe, self).__init__(batch_size, num_threads, device_id, seed = 12 + device_id)
# if torch.distributed.is_initialized():
# rank = torch.distributed.get_rank()
# world_size = torch.distributed.get_world_size()
# else:
# rank = 0
# world_size = 1
# self.input = ops.FileReader(
# file_root = data_dir,
# shard_id = rank,
# num_shards = world_size,
# random_shuffle = True)
# if dali_cpu:
# dali_device = "cpu"
# self.decode = ops.ImageDecoder(device=dali_device, output_type=types.RGB)
# else:
# dali_device = "gpu"
# # This padding sets the size of the internal nvJPEG buffers to be able to handle all images from full-sized ImageNet
# # without additional reallocations
# self.decode = ops.ImageDecoder(device="mixed", output_type=types.RGB, device_memory_padding=211025920, host_memory_padding=140544512)
# self.res = ops.RandomResizedCrop(
# device=dali_device,
# size=[crop, crop],
# interp_type=types.INTERP_LINEAR,
# random_aspect_ratio=[0.75, 4./3.],
# random_area=[0.08, 1.0],
# num_attempts=100)
# self.cmnp = ops.CropMirrorNormalize(device = "gpu",
# output_dtype = types.FLOAT,
# output_layout = types.NCHW,
# crop = (crop, crop),
# image_type = types.RGB,
# mean = [0.485 * 255,0.456 * 255,0.406 * 255],
# std = [0.229 * 255,0.224 * 255,0.225 * 255])
# self.coin = ops.CoinFlip(probability = 0.5)
# def define_graph(self):
# rng = self.coin()
# self.jpegs, self.labels = self.input(name = "Reader")
# images = self.decode(self.jpegs)
# images = self.res(images)
# output = self.cmnp(images.gpu(), mirror = rng)
# return [output, self.labels]
# class HybridValPipe(Pipeline):
# def __init__(self, batch_size, num_threads, device_id, data_dir, crop, size):
# super(HybridValPipe, self).__init__(batch_size, num_threads, device_id, seed = 12 + device_id)
# if torch.distributed.is_initialized():
# rank = torch.distributed.get_rank()
# world_size = torch.distributed.get_world_size()
# else:
# rank = 0
# world_size = 1
# self.input = ops.FileReader(
# file_root = data_dir,
# shard_id = rank,
# num_shards = world_size,
# random_shuffle = False)
# self.decode = ops.ImageDecoder(device = "mixed", output_type = types.RGB)
# self.res = ops.Resize(device = "gpu", resize_shorter = size)
# self.cmnp = ops.CropMirrorNormalize(device = "gpu",
# output_dtype = types.FLOAT,
# output_layout = types.NCHW,
# crop = (crop, crop),
# image_type = types.RGB,
# mean = [0.485 * 255,0.456 * 255,0.406 * 255],
# std = [0.229 * 255,0.224 * 255,0.225 * 255])
# def define_graph(self):
# self.jpegs, self.labels = self.input(name = "Reader")
# images = self.decode(self.jpegs)
# images = self.res(images)
# output = self.cmnp(images)
# return [output, self.labels]
class DALIWrapper(object):
def gen_wrapper(dalipipeline, num_classes, one_hot):
for data in dalipipeline:
input = data[0]["data"]
target = torch.reshape(data[0]["label"], [-1]).cuda().long()
if one_hot:
target = expand(num_classes, torch.float, target)
yield input, target
dalipipeline.reset()
def __init__(self, dalipipeline, num_classes, one_hot):
self.dalipipeline = dalipipeline
self.num_classes = num_classes
self.one_hot = one_hot
def __iter__(self):
return DALIWrapper.gen_wrapper(self.dalipipeline, self.num_classes, self.one_hot)
def get_dali_train_loader(dali_cpu=False):
# def gdtl(data_path, batch_size, num_classes, one_hot, workers=5, _worker_init_fn=None, fp16=False):
# if torch.distributed.is_initialized():
# rank = torch.distributed.get_rank()
# world_size = torch.distributed.get_world_size()
# else:
# rank = 0
# world_size = 1
# traindir = os.path.join(data_path, 'train')
# pipe = HybridTrainPipe(batch_size=batch_size, num_threads=workers,
# device_id = rank % torch.cuda.device_count(),
# data_dir = traindir, crop = 224, dali_cpu=dali_cpu)
# pipe.build()
# train_loader = DALIClassificationIterator(pipe, size = int(pipe.epoch_size("Reader") / world_size))
# return DALIWrapper(train_loader, num_classes, one_hot), int(pipe.epoch_size("Reader") / (world_size * batch_size))
# return gdtl
def gdtl(data_path, batch_size, num_classes, one_hot, workers=5, _worker_init_fn=None, fp16=False):
return False
return gdvl
def get_dali_val_loader():
# def gdvl(data_path, batch_size, num_classes, one_hot, workers=5, _worker_init_fn=None, fp16=False):
# if torch.distributed.is_initialized():
# rank = torch.distributed.get_rank()
# world_size = torch.distributed.get_world_size()
# else:
# rank = 0
# world_size = 1
# valdir = os.path.join(data_path, 'val')
# pipe = HybridValPipe(batch_size=batch_size, num_threads=workers,
# device_id = rank % torch.cuda.device_count(),
# data_dir = valdir,
# crop = 224, size = 256)
# pipe.build()
# val_loader = DALIClassificationIterator(pipe, size = int(pipe.epoch_size("Reader") / world_size))
# return DALIWrapper(val_loader, num_classes, one_hot), int(pipe.epoch_size("Reader") / (world_size * batch_size))
# return gdvl
def gdvl(data_path, batch_size, num_classes, one_hot, workers=5, _worker_init_fn=None, fp16=False):
return False
return gdvl
def fast_collate(batch):
imgs = [img[0] for img in batch]
targets = torch.tensor([target[1] for target in batch], dtype=torch.int64)
w = imgs[0].size[0]
h = imgs[0].size[1]
tensor = torch.zeros( (len(imgs), 3, h, w), dtype=torch.uint8 )
for i, img in enumerate(imgs):
nump_array = np.asarray(img, dtype=np.uint8)
tens = torch.from_numpy(nump_array)
if(nump_array.ndim < 3):
nump_array = np.expand_dims(nump_array, axis=-1)
nump_array = np.rollaxis(nump_array, 2)
tensor[i] += torch.from_numpy(nump_array)
return tensor, targets
def expand(num_classes, dtype, tensor):
e = torch.zeros(tensor.size(0), num_classes, dtype=dtype, device=torch.device('cuda'))
e = e.scatter(1, tensor.unsqueeze(1), 1.0)
return e
class PrefetchedWrapper(object):
def prefetched_loader(loader, num_classes, fp16, one_hot):
mean = torch.tensor([0.485 * 255, 0.456 * 255, 0.406 * 255]).cuda().view(1,3,1,1)
std = torch.tensor([0.229 * 255, 0.224 * 255, 0.225 * 255]).cuda().view(1,3,1,1)
if fp16:
mean = mean.half()
std = std.half()
stream = torch.cuda.Stream()
first = True
for next_input, next_target in loader:
with torch.cuda.stream(stream):
next_input = next_input.cuda(non_blocking=True)
next_target = next_target.cuda(non_blocking=True)
if fp16:
next_input = next_input.half()
if one_hot:
next_target = expand(num_classes, torch.half, next_target)
else:
next_input = next_input.float()
if one_hot:
next_target = expand(num_classes, torch.float, next_target)
next_input = next_input.sub_(mean).div_(std)
if not first:
yield input, target
else:
first = False
torch.cuda.current_stream().wait_stream(stream)
input = next_input
target = next_target
yield input, target
def __init__(self, dataloader, num_classes, fp16, one_hot):
self.dataloader = dataloader
self.fp16 = fp16
self.epoch = 0
self.one_hot = one_hot
self.num_classes = num_classes
def __iter__(self):
if (self.dataloader.sampler is not None and
isinstance(self.dataloader.sampler,
torch.utils.data.distributed.DistributedSampler)):
self.dataloader.sampler.set_epoch(self.epoch)
self.epoch += 1
return PrefetchedWrapper.prefetched_loader(self.dataloader, self.num_classes, self.fp16, self.one_hot)
def get_pytorch_train_loader(data_path, batch_size, num_classes, one_hot, workers=5, _worker_init_fn=None, fp16=False):
traindir = os.path.join(data_path, 'train')
train_dataset = datasets.ImageFolder(
traindir,
transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
]))
if torch.distributed.is_initialized():
train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)
else:
train_sampler = None
train_loader = torch.utils.data.DataLoader(
train_dataset, batch_size=batch_size, shuffle=(train_sampler is None),
num_workers=workers, worker_init_fn=_worker_init_fn, pin_memory=True, sampler=train_sampler, collate_fn=fast_collate, drop_last=True)
return PrefetchedWrapper(train_loader, num_classes, fp16, one_hot), len(train_loader)
def get_pytorch_val_loader(data_path, batch_size, num_classes, one_hot, workers=5, _worker_init_fn=None, fp16=False):
valdir = os.path.join(data_path, 'val')
val_dataset = datasets.ImageFolder(
valdir, transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
]))
if torch.distributed.is_initialized():
val_sampler = torch.utils.data.distributed.DistributedSampler(val_dataset)
else:
val_sampler = None
val_loader = torch.utils.data.DataLoader(
val_dataset,
sampler=val_sampler,
batch_size=batch_size, shuffle=False,
num_workers=workers, worker_init_fn=_worker_init_fn, pin_memory=True,
collate_fn=fast_collate)
return PrefetchedWrapper(val_loader, num_classes, fp16, one_hot), len(val_loader)
class SynteticDataLoader(object):
def __init__(self, fp16, batch_size, num_classes, num_channels, height, width, one_hot):
input_data = torch.empty(batch_size, num_channels, height, width).cuda().normal_(0, 1.0)
if one_hot:
input_target = torch.empty(batch_size, num_classes).cuda()
input_target[:, 0] = 1.0
else:
input_target = torch.randint(0, num_classes, (batch_size,))
input_target=input_target.cuda()
if fp16:
input_data = input_data.half()
self.input_data = input_data
self.input_target = input_target
def __iter__(self):
while True:
yield self.input_data, self.input_target
def get_syntetic_loader(data_path, batch_size, num_classes, one_hot, workers=None, _worker_init_fn=None, fp16=False):
return SynteticDataLoader(fp16, batch_size, 1000, 3, 224, 224, one_hot), -1
@@ -0,0 +1,310 @@
# Copyright (c) 2018-2019, NVIDIA CORPORATION
# Copyright (c) 2017- Facebook, Inc
#
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
from collections import OrderedDict
import dllogger
import numpy as np
def format_step(step):
if isinstance(step, str):
return step
s = ""
if len(step) > 0:
s += "Epoch: {} ".format(step[0])
if len(step) > 1:
s += "Iteration: {} ".format(step[1])
if len(step) > 2:
s += "Validation Iteration: {} ".format(step[2])
if len(step) == 0:
s = "Summary:"
return s
PERF_METER = lambda: Meter(AverageMeter(), AverageMeter(), AverageMeter())
LOSS_METER = lambda: Meter(AverageMeter(), AverageMeter(), MinMeter())
ACC_METER = lambda: Meter(AverageMeter(), AverageMeter(), MaxMeter())
LR_METER = lambda: Meter(LastMeter(), LastMeter(), LastMeter())
LAT_100 = lambda: Meter(QuantileMeter(1), QuantileMeter(1), QuantileMeter(1))
LAT_99 = lambda: Meter(QuantileMeter(0.99), QuantileMeter(0.99), QuantileMeter(0.99))
LAT_95 = lambda: Meter(QuantileMeter(0.95), QuantileMeter(0.95), QuantileMeter(0.95))
class Meter(object):
def __init__(self, iteration_aggregator, epoch_aggregator, run_aggregator):
self.run_aggregator = run_aggregator
self.epoch_aggregator = epoch_aggregator
self.iteration_aggregator = iteration_aggregator
def record(self, val, n=1):
self.iteration_aggregator.record(val, n=n)
def get_iteration(self):
v, n = self.iteration_aggregator.get_val()
return v
def reset_iteration(self):
v, n = self.iteration_aggregator.get_data()
self.iteration_aggregator.reset()
if v is not None:
self.epoch_aggregator.record(v, n=n)
def get_epoch(self):
v, n = self.epoch_aggregator.get_val()
return v
def reset_epoch(self):
v, n = self.epoch_aggregator.get_data()
self.epoch_aggregator.reset()
if v is not None:
self.run_aggregator.record(v, n=n)
def get_run(self):
v, n = self.run_aggregator.get_val()
return v
def reset_run(self):
self.run_aggregator.reset()
class QuantileMeter(object):
def __init__(self, q):
self.q = q
self.reset()
def reset(self):
self.vals = []
self.n = 0
def record(self, val, n=1):
if isinstance(val, list):
self.vals += val
self.n += len(val)
else:
self.vals += [val] * n
self.n += n
def get_val(self):
if not self.vals:
return None, self.n
return np.quantile(self.vals, self.q, interpolation='nearest'), self.n
def get_data(self):
return self.vals, self.n
class MaxMeter(object):
def __init__(self):
self.reset()
def reset(self):
self.max = None
self.n = 0
def record(self, val, n=1):
if self.max is None:
self.max = val
else:
self.max = max(self.max, val)
self.n = n
def get_val(self):
return self.max, self.n
def get_data(self):
return self.max, self.n
class MinMeter(object):
def __init__(self):
self.reset()
def reset(self):
self.min = None
self.n = 0
def record(self, val, n=1):
if self.min is None:
self.min = val
else:
self.min = max(self.min, val)
self.n = n
def get_val(self):
return self.min, self.n
def get_data(self):
return self.min, self.n
class LastMeter(object):
def __init__(self):
self.reset()
def reset(self):
self.last = None
self.n = 0
def record(self, val, n=1):
self.last = val
self.n = n
def get_val(self):
return self.last, self.n
def get_data(self):
return self.last, self.n
class AverageMeter(object):
def __init__(self):
self.reset()
def reset(self):
self.n = 0
self.val = 0
def record(self, val, n=1):
self.n += n
self.val += val * n
def get_val(self):
if self.n == 0:
return None, 0
return self.val / self.n, self.n
def get_data(self):
if self.n == 0:
return None, 0
return self.val / self.n, self.n
class Logger(object):
def __init__(self, print_interval, backends, verbose=False):
self.epoch = -1
self.iteration = -1
self.val_iteration = -1
self.metrics = OrderedDict()
self.backends = backends
self.print_interval = print_interval
self.verbose = verbose
dllogger.init(backends)
def log_parameter(self, data, verbosity=0):
dllogger.log(step="PARAMETER", data=data, verbosity=verbosity)
def register_metric(self, metric_name, meter, verbosity=0, metadata={}):
if self.verbose:
print("Registering metric: {}".format(metric_name))
self.metrics[metric_name] = {'meter': meter, 'level': verbosity}
dllogger.metadata(metric_name, metadata)
def log_metric(self, metric_name, val, n=1):
self.metrics[metric_name]['meter'].record(val, n=n)
def start_iteration(self, val=False):
if val:
self.val_iteration += 1
else:
self.iteration += 1
def end_iteration(self, val=False):
it = self.val_iteration if val else self.iteration
if (it % self.print_interval == 0):
metrics = {
n: m
for n, m in self.metrics.items() if n.startswith('val') == val
}
step = (self.epoch,
self.iteration) if not val else (self.epoch,
self.iteration,
self.val_iteration)
verbositys = {m['level'] for _, m in metrics.items()}
for ll in verbositys:
llm = {n: m for n, m in metrics.items() if m['level'] == ll}
dllogger.log(step=step,
data={
n: m['meter'].get_iteration()
for n, m in llm.items()
},
verbosity=ll)
for n, m in metrics.items():
m['meter'].reset_iteration()
dllogger.flush()
def start_epoch(self):
self.epoch += 1
self.iteration = 0
self.val_iteration = 0
for n, m in self.metrics.items():
m['meter'].reset_epoch()
def end_epoch(self):
for n, m in self.metrics.items():
m['meter'].reset_iteration()
verbositys = {m['level'] for _, m in self.metrics.items()}
for ll in verbositys:
llm = {n: m for n, m in self.metrics.items() if m['level'] == ll}
dllogger.log(step=(self.epoch, ),
data={n: m['meter'].get_epoch()
for n, m in llm.items()})
def end(self):
for n, m in self.metrics.items():
m['meter'].reset_epoch()
verbositys = {m['level'] for _, m in self.metrics.items()}
for ll in verbositys:
llm = {n: m for n, m in self.metrics.items() if m['level'] == ll}
dllogger.log(step=tuple(),
data={n: m['meter'].get_run()
for n, m in llm.items()})
for n, m in self.metrics.items():
m['meter'].reset_epoch()
dllogger.flush()
def iteration_generator_wrapper(self, gen, val=False):
for g in gen:
self.start_iteration(val=val)
yield g
self.end_iteration(val=val)
def epoch_generator_wrapper(self, gen):
for g in gen:
self.start_epoch()
yield g
self.end_epoch()
@@ -0,0 +1,67 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the BSD 3-Clause License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://opensource.org/licenses/BSD-3-Clause
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import torch
import torch.nn as nn
import numpy as np
def mixup(alpha, num_classes, data, target):
with torch.no_grad():
bs = data.size(0)
c = np.random.beta(alpha, alpha)
perm = torch.randperm(bs).cuda()
md = c * data + (1-c) * data[perm, :]
mt = c * target + (1-c) * target[perm, :]
return md, mt
class MixUpWrapper(object):
def __init__(self, alpha, num_classes, dataloader):
self.alpha = alpha
self.dataloader = dataloader
self.num_classes = num_classes
def mixup_loader(self, loader):
for input, target in loader:
i, t = mixup(self.alpha, self.num_classes, input, target)
yield i, t
def __iter__(self):
return self.mixup_loader(self.dataloader)
class NLLMultiLabelSmooth(nn.Module):
def __init__(self, smoothing = 0.0):
super(NLLMultiLabelSmooth, self).__init__()
self.confidence = 1.0 - smoothing
self.smoothing = smoothing
def forward(self, x, target):
if self.training:
x = x.float()
target = target.float()
logprobs = torch.nn.functional.log_softmax(x, dim = -1)
nll_loss = -logprobs * target
nll_loss = nll_loss.sum(-1)
smooth_loss = -logprobs.mean(dim=-1)
loss = self.confidence * nll_loss + self.smoothing * smooth_loss
return loss.mean()
else:
return torch.nn.functional.cross_entropy(x, target)
@@ -0,0 +1,370 @@
# Copyright (c) 2018-2019, NVIDIA CORPORATION
# Copyright (c) 2017- Facebook, Inc
#
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import math
import torch
import torch.nn as nn
import numpy as np
__all__ = ['ResNet', 'build_resnet', 'resnet_versions', 'resnet_configs']
# ResNetBuilder {{{
class ResNetBuilder(object):
def __init__(self, version, config):
self.conv3x3_cardinality = 1 if 'cardinality' not in version.keys() else version['cardinality']
self.config = config
def conv(self, kernel_size, in_planes, out_planes, groups=1, stride=1):
conv = nn.Conv2d(
in_planes, out_planes,
kernel_size=kernel_size, groups=groups,
stride=stride, padding=int((kernel_size - 1)/2),
bias=False)
if self.config['nonlinearity'] == 'relu':
# torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
# Copy
# 用论文 “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification” - He, K. et al. (2015) 中提及的正态分布初始化输入 Tensor。初始化后的张量中的值采样 ) 且
# %20%5Ctimes%20%5Ctext%7Bfan%5C_in%7D%7D%7D%0D%0A%0D%0A)
# 也被称作 He initialization。
# 参数:
# tensor n 维 torch.Tensor
# a – 该层后面一层的整流函数中负的斜率 (默认为 0,此时为 Relu)
# mode 'fan_in' (default) 或者 'fan_out'。使用fan_in保持weights的方差在前向传播中不变;使用fan_out保持weights的方差在反向传播中不变。
# nonlinearity 非线性函数 (nn.functional 中的名字),推荐只使用 'relu' 或 'leaky_relu' (default)。
# 例子
# >>> w = torch.empty(3, 5)
# >>> nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
nn.init.kaiming_normal_(conv.weight,
mode=self.config['conv_init'],
nonlinearity=self.config['nonlinearity'])
return conv
def conv3x3(self, in_planes, out_planes, stride=1):
"""3x3 convolution with padding"""
c = self.conv(3, in_planes, out_planes, groups=self.conv3x3_cardinality, stride=stride)
return c
def conv1x1(self, in_planes, out_planes, stride=1):
"""1x1 convolution with padding"""
c = self.conv(1, in_planes, out_planes, stride=stride)
return c
def conv7x7(self, in_planes, out_planes, stride=1):
"""7x7 convolution with padding"""
c = self.conv(7, in_planes, out_planes, stride=stride)
return c
def conv5x5(self, in_planes, out_planes, stride=1):
"""5x5 convolution with padding"""
c = self.conv(5, in_planes, out_planes, stride=stride)
return c
def batchnorm(self, planes, last_bn=False):
bn = nn.BatchNorm2d(planes)
gamma_init_val = 0 if last_bn and self.config['last_bn_0_init'] else 1
nn.init.constant_(bn.weight, gamma_init_val)
nn.init.constant_(bn.bias, 0)
return bn
def activation(self):
return self.config['activation']()
# ResNetBuilder }}}
# BasicBlock {{{
class BasicBlock(nn.Module):
def __init__(self, builder, inplanes, planes, expansion, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = builder.conv3x3(inplanes, planes, stride)
self.bn1 = builder.batchnorm(planes)
self.relu = builder.activation()
self.conv2 = builder.conv3x3(planes, planes*expansion)
self.bn2 = builder.batchnorm(planes*expansion, last_bn=True)
self.downsample = downsample
self.stride = stride
def forward(self, x):
residual = x
out = self.conv1(x)
if self.bn1 is not None:
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
if self.bn2 is not None:
out = self.bn2(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
# BasicBlock }}}
# SqueezeAndExcitation {{{
class SqueezeAndExcitation(nn.Module):
def __init__(self, planes, squeeze):
super(SqueezeAndExcitation, self).__init__()
self.squeeze = nn.Linear(planes, squeeze)
self.expand = nn.Linear(squeeze, planes)
self.relu = nn.ReLU(inplace=True)
self.sigmoid = nn.Sigmoid()
def forward(self, x):
out = torch.mean(x.view(x.size(0), x.size(1), -1), 2)
out = self.squeeze(out)
out = self.relu(out)
out = self.expand(out)
out = self.sigmoid(out)
out = out.unsqueeze(2).unsqueeze(3)
return out
# }}}
# Bottleneck {{{
class Bottleneck(nn.Module):
def __init__(self, builder, inplanes, planes, expansion, stride=1, se=False, se_squeeze=16, downsample=None):
super(Bottleneck, self).__init__()
self.conv1 = builder.conv1x1(inplanes, planes)
self.bn1 = builder.batchnorm(planes)
self.conv2 = builder.conv3x3(planes, planes, stride=stride)
self.bn2 = builder.batchnorm(planes)
self.conv3 = builder.conv1x1(planes, planes * expansion)
self.bn3 = builder.batchnorm(planes * expansion, last_bn=True)
self.relu = builder.activation()
self.downsample = downsample
self.stride = stride
self.squeeze = SqueezeAndExcitation(planes*expansion, se_squeeze) if se else None
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
residual = self.downsample(x)
if self.squeeze is None:
out += residual
else:
out = torch.addcmul(residual, 1.0, out, self.squeeze(out))
out = self.relu(out)
return out
def SEBottleneck(builder, inplanes, planes, expansion, stride=1, downsample=None):
return Bottleneck(builder, inplanes, planes, expansion, stride=stride, se=True, se_squeeze=16, downsample=downsample)
# Bottleneck }}}
# ResNet {{{
class ResNet(nn.Module):
def __init__(self, builder, block, expansion, layers, widths, num_classes=1000):
self.inplanes = 64
super(ResNet, self).__init__()
self.conv1 = builder.conv7x7(3, 64, stride=2)
self.bn1 = builder.batchnorm(64)
self.relu = builder.activation()
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(builder, block, expansion, widths[0], layers[0])
self.layer2 = self._make_layer(builder, block, expansion, widths[1], layers[1], stride=2)
self.layer3 = self._make_layer(builder, block, expansion, widths[2], layers[2], stride=2)
self.layer4 = self._make_layer(builder, block, expansion, widths[3], layers[3], stride=2)
self.avgpool = nn.AdaptiveAvgPool2d(1)
self.fc = nn.Linear(widths[3] * expansion, num_classes)
def _make_layer(self, builder, block, expansion, planes, blocks, stride=1):
downsample = None
if stride != 1 or self.inplanes != planes * expansion:
dconv = builder.conv1x1(self.inplanes, planes * expansion,
stride=stride)
dbn = builder.batchnorm(planes * expansion)
if dbn is not None:
downsample = nn.Sequential(dconv, dbn)
else:
downsample = dconv
layers = []
layers.append(block(builder, self.inplanes, planes, expansion, stride=stride, downsample=downsample))
self.inplanes = planes * expansion
for i in range(1, blocks):
layers.append(block(builder, self.inplanes, planes, expansion))
return nn.Sequential(*layers)
def forward(self, x):
x = self.conv1(x)
if self.bn1 is not None:
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
x = self.avgpool(x)
x = x.view(x.size(0), -1)
x = self.fc(x)
return x
# ResNet }}}
resnet_configs = {
'classic' : {
'conv' : nn.Conv2d,
'conv_init' : 'fan_out',
'nonlinearity' : 'relu',
'last_bn_0_init' : False,
'activation' : lambda: nn.ReLU(inplace=True),
},
'fanin' : {
'conv' : nn.Conv2d,
'conv_init' : 'fan_in',
'nonlinearity' : 'relu',
'last_bn_0_init' : False,
'activation' : lambda: nn.ReLU(inplace=True),
},
'grp-fanin' : {
'conv' : nn.Conv2d,
'conv_init' : 'fan_in',
'nonlinearity' : 'relu',
'last_bn_0_init' : False,
'activation' : lambda: nn.ReLU(inplace=True),
},
'grp-fanout' : {
'conv' : nn.Conv2d,
'conv_init' : 'fan_out',
'nonlinearity' : 'relu',
'last_bn_0_init' : False,
'activation' : lambda: nn.ReLU(inplace=True),
},
}
resnet_versions = {
'resnet18' : {
'net' : ResNet,
'block' : BasicBlock,
'layers' : [2, 2, 2, 2],
'widths' : [64, 128, 256, 512],
'expansion' : 1,
'num_classes' : 1000,
},
'resnet34' : {
'net' : ResNet,
'block' : BasicBlock,
'layers' : [3, 4, 6, 3],
'widths' : [64, 128, 256, 512],
'expansion' : 1,
'num_classes' : 1000,
},
'resnet50' : {
'net' : ResNet,
'block' : Bottleneck,
'layers' : [3, 4, 6, 3],
'widths' : [64, 128, 256, 512],
'expansion' : 4,
'num_classes' : 1000,
},
'resnet101' : {
'net' : ResNet,
'block' : Bottleneck,
'layers' : [3, 4, 23, 3],
'widths' : [64, 128, 256, 512],
'expansion' : 4,
'num_classes' : 1000,
},
'resnet152' : {
'net' : ResNet,
'block' : Bottleneck,
'layers' : [3, 8, 36, 3],
'widths' : [64, 128, 256, 512],
'expansion' : 4,
'num_classes' : 1000,
},
'resnext101-32x4d' : {
'net' : ResNet,
'block' : Bottleneck,
'cardinality' : 32,
'layers' : [3, 4, 23, 3],
'widths' : [128, 256, 512, 1024],
'expansion' : 2,
'num_classes' : 1000,
},
'se-resnext101-32x4d' : {
'net' : ResNet,
'block' : SEBottleneck,
'cardinality' : 32,
'layers' : [3, 4, 23, 3],
'widths' : [128, 256, 512, 1024],
'expansion' : 2,
'num_classes' : 1000,
},
}
def build_resnet(version, config, verbose=True):
version = resnet_versions[version]
config = resnet_configs[config]
builder = ResNetBuilder(version, config)
if verbose:
print("Version: {}".format(version))
print("Config: {}".format(config))
model = version['net'](builder,
version['block'],
version['expansion'],
version['layers'],
version['widths'],
version['num_classes'])
return model
@@ -0,0 +1,91 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the BSD 3-Clause License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://opensource.org/licenses/BSD-3-Clause
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import torch
import torch.nn as nn
class CrossEntropy(nn.CrossEntropyLoss):
def __init__(self, smooth_factor=0., num_classes=1000):
super(CrossEntropy, self).__init__()
self.on_value = 1.0 - smooth_factor
self.off_value = 1.0 * smooth_factor / (num_classes - 1)
def forward(self, input, target):
one_hot_label = torch.npu_one_hot(target, -1, input.size(1), self.on_value, self.off_value)
one_hot_label = one_hot_label.to(torch.float16)
loss = torch.npu_softmax_cross_entropy_with_logits(input.to(torch.float16), one_hot_label)
loss = torch.mean(loss, [0], keepdim=False, dtype=torch.float32)
return loss
class LabelSmoothingNpu(nn.Module):
"""
NLL loss with label smoothing.
"""
def __init__(self, smoothing=0.0):
"""
Constructor for the LabelSmoothing module.
:param smoothing: label smoothing factor
"""
super(LabelSmoothingNpu, self).__init__()
self.confidence = 1.0 - smoothing
self.smoothing = smoothing
self.epsilon = 0.1
self.num_classes = 1000
def forward(self, x, target):
CALCULATE_DEVICE = x.device
logprobs = torch.nn.functional.log_softmax(x, dim=-1).to("cpu")
targets = torch.zeros_like(logprobs).scatter_(1, target.unsqueeze(1), 1)
targets = (1 - self.epsilon) * targets + self.epsilon / self.num_classes
loss = (-targets * logprobs).mean(0).sum()
# nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
# nll_loss = nll_loss.squeeze(1)
# smooth_loss = -logprobs.mean(dim=-1)
# loss = self.confidence * nll_loss + self.smoothing * smooth_loss
return loss.to(CALCULATE_DEVICE)
class LabelSmoothingGpu(nn.Module):
"""
NLL loss with label smoothing.
"""
def __init__(self, smoothing=0.0):
"""
Constructor for the LabelSmoothing module.
:param smoothing: label smoothing factor
"""
super(LabelSmoothingGpu, self).__init__()
self.confidence = 1.0 - smoothing
self.smoothing = smoothing
# print("----------------------LabelSooothing.__init__")
# def __call__(self,x,target):
# print("----------------------LabelSooothing.__call__")
# return self.forward(self,x,target)
def forward(self, x, target):
logprobs = torch.nn.functional.log_softmax(x, dim=-1)
nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
nll_loss = nll_loss.squeeze(1)
smooth_loss = -logprobs.mean(dim=-1)
loss = self.confidence * nll_loss + self.smoothing * smooth_loss
#print("================",type(x),x.size())
#print("------------------",type(target),target.size(),target)
return loss.mean()
@@ -0,0 +1,51 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the BSD 3-Clause License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://opensource.org/licenses/BSD-3-Clause
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import torch
import torch.nn as nn
class LabelSmoothing(nn.Module):
"""
NLL loss with label smoothing.
"""
def __init__(self, smoothing=0.0):
"""
Constructor for the LabelSmoothing module.
:param smoothing: label smoothing factor
"""
super(LabelSmoothing, self).__init__()
self.confidence = 1.0 - smoothing
self.smoothing = smoothing
# print("----------------------LabelSooothing.__init__")
# def __call__(self,x,target):
# print("----------------------LabelSooothing.__call__")
# return self.forward(self,x,target)
def forward(self, x, target):
device_x = x.device
device_target = target.device
x = x.to("cpu")
target = target.to("cpu")
logprobs = torch.nn.functional.log_softmax(x, dim=-1)
nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
nll_loss = nll_loss.squeeze(1)
smooth_loss = -logprobs.mean(dim=-1)
loss = self.confidence * nll_loss + self.smoothing * smooth_loss
#print("================",type(x),x.size())
#print("------------------",type(target),target.size(),target)
x = x.to(device_x)
target = target.to(device_target)
return loss.mean()
@@ -0,0 +1,534 @@
# Copyright (c) 2018-2019, NVIDIA CORPORATION
# Copyright (c) 2017- Facebook, Inc
#
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import os
import time
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable
from . import logger as log
from . import resnet as nvmodels
from . import utils
import dllogger
try:
#from apex.parallel import DistributedDataParallel as DDP #可以采用pytorch torch.distributed
from apex.fp16_utils import *
from apex import amp
except ImportError:
raise ImportError(
"Please install apex from https://www.github.com/nvidia/apex to run this example."
)
ACC_METADATA = {'unit': '%','format': ':.2f'}
IPS_METADATA = {'unit': 'img/s', 'format': ':.2f'}
TIME_METADATA = {'unit': 's', 'format': ':.5f'}
LOSS_METADATA = {'format': ':.5f'}
class ModelAndLoss(nn.Module):
def __init__(self,
arch,
loss,
pretrained_weights=None,
cuda=True,
fp16=False):
super(ModelAndLoss, self).__init__()
self.arch = arch
print("=> creating model '{}'".format(arch))
model = nvmodels.build_resnet(arch[0], arch[1])
if pretrained_weights is not None:
print("=> using pre-trained model from a file '{}'".format(arch))
model.load_state_dict(pretrained_weights)
if cuda:
model = model.cuda()
if fp16:
model = network_to_half(model)
# define loss function (criterion) and optimizer
criterion = loss()
if cuda:
criterion = criterion.cuda()
self.model = model
self.loss = criterion
def forward(self, data, target):
output = self.model(data)
loss = self.loss(output, target)
return loss, output
def distributed(self):
#self.model = DDP(self.model)
return
def load_model_state(self, state):
if not state is None:
self.model.load_state_dict(state)
def get_optimizer(parameters,
fp16,
lr,
momentum,
weight_decay,
nesterov=False,
state=None,
static_loss_scale=1.,
dynamic_loss_scale=False,
bn_weight_decay=False):
if bn_weight_decay:
print(" ! Weight decay applied to BN parameters ")
optimizer = torch.optim.SGD([v for n, v in parameters],
lr,
momentum=momentum,
weight_decay=weight_decay,
nesterov=nesterov)
else:
print(" ! Weight decay NOT applied to BN parameters ")
bn_params = [v for n, v in parameters if 'bn' in n]
rest_params = [v for n, v in parameters if not 'bn' in n]
print(len(bn_params))
print(len(rest_params))
optimizer = torch.optim.SGD([{
'params': bn_params,
'weight_decay': 0
}, {
'params': rest_params,
'weight_decay': weight_decay
}],
lr,
momentum=momentum,
weight_decay=weight_decay,
nesterov=nesterov)
if fp16:
optimizer = FP16_Optimizer(optimizer,
static_loss_scale=static_loss_scale,
dynamic_loss_scale=dynamic_loss_scale,
verbose=False)
if not state is None:
optimizer.load_state_dict(state)
return optimizer
def lr_policy(lr_fn, logger=None):
if logger is not None:
logger.register_metric('lr',
log.LR_METER(),
verbosity=dllogger.Verbosity.VERBOSE)
def _alr(optimizer, iteration, epoch):
lr = lr_fn(iteration, epoch)
if logger is not None:
logger.log_metric('lr', lr)
for param_group in optimizer.param_groups:
param_group['lr'] = lr
return _alr
def lr_step_policy(base_lr, steps, decay_factor, warmup_length, logger=None):
def _lr_fn(iteration, epoch):
if epoch < warmup_length:
lr = base_lr * (epoch + 1) / warmup_length
else:
lr = base_lr
for s in steps:
if epoch >= s:
lr *= decay_factor
return lr
return lr_policy(_lr_fn, logger=logger)
def lr_linear_policy(base_lr, warmup_length, epochs, logger=None):
def _lr_fn(iteration, epoch):
if epoch < warmup_length:
lr = base_lr * (epoch + 1) / warmup_length
else:
e = epoch - warmup_length
es = epochs - warmup_length
lr = base_lr * (1 - (e / es))
return lr
return lr_policy(_lr_fn, logger=logger)
def lr_cosine_policy(base_lr, warmup_length, epochs, logger=None):
def _lr_fn(iteration, epoch):
if epoch < warmup_length:
lr = base_lr * (epoch + 1) / warmup_length
else:
e = epoch - warmup_length
es = epochs - warmup_length
lr = 0.5 * (1 + np.cos(np.pi * e / es)) * base_lr
return lr
return lr_policy(_lr_fn, logger=logger)
def lr_exponential_policy(base_lr,
warmup_length,
epochs,
final_multiplier=0.001,
logger=None):
es = epochs - warmup_length
epoch_decay = np.power(2, np.log2(final_multiplier) / es)
def _lr_fn(iteration, epoch):
if epoch < warmup_length:
lr = base_lr * (epoch + 1) / warmup_length
else:
e = epoch - warmup_length
lr = base_lr * (epoch_decay**e)
return lr
return lr_policy(_lr_fn, logger=logger)
def get_train_step(model_and_loss,
optimizer,
fp16,
use_amp=False,
batch_size_multiplier=1):
def _step(input, target, optimizer_step=True):
input_var = Variable(input)
target_var = Variable(target)
loss, output = model_and_loss(input_var, target_var)
if torch.distributed.is_initialized():
print('utils.reduce_tensor(loss.data)')
reduced_loss = utils.reduce_tensor(loss.data)
else:
reduced_loss = loss.data
if fp16:
optimizer.backward(loss)
elif use_amp:
with amp.scale_loss(loss, optimizer) as scaled_loss:
scaled_loss.backward()
else:
loss.backward()
if optimizer_step:
opt = optimizer.optimizer if isinstance(
optimizer, FP16_Optimizer) else optimizer
for param_group in opt.param_groups:
for param in param_group['params']:
param.grad /= batch_size_multiplier
optimizer.step()
optimizer.zero_grad()
torch.cuda.synchronize()
return reduced_loss
return _step
def train(train_loader,
model_and_loss,
optimizer,
lr_scheduler,
fp16,
logger,
epoch,
use_amp=False,
prof=-1,
batch_size_multiplier=1,
register_metrics=True):
if register_metrics and logger is not None:
logger.register_metric('train.loss',
log.LOSS_METER(),
verbosity=dllogger.Verbosity.DEFAULT,
metadata=LOSS_METADATA)
logger.register_metric('train.compute_ips',
log.PERF_METER(),
verbosity=dllogger.Verbosity.VERBOSE,
metadata=IPS_METADATA)
logger.register_metric('train.total_ips',
log.PERF_METER(),
verbosity=dllogger.Verbosity.DEFAULT,
metadata=IPS_METADATA)
logger.register_metric('train.data_time',
log.PERF_METER(),
verbosity=dllogger.Verbosity.VERBOSE,
metadata=TIME_METADATA)
logger.register_metric('train.compute_time',
log.PERF_METER(),
verbosity=dllogger.Verbosity.VERBOSE,
metadata=TIME_METADATA)
step = get_train_step(model_and_loss,
optimizer,
fp16,
use_amp=use_amp,
batch_size_multiplier=batch_size_multiplier)
model_and_loss.train()
end = time.time()
optimizer.zero_grad()
data_iter = enumerate(train_loader)
if logger is not None:
data_iter = logger.iteration_generator_wrapper(data_iter)
if prof > 0:
data_iter = utils.first_n(prof, data_iter)
for i, (input, target) in data_iter:
bs = input.size(0)
lr_scheduler(optimizer, i, epoch)
data_time = time.time() - end
optimizer_step = ((i + 1) % batch_size_multiplier) == 0
loss = step(input, target, optimizer_step=optimizer_step)
it_time = time.time() - end
if logger is not None:
logger.log_metric('train.loss', to_python_float(loss), bs)
logger.log_metric('train.compute_ips',
calc_ips(bs, it_time - data_time))
logger.log_metric('train.total_ips', calc_ips(bs, it_time))
logger.log_metric('train.data_time', data_time)
logger.log_metric('train.compute_time', it_time - data_time)
end = time.time()
def get_val_step(model_and_loss):
def _step(input, target):
input_var = Variable(input)
target_var = Variable(target)
with torch.no_grad():
loss, output = model_and_loss(input_var, target_var)
prec1, prec5 = utils.accuracy(output.data, target, topk=(1, 5))
if torch.distributed.is_initialized():
reduced_loss = utils.reduce_tensor(loss.data)
prec1 = utils.reduce_tensor(prec1)
prec5 = utils.reduce_tensor(prec5)
else:
reduced_loss = loss.data
torch.cuda.synchronize()
return reduced_loss, prec1, prec5
return _step
def validate(val_loader,
model_and_loss,
fp16,
logger,
epoch,
prof=-1,
register_metrics=True):
if register_metrics and logger is not None:
logger.register_metric('val.top1',
log.ACC_METER(),
verbosity=dllogger.Verbosity.DEFAULT,
metadata=ACC_METADATA)
logger.register_metric('val.top5',
log.ACC_METER(),
verbosity=dllogger.Verbosity.DEFAULT,
metadata=ACC_METADATA)
logger.register_metric('val.loss',
log.LOSS_METER(),
verbosity=dllogger.Verbosity.DEFAULT,
metadata=LOSS_METADATA)
logger.register_metric('val.compute_ips',
log.PERF_METER(),
verbosity=dllogger.Verbosity.VERBOSE,
metadata=IPS_METADATA)
logger.register_metric('val.total_ips',
log.PERF_METER(),
verbosity=dllogger.Verbosity.DEFAULT,
metadata=IPS_METADATA)
logger.register_metric('val.data_time',
log.PERF_METER(),
verbosity=dllogger.Verbosity.VERBOSE,
metadata=TIME_METADATA)
logger.register_metric('val.compute_latency',
log.PERF_METER(),
verbosity=dllogger.Verbosity.VERBOSE,
metadata=TIME_METADATA)
logger.register_metric('val.compute_latency_at100',
log.LAT_100(),
verbosity=dllogger.Verbosity.VERBOSE,
metadata=TIME_METADATA)
logger.register_metric('val.compute_latency_at99',
log.LAT_99(),
verbosity=dllogger.Verbosity.VERBOSE,
metadata=TIME_METADATA)
logger.register_metric('val.compute_latency_at95',
log.LAT_95(),
verbosity=dllogger.Verbosity.VERBOSE,
metadata=TIME_METADATA)
step = get_val_step(model_and_loss)
top1 = log.AverageMeter()
# switch to evaluate mode
model_and_loss.eval()
end = time.time()
data_iter = enumerate(val_loader)
if not logger is None:
data_iter = logger.iteration_generator_wrapper(data_iter, val=True)
if prof > 0:
data_iter = utils.first_n(prof, data_iter)
for i, (input, target) in data_iter:
bs = input.size(0)
data_time = time.time() - end
loss, prec1, prec5 = step(input, target)
it_time = time.time() - end
top1.record(to_python_float(prec1), bs)
if logger is not None:
logger.log_metric('val.top1', to_python_float(prec1), bs)
logger.log_metric('val.top5', to_python_float(prec5), bs)
logger.log_metric('val.loss', to_python_float(loss), bs)
logger.log_metric('val.compute_ips',
calc_ips(bs, it_time - data_time))
logger.log_metric('val.total_ips', calc_ips(bs, it_time))
logger.log_metric('val.data_time', data_time)
logger.log_metric('val.compute_latency', it_time - data_time)
logger.log_metric('val.compute_latency_at95', it_time - data_time)
logger.log_metric('val.compute_latency_at99', it_time - data_time)
logger.log_metric('val.compute_latency_at100', it_time - data_time)
end = time.time()
return top1.get_val()
# Train loop {{{
def calc_ips(batch_size, time):
world_size = torch.distributed.get_world_size(
) if torch.distributed.is_initialized() else 1
tbs = world_size * batch_size
return tbs / time
def train_loop(model_and_loss,
optimizer,
lr_scheduler,
train_loader,
val_loader,
epochs,
fp16,
logger,
should_backup_checkpoint,
use_amp=False,
batch_size_multiplier=1,
best_prec1=0,
start_epoch=0,
prof=-1,
skip_training=False,
skip_validation=False,
save_checkpoints=True,
checkpoint_dir='./'):
prec1 = -1
epoch_iter = range(start_epoch, epochs)
for epoch in epoch_iter:
if logger is not None:
logger.start_epoch()
if not skip_training:
train(train_loader,
model_and_loss,
optimizer,
lr_scheduler,
fp16,
logger,
epoch,
use_amp=use_amp,
prof=prof,
register_metrics=epoch == start_epoch,
batch_size_multiplier=batch_size_multiplier)
if not skip_validation:
prec1, nimg = validate(val_loader,
model_and_loss,
fp16,
logger,
epoch,
prof=prof,
register_metrics=epoch == start_epoch)
if logger is not None:
logger.end_epoch()
if save_checkpoints and (not torch.distributed.is_initialized()
or torch.distributed.get_rank() == 0):
if not skip_validation:
is_best = logger.metrics['val.top1']['meter'].get_epoch() > best_prec1
best_prec1 = max(logger.metrics['val.top1']['meter'].get_epoch(),
best_prec1)
else:
is_best = False
best_prec1 = 0
if should_backup_checkpoint(epoch):
backup_filename = 'checkpoint-{}.pth.tar'.format(epoch + 1)
else:
backup_filename = None
utils.save_checkpoint(
{
'epoch': epoch + 1,
'arch': model_and_loss.arch,
'state_dict': model_and_loss.model.state_dict(),
'best_prec1': best_prec1,
'optimizer': optimizer.state_dict(),
},
is_best,
checkpoint_dir=checkpoint_dir,
backup_filename=backup_filename)
# }}}
@@ -0,0 +1,106 @@
# Copyright (c) 2018-2019, NVIDIA CORPORATION
# Copyright (c) 2017- Facebook, Inc
#
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import os
import numpy as np
import torch
import shutil
import torch.distributed as dist
def should_backup_checkpoint(args):
def _sbc(epoch):
return args.gather_checkpoints and (epoch < 10 or epoch % 10 == 0)
return _sbc
def save_checkpoint(state,
is_best,
filename='checkpoint.pth.tar',
checkpoint_dir='./',
backup_filename=None):
if (not torch.distributed.is_initialized()
) or torch.distributed.get_rank() == 0:
filename = os.path.join(checkpoint_dir, filename)
print("SAVING {}".format(filename))
torch.save(state, filename)
if is_best:
shutil.copyfile(filename,
os.path.join(checkpoint_dir, 'model_best.pth.tar'))
if backup_filename is not None:
shutil.copyfile(filename,
os.path.join(checkpoint_dir, backup_filename))
def timed_generator(gen):
start = time.time()
for g in gen:
end = time.time()
t = end - start
yield g, t
start = time.time()
def timed_function(f):
def _timed_function(*args, **kwargs):
start = time.time()
ret = f(*args, **kwargs)
return ret, time.time() - start
return _timed_function
def accuracy(output, target, topk=(1, )):
"""Computes the precision@k for the specified values of k"""
maxk = max(topk)
batch_size = target.size(0)
_, pred = output.topk(maxk, 1, True, True)
pred = pred.t()
correct = pred.eq(target.view(1, -1).expand_as(pred))
res = []
for k in topk:
correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
res.append(correct_k.mul_(100.0 / batch_size))
return res
def reduce_tensor(tensor):
rt = tensor.clone()
dist.all_reduce(rt, op=dist.ReduceOp.SUM)
rt /= torch.distributed.get_world_size(
) if torch.distributed.is_initialized() else 1
return rt
def first_n(n, generator):
for i, d in zip(range(n), generator):
yield d
@@ -0,0 +1,609 @@
import argparse
import os
import random
import shutil
import time
import warnings
import math
import numpy as np
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
import torch.npu
from apex import amp
from benchmark_log import hwlog
from benchmark_log.basic_utils import get_environment_info
from benchmark_log.basic_utils import get_model_parameter
'''
python3.7 pytorch-resnet50-apex.py --data /opt/npu/dataset/imagenet --npu 7 -j64 -b512 --lr 0.2 --warmup 5 --epochs 90 --label-smoothing 0.1 --optimizer-batch-size 1024 > batch1024-lr0.2-wd.txt &
'''
BATCH_SIZE = 512
EPOCHS_SIZE = 100
TRAIN_STEP = 8000
LOG_STEP = 1
CALCULATE_DEVICE = "npu:7"
PRINT_DEVICE = "cpu"
SOURCE_DIR = "/data/imagenet"
model_names = sorted(name for name in models.__dict__
if name.islower() and not name.startswith("__")
and callable(models.__dict__[name]))
parser = argparse.ArgumentParser(description='PyTorch ImageNet Training')
parser.add_argument('--data', metavar='DIR', default=SOURCE_DIR,
help='path to dataset')
parser.add_argument('-a', '--arch', metavar='ARCH', default='resnet50',
choices=model_names,
help='model architecture: ' +
' | '.join(model_names) +
' (default: resnet18)')
parser.add_argument('-j', '--workers', default=32, type=int, metavar='N',
help='number of data loading workers (default: 8)')
parser.add_argument('--epochs', default=EPOCHS_SIZE, type=int, metavar='N',
help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=BATCH_SIZE, type=int,
metavar='N',
help='mini-batch size (default: 256), this is the total '
'batch size of all GPUs on the current node when '
'using Data Parallel or Distributed Data Parallel')
parser.add_argument('--lr', '--learning-rate', default=0.1, type=float,
metavar='LR', help='initial learning rate', dest='lr')
parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
help='momentum')
parser.add_argument('--wd', '--weight-decay', default=1e-4, type=float,
metavar='W', help='weight decay (default: 1e-4)',
dest='weight_decay')
parser.add_argument('-p', '--print-freq', default=10, type=int,
metavar='N', help='print frequency (default: 10)')
parser.add_argument('--resume', default='', type=str, metavar='PATH',
help='path to latest checkpoint (default: none)')
parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true',
help='evaluate model on validation set')
parser.add_argument('--pretrained', dest='pretrained', action='store_true',
help='use pre-trained model')
parser.add_argument('--world-size', default=-1, type=int,
help='number of nodes for distributed training')
parser.add_argument('--rank', default=-1, type=int,
help='node rank for distributed training')
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
help='url used to set up distributed training')
parser.add_argument('--dist-backend', default='nccl', type=str,
help='distributed backend')
parser.add_argument('--seed', default=None, type=int,
help='seed for initializing training. ')
parser.add_argument('--gpu', default=None, type=int,
help='GPU id to use.')
parser.add_argument('--npu', default=None, type=int,
help='NPU id to use.')
parser.add_argument('--multiprocessing-distributed', action='store_true',
help='Use multi-processing distributed training to launch '
'N processes per node, which has N GPUs. This is the '
'fastest way to use PyTorch for either single node or '
'multi node data parallel training')
parser.add_argument('--warmup',
default=0,
type=int,
metavar='E',
help='number of warmup epochs')
parser.add_argument('--label-smoothing',
default=0.0,
type=float,
metavar='S',
help='label smoothing')
parser.add_argument('--optimizer-batch-size',
default=-1,
type=int,
metavar='N',
help=
'size of a total batch size, for simulating bigger batches using gradient accumulation')
parser.add_argument(
'--static-loss-scale',
type=float,
default=1,
help=
'Static loss scale, positive power of 2 values can improve fp16 convergence.')
best_acc1 = 0
def main():
args = parser.parse_args()
if args.npu is None:
args.npu = 0
global CALCULATE_DEVICE
CALCULATE_DEVICE = "npu:{}".format(args.npu)
torch.npu.set_device(CALCULATE_DEVICE)
print("use ", CALCULATE_DEVICE)
if args.seed is not None:
random.seed(args.seed)
torch.manual_seed(args.seed)
cudnn.deterministic = True
warnings.warn('You have chosen to seed training. '
'This will turn on the CUDNN deterministic setting, '
'which can slow down your training considerably! '
'You may see unexpected behavior when restarting '
'from checkpoints.')
if args.gpu is not None:
warnings.warn('You have chosen a specific GPU. This will completely '
'disable data parallelism.')
if args.dist_url == "env://" and args.world_size == -1:
args.world_size = int(os.environ["WORLD_SIZE"])
args.distributed = args.world_size > 1 or args.multiprocessing_distributed
ngpus_per_node = torch.cuda.device_count()
if args.multiprocessing_distributed:
# Since we have ngpus_per_node processes per node, the total world_size
# needs to be adjusted accordingly
args.world_size = ngpus_per_node * args.world_size
# Use torch.multiprocessing.spawn to launch distributed processes: the
# main_worker process function
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
else:
# Simply call main_worker function
main_worker(args.gpu, ngpus_per_node, args)
def main_worker(gpu, ngpus_per_node, args):
global best_acc1
args.gpu = gpu
if args.gpu is not None:
print("Use GPU: {} for training".format(args.gpu))
if args.distributed:
if args.dist_url == "env://" and args.rank == -1:
args.rank = int(os.environ["RANK"])
if args.multiprocessing_distributed:
# For multiprocessing distributed training, rank needs to be the
# global rank among all the processes
args.rank = args.rank * ngpus_per_node + gpu
dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
world_size=args.world_size, rank=args.rank)
# create model
if args.pretrained:
print("=> using pre-trained model '{}'".format(args.arch))
model = models.__dict__[args.arch](pretrained=True)
else:
print("=> creating model '{}'".format(args.arch))
model = models.__dict__[args.arch](zero_init_residual=True)
for layer in model.modules():
if isinstance(layer, nn.Linear):
torch.nn.init.kaiming_normal_(layer.weight, a=math.sqrt(5), )
if args.distributed:
# For multiprocessing distributed, DistributedDataParallel constructor
# should always set the single device scope, otherwise,
# DistributedDataParallel will use all available devices.
if args.gpu is not None:
torch.cuda.set_device(args.gpu)
model.cuda(args.gpu)
# When using a single GPU per process and per
# DistributedDataParallel, we need to divide the batch size
# ourselves based on the total number of GPUs we have
args.batch_size = int(args.batch_size / ngpus_per_node)
args.workers = int((args.workers + ngpus_per_node - 1) / ngpus_per_node)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
else:
model.cuda()
# DistributedDataParallel will divide and allocate batch_size to all
# available GPUs if device_ids are not set
model = torch.nn.parallel.DistributedDataParallel(model)
elif args.gpu is not None:
torch.cuda.set_device(args.gpu)
model = model.cuda(args.gpu)
else:
# DataParallel will divide and allocate batch_size to all available GPUs
if args.arch.startswith('alexnet') or args.arch.startswith('vgg'):
model.features = torch.nn.DataParallel(model.features)
model.cuda()
else:
#model = torch.nn.DataParallel(model).cuda()
model = model.to(CALCULATE_DEVICE)
lr_policy = lr_cosine_policy(args.lr,
args.warmup,
args.epochs)
# define loss function (criterion) and optimizer
#criterion = nn.CrossEntropyLoss().cuda(args.gpu)
loss = nn.CrossEntropyLoss
if args.label_smoothing > 0.0:
loss = lambda: LabelSmoothing(args.label_smoothing)
criterion = loss().to(CALCULATE_DEVICE)
optimizer = torch.optim.SGD([
{'params': [param for name, param in model.named_parameters() if name[-4:] == 'bias'], 'weight_decay': 0.0},
{'params': [param for name, param in model.named_parameters() if name[-4:] != 'bias'], 'weight_decay': args.weight_decay}],
args.lr,
momentum=args.momentum)
model, optimizer = amp.initialize(model, optimizer, opt_level="O2", loss_scale=1024, verbosity=1)
# optionally resume from a checkpoint
if args.resume:
if os.path.isfile(args.resume):
print("=> loading checkpoint '{}'".format(args.resume))
if args.npu is not None:
checkpoint = torch.load(args.resume)
elif args.gpu is None:
checkpoint = torch.load(args.resume)
else:
# Map model to be loaded to specified single gpu.
loc = 'cuda:{}'.format(args.gpu)
checkpoint = torch.load(args.resume, map_location=loc)
args.start_epoch = checkpoint['epoch']
best_acc1 = checkpoint['best_acc1']
if args.npu is not None:
best_acc1 = best_acc1.to("npu:{}".format(args.npu))
elif args.gpu is not None:
# best_acc1 may be from a checkpoint from a different GPU
best_acc1 = best_acc1.to(args.gpu)
model.load_state_dict(checkpoint['state_dict'])
#optimizer.load_state_dict(checkpoint['optimizer'])
print("=> loaded checkpoint '{}' (epoch {})"
.format(args.resume, checkpoint['epoch']))
else:
print("=> no checkpoint found at '{}'".format(args.resume))
cudnn.benchmark = True
# Data loading code
traindir = os.path.join(args.data, 'train')
valdir = os.path.join(args.data, 'val')
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
train_dataset = datasets.ImageFolder(
traindir,
transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize,
]))
if args.distributed:
train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)
else:
train_sampler = None
train_loader = torch.utils.data.DataLoader(
train_dataset, batch_size=args.batch_size, shuffle=(train_sampler is None),
num_workers=args.workers, pin_memory=True, sampler=train_sampler)
val_loader = torch.utils.data.DataLoader(
datasets.ImageFolder(valdir, transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize,
])),
batch_size=args.batch_size, shuffle=True,
num_workers=args.workers, pin_memory=True)
if args.evaluate:
validate(val_loader, model, criterion, args)
return
for epoch in range(args.start_epoch, args.epochs):
if args.distributed:
train_sampler.set_epoch(epoch)
#adjust_learning_rate(optimizer, epoch, args)
lr_policy(optimizer, 0, epoch)
# train for one epoch
train(train_loader, model, criterion, optimizer, epoch, args)
# evaluate on validation set
acc1 = validate(val_loader, model, criterion, args)
# remember best acc@1 and save checkpoint
is_best = acc1 > best_acc1
best_acc1 = max(acc1, best_acc1)
file_name = "checkpoint_npu{}".format(args.npu)
modeltmp = model.cpu()
save_checkpoint({
'epoch': epoch + 1,
'arch': args.arch,
'state_dict': modeltmp.state_dict(),
# 'state_dict': model,
'best_acc1': best_acc1.to("cpu"),
# 'optimizer' : optimizer.state_dict(),
}, is_best.to("cpu"), file_name)
modeltmp.to(CALCULATE_DEVICE)
def train(train_loader, model, criterion, optimizer, epoch, args):
if args.optimizer_batch_size < 0:
batch_size_multiplier = 1
else:
tbs = 1 * args.batch_size
if args.optimizer_batch_size % tbs != 0:
print(
"Warning: simulated batch size {} is not divisible by actual batch size {}"
.format(args.optimizer_batch_size, tbs))
batch_size_multiplier = int(args.optimizer_batch_size / tbs)
print("BSM: {}".format(batch_size_multiplier))
batch_time = AverageMeter('Time', ':6.3f')
data_time = AverageMeter('Data', ':6.3f')
losses = AverageMeter('Loss', ':.4e')
top1 = AverageMeter('Acc@1', ':6.2f')
top5 = AverageMeter('Acc@5', ':6.2f')
progress = ProgressMeter(
len(train_loader),
[batch_time, data_time, losses, top1, top5],
prefix="Epoch: [{}]".format(epoch))
# switch to train mode
model.train()
optimizer.zero_grad()
end = time.time()
for i, (images, target) in enumerate(train_loader):
#with torch.autograd.profiler.profile() as prof:
# measure data loading time
data_time.update(time.time() - end)
if args.gpu is not None:
images = images.cuda(args.gpu, non_blocking=True)
#target = target.cuda(args.gpu, non_blocking=True)
#if 'npu' in CALCULATE_DEVICE:
# target = target.to(torch.int32)
images = images.to(CALCULATE_DEVICE, non_blocking=True)
if args.label_smoothing == 0.0:
target = target.to(torch.int32).to(CALCULATE_DEVICE, non_blocking=True)
# compute output
output = model(images)
loss = criterion(output, target)
if args.label_smoothing > 0.0:
target = target.to(torch.int32).to(CALCULATE_DEVICE, non_blocking=True)
# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 5))
losses.update(loss.item(), images.size(0))
top1.update(acc1[0], images.size(0))
top5.update(acc5[0], images.size(0))
# compute gradient and do SGD step
#loss.backward()
###############################
with amp.scale_loss(loss, optimizer) as scaled_loss:
#print("middle")
scaled_loss.backward()
optimizer_step = ((i + 1) % batch_size_multiplier) == 0
if optimizer_step:
if batch_size_multiplier != 1:
for param_group in optimizer.param_groups:
for param in param_group['params']:
param.grad /= batch_size_multiplier
optimizer.step()
optimizer.zero_grad()
if i % LOG_STEP == 0:
progress.display(i)
#print(prof.key_averages().table(sort_by="self_cpu_time_total"))
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
if i == TRAIN_STEP:
break
def validate(val_loader, model, criterion, args):
batch_time = AverageMeter('Time', ':6.3f')
losses = AverageMeter('Loss', ':.4e')
top1 = AverageMeter('Acc@1', ':6.2f')
top5 = AverageMeter('Acc@5', ':6.2f')
progress = ProgressMeter(
len(val_loader),
[batch_time, losses, top1, top5],
prefix='Test: ')
# switch to evaluate mode
model.eval()
with torch.no_grad():
end = time.time()
for i, (images, target) in enumerate(val_loader):
#with torch.autograd.profiler.profile() as prof:
if args.gpu is not None:
images = images.cuda(args.gpu, non_blocking=True)
#target = target.cuda(args.gpu, non_blocking=True)
#if 'npu' in CALCULATE_DEVICE:
# target = target.to(torch.int32)
images = images.to(CALCULATE_DEVICE, non_blocking=True)
if args.label_smoothing == 0.0:
target = target.to(torch.int32).to(CALCULATE_DEVICE, non_blocking=True)
# compute output
output = model(images)
loss = criterion(output, target)
if args.label_smoothing > 0.0:
target = target.to(torch.int32).to(CALCULATE_DEVICE, non_blocking=True)
# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 5))
losses.update(loss.item(), images.size(0))
top1.update(acc1[0], images.size(0))
top5.update(acc5[0], images.size(0))
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
if i % LOG_STEP == 0:
progress.display(i)
#print(prof.key_averages().table(sort_by="self_cpu_time_total"))
# TODO: this should also be done with the ProgressMeter
print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'
.format(top1=top1, top5=top5))
hwlog.remark_print(key=hwlog.EVAL_ACCURACY_TOP1, value="{top1.avg:.3f}".format(top1=top1))
hwlog.remark_print(key=hwlog.EVAL_ACCURACY_TOP5, value="{top5.avg:.3f}".format(top5=top5))
return top1.avg
def save_checkpoint(state, is_best, filename='checkpoint'):
filename2 = filename + ".pth.tar"
torch.save(state, filename2)
if is_best:
shutil.copyfile(filename2, filename+'model_best.pth.tar')
class AverageMeter(object):
"""Computes and stores the average and current value"""
def __init__(self, name, fmt=':f'):
self.name = name
self.fmt = fmt
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
def __str__(self):
fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
return fmtstr.format(**self.__dict__)
class ProgressMeter(object):
def __init__(self, num_batches, meters, prefix=""):
self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
self.meters = meters
self.prefix = prefix
def display(self, batch):
entries = [self.prefix + self.batch_fmtstr.format(batch)]
entries += [str(meter) for meter in self.meters]
print('\t'.join(entries))
current_run_time=str(entries).split("Time")[1].split("Data")[0].strip().split(" ")[0]
args = parser.parse_args()
batch_size = args.batch_size
if "Epoch" in self.prefix:
if float(current_run_time) > 0:
FPS = int(batch_size)/float(current_run_time)
hwlog.remark_print(key=hwlog.FPS, value=float(FPS))
def _get_batch_fmtstr(self, num_batches):
num_digits = len(str(num_batches // 1))
fmt = '{:' + str(num_digits) + 'd}'
return '[' + fmt + '/' + fmt.format(num_batches) + ']'
def adjust_learning_rate(optimizer, epoch, args):
"""Sets the learning rate to the initial LR decayed by 10 every 30 epochs"""
lr = args.lr * (0.1 ** (epoch // 30))
for param_group in optimizer.param_groups:
param_group['lr'] = lr
def accuracy(output, target, topk=(1,)):
"""Computes the accuracy over the k top predictions for the specified values of k"""
with torch.no_grad():
maxk = max(topk)
batch_size = target.size(0)
_, pred = output.topk(maxk, 1, True, True)
pred = pred.t()
correct = pred.eq(target.view(1, -1).expand_as(pred))
res = []
for k in topk:
correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
res.append(correct_k.mul_(100.0 / batch_size))
return res
class LabelSmoothing(nn.Module):
"""
NLL loss with label smoothing.
"""
def __init__(self, smoothing=0.0):
"""
Constructor for the LabelSmoothing module.
:param smoothing: label smoothing factor
"""
super(LabelSmoothing, self).__init__()
self.confidence = 1.0 - smoothing
self.smoothing = smoothing
def forward(self, x, target):
logprobs = torch.nn.functional.log_softmax(x, dim=-1).to("cpu")
nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
nll_loss = nll_loss.squeeze(1)
smooth_loss = -logprobs.mean(dim=-1)
loss = self.confidence * nll_loss + self.smoothing * smooth_loss
return loss.mean().to(CALCULATE_DEVICE)
def lr_policy(lr_fn, logger=None):
if logger is not None:
logger.register_metric('lr',
log.LR_METER(),
verbosity=dllogger.Verbosity.VERBOSE)
def _alr(optimizer, iteration, epoch):
lr = lr_fn(iteration, epoch)
if logger is not None:
logger.log_metric('lr', lr)
for param_group in optimizer.param_groups:
param_group['lr'] = lr
return _alr
def lr_cosine_policy(base_lr, warmup_length, epochs, logger=None):
def _lr_fn(iteration, epoch):
if epoch < warmup_length:
lr = base_lr * (epoch + 1) / warmup_length
else:
e = epoch - warmup_length
es = epochs - warmup_length
lr = 0.5 * (1 + np.cos(np.pi * e / es)) * base_lr
return lr
return lr_policy(_lr_fn, logger=logger)
if __name__ == '__main__':
hwlog.ROOT_DIR = os.path.split(os.path.abspath(__file__))[0]
cpu_info, npu_info, framework_info, os_info, benchmark_version = get_environment_info("pytorch")
config_info = get_model_parameter("pytorch_config")
initinal_data = {"base_lr": 0.1, "dataset": "imagenet", "optimizer": "SGD", "loss_scale": 1024}
hwlog.remark_print(key=hwlog.CPU_INFO, value=cpu_info)
hwlog.remark_print(key=hwlog.NPU_INFO, value=npu_info)
hwlog.remark_print(key=hwlog.OS_INFO, value=os_info)
hwlog.remark_print(key=hwlog.FRAMEWORK_INFO, value=framework_info)
hwlog.remark_print(key=hwlog.BENCHMARK_VERSION, value=benchmark_version)
hwlog.remark_print(key=hwlog.CONFIG_INFO, value=config_info)
hwlog.remark_print(key=hwlog.BASE_LR, value=initinal_data.get("base_lr"))
hwlog.remark_print(key=hwlog.DATASET, value=initinal_data.get("dataset"))
hwlog.remark_print(key=hwlog.OPT_NAME, value=initinal_data.get("optimizer"))
hwlog.remark_print(key=hwlog.LOSS_SCALE, value=initinal_data.get("loss_scale"))
main()
@@ -0,0 +1,31 @@
############## toolkit situation ################
#export LD_LIBRARY_PATH=/usr/local/:/usr/local/python3.7.5/lib/:/usr/local/openblas/lib:/usr/local/lib/:/usr/lib64/:/usr/lib/:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64/:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/:/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
#export PATH=$PATH:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin/:/usr/local/Ascend/ascend-toolkit/latest/toolkit/tools/ide_daemon/bin/
#export ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/
#export OPTION_EXEC_EXTERN_PLUGIN_PATH=/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64/plugin/opskernel/libfe.so:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64/plugin/opskernel/libaicpu_engine.so:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64/plugin/opskernel/libge_local_engine.so
#export PYTHONPATH=/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/auto_tune.egg/auto_tune:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/schedule_search.egg:$PYTHONPATH
############## nnae situation ################
if [ -d /usr/local/Ascend/nnae/latest ];then
export LD_LIBRARY_PATH=/usr/local/:/usr/local/python3.7.5/lib/:/usr/local/openblas/lib:/usr/local/lib/:/usr/lib64/:/usr/lib/:/usr/local/Ascend/nnae/latest/fwkacllib/lib64/:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/:/usr/lib/aarch64_64-linux-gnu:$LD_LIBRARY_PATH
export PATH=$PATH:/usr/local/Ascend/nnae/latest/fwkacllib/ccec_compiler/bin/:/usr/local/Ascend/nnae/latest/toolkit/tools/ide_daemon/bin/
export ASCEND_OPP_PATH=/usr/local/Ascend/nnae/latest/opp/
export OPTION_EXEC_EXTERN_PLUGIN_PATH=/usr/local/Ascend/nnae/latest/fwkacllib/lib64/plugin/opskernel/libfe.so:/usr/local/Ascend/nnae/latest/fwkacllib/lib64/plugin/opskernel/libaicpu_engine.so:/usr/local/Ascend/nnae/latest/fwkacllib/lib64/plugin/opskernel/libge_local_engine.so
export PYTHONPATH=/usr/local/Ascend/nnae/latest/fwkacllib/python/site-packages/:/usr/local/Ascend/nnae/latest/fwkacllib/python/site-packages/auto_tune.egg/auto_tune:/usr/local/Ascend/nnae/latest/fwkacllib/python/site-packages/schedule_search.egg:$PYTHONPATH
else
export LD_LIBRARY_PATH=/usr/local/:/usr/local/lib/:/usr/lib64/:/usr/lib/:/usr/local/python3.7.5/lib/:/usr/local/openblas/lib:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64/:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/:/usr/lib/aarch64-linux-gnu:$LD_LIBRARY_PATH
export PATH=$PATH:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin/:/usr/local/Ascend/ascend-toolkit/latest/toolkit/tools/ide_daemon/bin/
export ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/
export OPTION_EXEC_EXTERN_PLUGIN_PATH=/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64/plugin/opskernel/libfe.so:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64/plugin/opskernel/libaicpu_engine.so:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64/plugin/opskernel/libge_local_engine.so
export PYTHONPATH=/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/auto_tune.egg/auto_tune:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/schedule_search.egg:$PYTHONPATH
fi
# ln -s /usr/local/Ascend/ascend-toolkit/latest/toolkit/bin/adc /usr/local/bin/
export SLOG_PRINT_TO_STDOUT=0
#su HwHiAiUser -c "adc --host 0.0.0.0:22118 --log \"SetLogLevel(0)[error]\" --device 0"
export TASK_QUEUE_ENABLE=1
@@ -0,0 +1,79 @@
#!/bin/bash
rank_size=$1
yamlPath=$2
toolsPath=$3
# ${rank_size} ${yamlPath} ${currentDir}
if [ -f /.dockerenv ];then
CLUSTER=$4
MPIRUN_ALL_IP="$5"
export CLUSTER=${CLUSTER}
fi
currentDir=$(cd "$(dirname "$0")/.."; pwd)
# 配置环境变量并调用 train 方法
currtime=`date +%Y%m%d%H%M%S`
currtime=`date +%Y%m%d%H%M%S`
mkdir -p ${currentDir%train*}/train/result/pt_resnet50/training_job_${currtime}/
train_job_dir=${currentDir%train*}/train/result/pt_resnet50/training_job_${currtime}/
echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] ${train_job_dir}"
# user env
export HCCL_CONNECT_TIMEOUT=600
export JOB_ID=9999001
export SLOG_PRINT_TO_STDOUT=0
export RANK_TABLE_FILE=${currentDir}/config/${rank_size}p.json
# 从 yaml 获取配置
eval $(${toolsPath}/get_params_for_yaml.sh ${yamlPath} "pytorch_config")
# device 列表, 若无指定 device 根据 rank_size 顺序选择
eval device_group=\$device_group_${rank_size}p
if [ x"${device_group}" == x"" ] || [ ${rank_size} -ge 8 ];then
device_group="$(seq 0 "$(expr $rank_size - 1)")"
fi
# get last device id in device_group, hw log in performance from the dir named last_device_id
device_group_str=`echo ${device_group} | sed 's/ //g'`
first_device_id=`echo ${device_group_str: 0:1}`
if [ x"${rank_size}" == x"8" ]; then
export WHICH_OP=GEOP
export NEW_GE_FE_ID=1
export GE_AICPU_FLAG=1
fi
rank_id=0
if [ x"${CLUSTER}" == x"True" ];then
# ln hw log
ln -snf ${train_job_dir}/0/hw_resnet50.log ${train_job_dir}
this_ip=$(hostname -I |awk '{print $1}')
for ip in $MPIRUN_ALL_IP;do
if [ x"$ip" != x"$this_ip" ];then
scp $yamlPath root@$ip:$yamlPath
scp $jsonFilePath root@$ip:$jsonFilePath
fi
done
export PATH=$PATH:/usr/local/mpirun4.0/bin
mpirun -H ${mpirun_ip} \
--bind-to none -map-by slot\
--allow-run-as-root \
--mca btl_tcp_if_exclude lo,docker0,endvnic,virbr0,vethf40501b,docker_gwbridge,br-f42ac38052b4\
--prefix /usr/local/mpirun4.0/ \
${currentDir}/scripts/train.sh 0 $rank_size $yamlPath $currtime ${toolsPath} ${CLUSTER}
else
# ln hw log
ln -snf ${train_job_dir}/${first_device_id}/hw_resnet50.log ${train_job_dir}
for device_id in $device_group;do
#echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] start: train ${device_id} & " >> ${currentDir}/result/main.log
${currentDir}/scripts/train.sh $device_id $rank_size $yamlPath $currtime ${toolsPath} $rank_id &
let rank_id++
done
fi
wait
echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] ${train_job_dir} "
@@ -0,0 +1,138 @@
#!/usr/bin/env bash
device_id=$1
rank_size=$2
yamlPath=$3
currtime=$4
toolsPath=$5
currentDir=$(cd "$(dirname "$0")/.."; pwd)
export REMARK_LOG_FILE=hw_resnet50.log
mkdir -p ${currentDir%train*}/train/result/pt_resnet50/training_job_${currtime}/
export train_job_dir=${currentDir%train*}/train/result/pt_resnet50/training_job_${currtime}/
source ${currentDir}/config/npu_set_env.sh
benchmark_log_path=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils
#atlasboost_path=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils/atlasboost
code_dir_path=${currentDir}/code
export PYTHONPATH=$PYTHONPATH:${benchmark_log_path}:${code_dir_path}
# 从 yaml 获取配置
eval $(${toolsPath}/get_params_for_yaml.sh ${yamlPath} "pytorch_config")
# user env
export YAML_PATH=$3
export HCCL_CONNECT_TIMEOUT=600
export JOB_ID=9999001
#export HCCL_RANK_TABLE_PATH=${currentDir}/config/${rank_size}p.json
export RANK_SIZE=${rank_size}
export RANK_INDEX=0
export SLOG_PRINT_TO_STDOUT=0
export DEVICE_ID=$1
DEVICE_INDEX=$(( DEVICE_ID + RANK_INDEX * 8 ))
export DEVICE_INDEX=${DEVICE_INDEX}
export MODEL_CKPT_PATH=${train_job_dir}/${device_id}/ckpt${device_id}
cd ${train_job_dir}
curd_dir=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils/atlasboost
export PYTHONPATH=$PYTHONPATH:${curd_dir}
if [ x"$6" != x"True" ];then
rank_id=$6
export RANK_ID=$6
else
device_id_mo=$(python3.7 -c "import src.tensorflow.mpi_ops as atlasboost;atlasboost.init(); \
device_id = atlasboost.local_rank();cluster_device_id = str(device_id); \
atlasboost.set_device_id(device_id);print(atlasboost.rank())")
device_id_mo=`echo $device_id_mo`
rank_id=${device_id_mo##* }
export RANK_ID=${rank_id}
device=${device_id_mo##*deviceid = }
device_id=${device%% phyid=*}
export DEVICE_ID=${device_id}
hccljson=${train_job_dir}/*.json
cp ${hccljson} ${currentDir}/config/${rank_size}p.json
fi
#mkdir exec path
mkdir -p ${train_job_dir}/${device_id}
cd ${train_job_dir}/${device_id}
startTime=`date +%Y%m%d-%H:%M:%S`
startTime_s=`date +%s`
if [ x"${mode}" == x"evaluate" ];then
eval_data_url="--data=${data_url} --evaluate --resume=${ckpt_path}"
else
eval_data_url="--data=${data_url}"
fi
if [ x"${rank_size}" == x"1" ];then
python3.7 ${currentDir}/code/pytorch-resnet50-apex.py \
${eval_data_url} \
--workers=64 \
--epochs=${epoches} \
--batch-size=${batch_size} \
--learning-rate=${lr} \
--warmup=5 \
--label-smoothing=0.1 \
--optimizer-batch-size=1024 \
--npu=${device_id} > ${train_job_dir}/train_1p.log 2>&1
else
export KERNEL_NAME_ID=${device_id}
rank_id=$6
python3.7 ${currentDir}/code/DistributedResnet50/main-apex-d76-npu.py \
--data=${data_url} \
--addr=$(hostname -I |awk '{print $1}') \
--seed=49 \
--workers=184 \
--learning-rate=${lr} \
--warmup=8 \
--label-smoothing=0.1 \
--mom=0.875 \
--weight-decay=3.0517578125e-05 \
--static-loss-scale=128 \
--print-freq=1 \
--dist-url='tcp://127.0.0.1:50000' \
--dist-backend='hccl' \
--multiprocessing-distributed \
--world-size=1 \
--rank=${rank_id} \
--gpu=${device_id} \
--benchmark=0 \
--device='npu' \
--epochs=${epoches} \
--device_num=${rank_size} \
--batch-size=${batch_size} > ${train_job_dir}/train_${rank_size}p.log 2>&1
fi
if [ $? -eq 0 ] ;
then
echo ":::ABK 1.0.0 resnet50 train success"
echo ":::ABK 1.0.0 resnet50 train success" >> ${train_job_dir}/train_${rank_size}p.log
echo ":::ABK 1.0.0 resnet50 train success" >> ${train_job_dir}/${device_id}/hw_resnet50.log
else
echo ":::ABK 1.0.0 resnet50 train failed"
echo ":::ABK 1.0.0 resnet50 train failed" >>${train_job_dir}/train_${rank_size}p.log
echo ":::ABK 1.0.0 resnet50 train failed" >> ${train_job_dir}/${device_id}/hw_resnet50.log
fi
endTime=`date +%Y%m%d-%H:%M:%S`
endTime_s=`date +%s`
sumTime=$[ $endTime_s - $startTime_s ]
hour=$(( $sumTime/3600 ))
min=$(( ($sumTime-${hour}*3600)/60 ))
sec=$(( $sumTime-${hour}*3600-${min}*60 ))
echo ":::ABK 1.0.0 resnet50 train total time${hour}:${min}:${sec}"
echo ":::ABK 1.0.0 resnet50 train total time${hour}:${min}:${sec}" >> ${train_job_dir}/${device_id}/hw_resnet50.log
@@ -0,0 +1,38 @@
# ResNet50_tensorflow训练说明
### 1. 模型训练参数配置
在train/yaml/ResNet50.yaml中修改相应配置, 配置项含义:
```
tensorflow_config:
data_url: 数据集路径
batch_size: 32
# 1p/8p, epoches设为90
epoches: 1
max_train_steps: 1000
epochs_between_evals: 1
iterations_per_loop: 100
save_checkpoints_steps: 115200
# 仅多机执行需要配置: ip1:卡数量1,ip2:卡数量2
mpirun_ip: 90.90.176.152:8,90.90.176.154:8
# docker 镜像名称:版本号
docker_image: c73:b02
# 指定 device id, 多个 id 使用空格分隔, 数量需与 rank_size 相同
device_group_1p: 0
device_group_2p: 0 1
device_group_4p: 0 1 2 3
```
------
@@ -0,0 +1,36 @@
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
WORKDIR /research
RUN apt-get update
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
build-essential \
git \
python \
python-pip
ENV HOME /research
ENV PYENV_ROOT $HOME/.pyenv
ENV PATH $PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH
RUN apt-get install -y python-setuptools
RUN apt-get install -y python-pip python3-pip virtualenv htop
RUN pip3 install --upgrade numpy scipy sklearn tf-nightly-gpu
# Mount data into the docker
ADD . /research/resnet
WORKDIR /research/resnet
RUN pip3 install -r official/requirements.txt
ENTRYPOINT ["/bin/bash"]
@@ -0,0 +1,47 @@
#!/bin/sh
currentDir=$(cd "$(dirname "$0")"; pwd)
cd ${currentDir}
DEVICE_LIST=$@
export exec_type={MODE}
prog_exit()
{
if [ x"${exec_type}" = xdocker ];
then
# stop slogd progress
bash /usr/local/Ascend/driver/tools/docker_stop_post_sys.sh
fi
}
# register prog_exit
trap "prog_exit" SIGTERM
if [ x"${exec_type}" = xdocker ];
then
#set env
. ${currentDir}/npu_set_env.sh
# start slogd progress
mkdir -p /var/log/npu/slog/slogd
/usr/local/Ascend/driver/tools/docker/slogd &
# start main.sh
${currentDir}/main.sh ${DEVICE_LIST} &
# wait slogd stop
flag=1
while [ $flag -ne 0 ];
do
sleep 5;
flag=`ps -ef | grep train.sh | grep -v grep | wc -l`
ps -ef >> ${currentDir}/ps.log
echo "" >> ${currentDir}/ps.log
done
else
# start main.sh
su - HwHiAiUser -c ". ${currentDir}/npu_set_env.sh;${currentDir}/main.sh ${DEVICE_LIST}" &
wait
fi
@@ -0,0 +1,13 @@
{
"group_count": "1",
"group_list": [
{
"group_name": "worker",
"device_count": "{device_count}",
"instance_count": "{instance_count}",
"instance_list": [{instance_list}]
}
],
"status": "completed"
}
@@ -0,0 +1,18 @@
#!/bin/sh
currentDir=$(cd "$(dirname "$0")"; pwd)
cd ${currentDir}
device_group=$@
device_num=$#
touch ${currentDir}/main.log
for device_phy_id in ${device_group}
do
echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] start: train.sh ${device_phy_id} & " >> ${currentDir}/main.log
${currentDir}/train.sh ${device_phy_id} &
done
wait
echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] all train.sh exit " >> ${currentDir}/main.log
@@ -0,0 +1,28 @@
# main env
export LD_LIBRARY_PATH=/usr/local/:/usr/local/lib/:/usr/lib/:/usr/local/Ascend/fwkacllib/lib64/:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/
export PYTHONPATH=$PYTHONPATH:/usr/local/Ascend/opp/op_impl/built-in/ai_core/tbe:/code
export PATH=$PATH:/usr/local/Ascend/fwkacllib/ccec_compiler/bin
export ASCEND_OPP_PATH=/usr/local/Ascend/opp
export DDK_VERSION_FLAG=1.60.T17.B830
export HCCL_CONNECT_TIMEOUT=600
# user env
export JOB_ID={JOB_ID}
export RANK_TABLE_FILE={RANK_TABLE_FILE}
export RANK_SIZE={RANK_SIZE}
export RANK_INDEX={RANK_INDEX}
export RANK_ID={RANK_ID}
# profiling env
export PROFILING_MODE={PROFILING_MODE}
export AICPU_PROFILING_MODE={AICPU_PROFILING_MODE}
export PROFILING_OPTIONS={PROFILING_OPTIONS}
export FP_POINT={FP_POINT}
export BP_POINT={BP_POINT}
# debug env
#export DUMP_GE_GRAPH=2
#export DUMP_OP=1
#export DUMP_OP_LESS=1
#export PRINT_MODEL=1
#export TE_PARALLEL_COMPILER=0
@@ -0,0 +1,33 @@
#!/bin/sh
currentDir=$(cd "$(dirname "$0")"; pwd)
cd ${currentDir}
PWD=${currentDir}
device_id=$1
if [ x"${device_id}" = x ] ;
then
echo "turing train fail" >> ${currentDir}/train_${device_id}.log
exit
else
export DEVICE_ID=${device_id}
fi
DEVICE_INDEX=$(( DEVICE_ID + RANK_INDEX * 8 ))
export DEVICE_INDEX=${DEVICE_INDEX}
env > ${currentDir}/env_${device_id}.log
#mkdir exec path
mkdir -p ${currentDir}/${device_id}
rm -rf ${currentDir}/${device_id}/*
cd ${currentDir}/${device_id}
#start exec
python3.7 {RUN_ALGORITHM_CMD} {CHECKPOINT_DIR} > ${currentDir}/train_${device_id}.log 2>&1
if [ $? -eq 0 ] ;
then
echo "turing train success" >> ${currentDir}/train_${device_id}.log
else
echo "turing train fail" >> ${currentDir}/train_${device_id}.log
fi
@@ -0,0 +1,203 @@
Copyright 2015 The TensorFlow Authors. All rights reserved.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2015, The TensorFlow Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
@@ -0,0 +1,20 @@
# Offically Supported TensorFlow 2.1 Models on Cloud TPU
## Natural Language Processing
* [bert](nlp/bert): A powerful pre-trained language representation model:
BERT, which stands for Bidirectional Encoder Representations from
Transformers.
[BERT FineTuning with Cloud TPU](https://cloud.google.com/tpu/docs/tutorials/bert-2.x) provides step by step instructions on Cloud TPU training. You can look [Bert MNLI Tensorboard.dev metrics](https://tensorboard.dev/experiment/mIah5lppTASvrHqWrdr6NA) for MNLI fine tuning task.
* [transformer](nlp/transformer): A transformer model to translate the WMT
English to German dataset.
[Training transformer on Cloud TPU](https://cloud.google.com/tpu/docs/tutorials/transformer-2.x) for step by step instructions on Cloud TPU training.
## Computer Vision
* [mnist](vision/image_classification): A basic model to classify digits
from the MNIST dataset. See [Running MNIST on Cloud TPU](https://cloud.google.com/tpu/docs/tutorials/mnist-2.x) tutorial and [Tensorboard.dev metrics](https://tensorboard.dev/experiment/mIah5lppTASvrHqWrdr6NA).
* [resnet](vision/image_classification): A deep residual network that can
be used to classify ImageNet's dataset of 1000 classes.
See [Training ResNet on Cloud TPU](https://cloud.google.com/tpu/docs/tutorials/resnet-2.x) tutorial and [Tensorboard.dev metrics](https://tensorboard.dev/experiment/CxlDK8YMRrSpYEGtBRpOhg).
* [retinanet](vision/detection): A fast and powerful object detector. See [Tensorboard.dev training metrics](https://tensorboard.dev/experiment/b8NRnWU3TqG6Rw0UxueU6Q).
@@ -0,0 +1,149 @@
# TensorFlow Official Models
The TensorFlow official models are a collection of models that use
TensorFlow's high-level APIs. They are intended to be well-maintained, tested,
and kept up to date with the latest TensorFlow API. They should also be
reasonably optimized for fast performance while still being easy to read.
These models are used as end-to-end tests, ensuring that the models run with the
same or improved speed and performance with each new TensorFlow build.
## Tensorflow releases
The master branch of the models are **in development** with TensorFlow 2.x, and
they target the
[nightly binaries](https://github.com/tensorflow/tensorflow#installation) built
from the
[master branch of TensorFlow](https://github.com/tensorflow/tensorflow/tree/master).
You may start from installing with pip:
```shell
pip3 install tf-nightly
```
**Stable versions** of the official models targeting releases of TensorFlow are
available as tagged branches or
[downloadable releases](https://github.com/tensorflow/models/releases). Model
repository version numbers match the target TensorFlow release, such that
[release v2.1.0](https://github.com/tensorflow/models/releases/tag/v2.1.0) are
compatible with
[TensorFlow v2.1.0](https://github.com/tensorflow/tensorflow/releases/tag/v2.1.0).
If you are on a version of TensorFlow earlier than 1.4, please
[update your installation](https://www.tensorflow.org/install/).
## Requirements
Please follow the below steps before running models in this repo:
1. TensorFlow
[nightly binaries](https://github.com/tensorflow/tensorflow#installation)
2. If users would like to clone this repo but do not care about change history,
please consider:
```shell
export repo_version="master"
git clone -b ${repo_version} https://github.com/tensorflow/models.git --depth=1
```
3. Add the top-level ***/models*** folder to the Python path with the command:
```shell
export PYTHONPATH=$PYTHONPATH:/path/to/models
```
Using Colab:
```python
import os
os.environ['PYTHONPATH'] += ":/path/to/models"
```
4. Install dependencies:
```shell
pip3 install --user -r official/requirements.txt
```
To make Official Models easier to use, we are planning to create a pip
installable Official Models package. This is being tracked in
[#917](https://github.com/tensorflow/models/issues/917).
## Available models
**NOTE: For Officially Supported TPU models please check [README-TPU](README-TPU.md).**
**NOTE:** Please make sure to follow the steps in the
[Requirements](#requirements) section.
### Natural Language Processing
* [bert](nlp/bert): A powerful pre-trained language representation model:
BERT, which stands for Bidirectional Encoder Representations from
Transformers.
* [transformer](nlp/transformer): A transformer model to translate the WMT English
to German dataset.
* [xlnet](nlp/xlnet): XLNet: Generalized Autoregressive Pretraining for
Language Understanding.
### Computer Vision
* [mnist](vision/image_classification): A basic model to classify digits from
the MNIST dataset.
* [resnet](vision/image_classification): A deep residual network that can be
used to classify both CIFAR-10 and ImageNet's dataset of 1000 classes.
* [retinanet](vision/detection): A fast and powerful object detector.
### Others
* [ncf](recommendation): Neural Collaborative Filtering model for
recommendation tasks.
Models that will not update to TensorFlow 2.x stay inside R1 directory:
* [boosted_trees](r1/boosted_trees): A Gradient Boosted Trees model to
classify higgs boson process from HIGGS Data Set.
* [wide_deep](r1/wide_deep): A model that combines a wide model and deep
network to classify census income data.
## More models to come!
We are in the progress to revamp official model garden with TensorFlow 2.0 and
Keras. In the near future, we will bring:
* State-of-the-art language understanding models: XLNet, GPT2, and more
members in Transformer family.
* Start-of-the-art image classification models: EfficientNet, MnasNet and
variants.
* A set of excellent objection detection models.
If you would like to make any fixes or improvements to the models, please
[submit a pull request](https://github.com/tensorflow/models/compare).
## New Models
The team is actively working to add new models to the repository. Every model
should follow the following guidelines, to uphold the our objectives of
readable, usable, and maintainable code.
**General guidelines**
* Code should be well documented and tested.
* Runnable from a blank environment with relative ease.
* Trainable on: single GPU/CPU (baseline), multiple GPUs, TPU
* Compatible with Python 3 (using [six](https://pythonhosted.org/six/) when
being compatible with Python 2 is necessary)
* Conform to [Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)
**Implementation guidelines**
These guidelines exist so the model implementations are consistent for better
readability and maintainability.
* Use [common utility functions](utils)
* Export SavedModel at the end of training.
* Consistent flags and flag-parsing library
([read more here](utils/flags/guidelines.md))
* Produce benchmarks and logs ([read more here](utils/logs/guidelines.md))
@@ -0,0 +1,56 @@
[
{
"description": "The ID of the benchmark run, where this metric should tie to.",
"mode": "REQUIRED",
"name": "run_id",
"type": "STRING"
},
{
"description": "The name of the metric, which should be descriptive. E.g. training_loss, accuracy.",
"mode": "REQUIRED",
"name": "name",
"type": "STRING"
},
{
"description": "The unit of the metric. E.g. MB per sec.",
"mode": "NULLABLE",
"name": "unit",
"type": "STRING"
},
{
"description": "The value of the metric.",
"mode": "NULLABLE",
"name": "value",
"type": "FLOAT"
},
{
"description": "The timestamp when the metric is recorded.",
"mode": "REQUIRED",
"name": "timestamp",
"type": "TIMESTAMP"
},
{
"description": "The global step when this metric is recorded.",
"mode": "NULLABLE",
"name": "global_step",
"type": "INTEGER"
},
{
"description": "Free format metadata for the extra information about the metric.",
"mode": "REPEATED",
"name": "extras",
"type": "RECORD",
"fields": [
{
"mode": "NULLABLE",
"name": "name",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "value",
"type": "STRING"
}
]
}
]
@@ -0,0 +1,368 @@
[
{
"description": "The UUID of the run for the benchmark.",
"mode": "REQUIRED",
"name": "model_id",
"type": "STRING"
},
{
"description": "The name of the model, E.g ResNet50, LeNet-5 etc.",
"mode": "REQUIRED",
"name": "model_name",
"type": "STRING"
},
{
"description": "The date when the test of the model is started",
"mode": "REQUIRED",
"name": "run_date",
"type": "TIMESTAMP"
},
{
"description": "The unique name for a test by the combination of key parameters, eg batch size, num of GPU, etc. It is hardware independent.",
"mode": "NULLABLE",
"name": "test_id",
"type": "STRING"
},
{
"description": "The tensorflow version information.",
"fields": [
{
"description": "Version of the tensorflow. E.g. 1.7.0-rc0",
"mode": "REQUIRED",
"name": "version",
"type": "STRING"
},
{
"description": "Git Hash of the tensorflow",
"mode": "NULLABLE",
"name": "git_hash",
"type": "STRING"
},
{
"description": "The channel of the tensorflow binary, eg, nightly, RC, final, custom.",
"mode": "NULLABLE",
"name": "channel",
"type": "STRING"
},
{
"description": "Identify anything special about the build, eg CUDA 10, NCCL, MKL, etc.",
"mode": "NULLABLE",
"name": "build_type",
"type": "STRING"
}
],
"mode": "REQUIRED",
"name": "tensorflow_version",
"type": "RECORD"
},
{
"description": "The arbitrary attribute of the model.",
"fields": [
{
"description": "The name of the attribute.",
"mode": "REQUIRED",
"name": "name",
"type": "STRING"
},
{
"description": "The value of the attribute.",
"mode": "NULLABLE",
"name": "value",
"type": "STRING"
}
],
"mode": "REPEATED",
"name": "attribute",
"type": "RECORD"
},
{
"description": "Environment variables when the benchmark run is executed.",
"fields": [
{
"description": "The name of the variable.",
"mode": "REQUIRED",
"name": "name",
"type": "STRING"
},
{
"description": "The value of the variable.",
"mode": "NULLABLE",
"name": "value",
"type": "STRING"
}
],
"mode": "REPEATED",
"name": "environment_variable",
"type": "RECORD"
},
{
"description": "TF Environment variables when the benchmark run is executed.",
"fields": [
{
"description": "The name of the variable.",
"mode": "REQUIRED",
"name": "name",
"type": "STRING"
},
{
"description": "The value of the variable.",
"mode": "NULLABLE",
"name": "value",
"type": "STRING"
}
],
"mode": "REPEATED",
"name": "tensorflow_environment_variables",
"type": "RECORD"
},
{
"description": "The list of parameters run with the model. It could contain hyperparameters or others.",
"fields": [
{
"description": "The name of the parameter.",
"mode": "REQUIRED",
"name": "name",
"type": "STRING"
},
{
"description": "The string value of the parameter.",
"mode": "NULLABLE",
"name": "string_value",
"type": "STRING"
},
{
"description": "The bool value of the parameter.",
"mode": "NULLABLE",
"name": "bool_value",
"type": "STRING"
},
{
"description": "The int/long value of the parameter.",
"mode": "NULLABLE",
"name": "long_value",
"type": "INTEGER"
},
{
"description": "The double/float value of parameter.",
"mode": "NULLABLE",
"name": "float_value",
"type": "FLOAT"
}
],
"mode": "REPEATED",
"name": "run_parameters",
"type": "RECORD"
},
{
"description": "The dataset that run with the benchmark.",
"mode": "NULLABLE",
"name": "dataset",
"type": "RECORD",
"fields": [
{
"description": "The name of the dataset that the model is trained/validated with. E.g ImageNet, mnist.",
"mode": "REQUIRED",
"name": "name",
"type": "STRING"
},
{
"description": "The arbitrary attribute of the dataset.",
"fields": [
{
"description": "The name of the attribute.",
"mode": "REQUIRED",
"name": "name",
"type": "STRING"
},
{
"description": "The value of the attribute.",
"mode": "NULLABLE",
"name": "value",
"type": "STRING"
}
],
"mode": "REPEATED",
"name": "attribute",
"type": "RECORD"
}
]
},
{
"description": "Used to differentiate from AWS, GCE or DGX-1 at a high level",
"mode": "NULLABLE",
"name": "test_environment",
"type": "STRING"
},
{
"description": "The machine configuration of the benchmark run.",
"mode": "NULLABLE",
"name": "machine_config",
"type": "RECORD",
"fields": [
{
"description": "The platform information of the benchmark run.",
"mode": "NULLABLE",
"name": "platform_info",
"type": "RECORD",
"fields": [
{
"description": "Eg: 64bit.",
"mode": "NULLABLE",
"name": "bits",
"type": "STRING"
},
{
"description": "Eg: ELF.",
"mode": "NULLABLE",
"name": "linkage",
"type": "STRING"
},
{
"description": "Eg: i386.",
"mode": "NULLABLE",
"name": "machine",
"type": "STRING"
},
{
"description": "Eg: 3.13.0-76-generic.",
"mode": "NULLABLE",
"name": "release",
"type": "STRING"
},
{
"description": "Eg: Linux.",
"mode": "NULLABLE",
"name": "system",
"type": "STRING"
},
{
"description": "Eg: #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016.",
"mode": "NULLABLE",
"name": "version",
"type": "STRING"
}
]
},
{
"description": "The CPU information of the benchmark run.",
"mode": "NULLABLE",
"name": "cpu_info",
"type": "RECORD",
"fields": [
{
"mode": "NULLABLE",
"name": "num_cores",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "num_cores_allowed",
"type": "INTEGER"
},
{
"description" : "How fast are those CPUs.",
"mode": "NULLABLE",
"name": "mhz_per_cpu",
"type": "FLOAT"
},
{
"description" : "Additional CPU info, Eg: Intel Ivybridge with HyperThreading (24 cores).",
"mode": "NULLABLE",
"name": "cpu_info",
"type": "STRING"
},
{
"description" : "What kind of cpu scaling is enabled on the host. Eg performance, ondemand, conservative, mixed.",
"mode": "NULLABLE",
"name": "cpu_governor",
"type": "STRING"
},
{
"description": "Cache size of the CPUs.",
"mode": "NULLABLE",
"name": "cache_size",
"type": "RECORD",
"fields": [
{
"mode": "NULLABLE",
"name": "level",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "size",
"type": "INTEGER"
}
]
}
]
},
{
"mode": "NULLABLE",
"name": "gpu_info",
"type": "RECORD",
"fields": [
{
"mode": "NULLABLE",
"name": "count",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "model",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "cuda_version",
"type": "STRING"
}
]
},
{
"description": "The cloud instance inforation if the benchmark run is executed on cloud",
"mode": "NULLABLE",
"name": "cloud_info",
"type": "RECORD",
"fields": [
{
"description": "The instance type, E.g. n1-standard-4.",
"mode": "NULLABLE",
"name": "instance_type",
"type": "STRING"
},
{
"description": "The arbitrary attribute of the cloud info.",
"fields": [
{
"description": "The name of the attribute.",
"mode": "REQUIRED",
"name": "name",
"type": "STRING"
},
{
"description": "The value of the attribute.",
"mode": "NULLABLE",
"name": "value",
"type": "STRING"
}
],
"mode": "REPEATED",
"name": "attribute",
"type": "RECORD"
}
]
},
{
"mode": "NULLABLE",
"name": "memory_total",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "memory_available",
"type": "STRING"
}
]
}
]
@@ -0,0 +1,14 @@
[
{
"description": "The UUID of the run for the benchmark.",
"mode": "REQUIRED",
"name": "run_id",
"type": "STRING"
},
{
"description": "The status of the run for the benchmark. Eg, running, failed, success",
"mode": "REQUIRED",
"name": "status",
"type": "STRING"
}
]
@@ -0,0 +1,285 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Runs a ResNet model on the Cifar-10 dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import app
from absl import flags
import numpy as np
import tensorflow as tf
from official.benchmark.models import resnet_cifar_model
from official.utils.flags import core as flags_core
from official.utils.logs import logger
from official.utils.misc import distribution_utils
from official.utils.misc import keras_utils
from official.vision.image_classification.resnet import cifar_preprocessing
from official.vision.image_classification.resnet import common
LR_SCHEDULE = [ # (multiplier, epoch to start) tuples
(0.1, 91), (0.01, 136), (0.001, 182)
]
def learning_rate_schedule(current_epoch,
current_batch,
batches_per_epoch,
batch_size):
"""Handles linear scaling rule and LR decay.
Scale learning rate at epoch boundaries provided in LR_SCHEDULE by the
provided scaling factor.
Args:
current_epoch: integer, current epoch indexed from 0.
current_batch: integer, current batch in the current epoch, indexed from 0.
batches_per_epoch: integer, number of steps in an epoch.
batch_size: integer, total batch sized.
Returns:
Adjusted learning rate.
"""
del current_batch, batches_per_epoch # not used
initial_learning_rate = common.BASE_LEARNING_RATE * batch_size / 128
learning_rate = initial_learning_rate
for mult, start_epoch in LR_SCHEDULE:
if current_epoch >= start_epoch:
learning_rate = initial_learning_rate * mult
else:
break
return learning_rate
class LearningRateBatchScheduler(tf.keras.callbacks.Callback):
"""Callback to update learning rate on every batch (not epoch boundaries).
N.B. Only support Keras optimizers, not TF optimizers.
Attributes:
schedule: a function that takes an epoch index and a batch index as input
(both integer, indexed from 0) and returns a new learning rate as
output (float).
"""
def __init__(self, schedule, batch_size, steps_per_epoch):
super(LearningRateBatchScheduler, self).__init__()
self.schedule = schedule
self.steps_per_epoch = steps_per_epoch
self.batch_size = batch_size
self.epochs = -1
self.prev_lr = -1
def on_epoch_begin(self, epoch, logs=None):
if not hasattr(self.model.optimizer, 'learning_rate'):
raise ValueError('Optimizer must have a "learning_rate" attribute.')
self.epochs += 1
def on_batch_begin(self, batch, logs=None):
"""Executes before step begins."""
lr = self.schedule(self.epochs,
batch,
self.steps_per_epoch,
self.batch_size)
if not isinstance(lr, (float, np.float32, np.float64)):
raise ValueError('The output of the "schedule" function should be float.')
if lr != self.prev_lr:
self.model.optimizer.learning_rate = lr # lr should be a float here
self.prev_lr = lr
tf.compat.v1.logging.debug(
'Epoch %05d Batch %05d: LearningRateBatchScheduler '
'change learning rate to %s.', self.epochs, batch, lr)
def run(flags_obj):
"""Run ResNet Cifar-10 training and eval loop using native Keras APIs.
Args:
flags_obj: An object containing parsed flag values.
Raises:
ValueError: If fp16 is passed as it is not currently supported.
Returns:
Dictionary of training and eval stats.
"""
keras_utils.set_session_config(
enable_eager=flags_obj.enable_eager,
enable_xla=flags_obj.enable_xla)
# Execute flag override logic for better model performance
if flags_obj.tf_gpu_thread_mode:
keras_utils.set_gpu_thread_mode_and_count(
per_gpu_thread_count=flags_obj.per_gpu_thread_count,
gpu_thread_mode=flags_obj.tf_gpu_thread_mode,
num_gpus=flags_obj.num_gpus,
datasets_num_private_threads=flags_obj.datasets_num_private_threads)
common.set_cudnn_batchnorm_mode()
dtype = flags_core.get_tf_dtype(flags_obj)
if dtype == 'fp16':
raise ValueError('dtype fp16 is not supported in Keras. Use the default '
'value(fp32).')
data_format = flags_obj.data_format
if data_format is None:
data_format = ('channels_first'
if tf.test.is_built_with_cuda() else 'channels_last')
tf.keras.backend.set_image_data_format(data_format)
strategy = distribution_utils.get_distribution_strategy(
distribution_strategy=flags_obj.distribution_strategy,
num_gpus=flags_obj.num_gpus,
all_reduce_alg=flags_obj.all_reduce_alg,
num_packs=flags_obj.num_packs)
if strategy:
# flags_obj.enable_get_next_as_optional controls whether enabling
# get_next_as_optional behavior in DistributedIterator. If true, last
# partial batch can be supported.
strategy.extended.experimental_enable_get_next_as_optional = (
flags_obj.enable_get_next_as_optional
)
strategy_scope = distribution_utils.get_strategy_scope(strategy)
if flags_obj.use_synthetic_data:
distribution_utils.set_up_synthetic_data()
input_fn = common.get_synth_input_fn(
height=cifar_preprocessing.HEIGHT,
width=cifar_preprocessing.WIDTH,
num_channels=cifar_preprocessing.NUM_CHANNELS,
num_classes=cifar_preprocessing.NUM_CLASSES,
dtype=flags_core.get_tf_dtype(flags_obj),
drop_remainder=True)
else:
distribution_utils.undo_set_up_synthetic_data()
input_fn = cifar_preprocessing.input_fn
train_input_dataset = input_fn(
is_training=True,
data_dir=flags_obj.data_dir,
batch_size=flags_obj.batch_size,
parse_record_fn=cifar_preprocessing.parse_record,
datasets_num_private_threads=flags_obj.datasets_num_private_threads,
dtype=dtype,
# Setting drop_remainder to avoid the partial batch logic in normalization
# layer, which triggers tf.where and leads to extra memory copy of input
# sizes between host and GPU.
drop_remainder=(not flags_obj.enable_get_next_as_optional))
eval_input_dataset = None
if not flags_obj.skip_eval:
eval_input_dataset = input_fn(
is_training=False,
data_dir=flags_obj.data_dir,
batch_size=flags_obj.batch_size,
parse_record_fn=cifar_preprocessing.parse_record)
steps_per_epoch = (
cifar_preprocessing.NUM_IMAGES['train'] // flags_obj.batch_size)
lr_schedule = 0.1
if flags_obj.use_tensor_lr:
initial_learning_rate = common.BASE_LEARNING_RATE * flags_obj.batch_size / 128
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
boundaries=list(p[1] * steps_per_epoch for p in LR_SCHEDULE),
values=[initial_learning_rate] +
list(p[0] * initial_learning_rate for p in LR_SCHEDULE))
with strategy_scope:
optimizer = common.get_optimizer(lr_schedule)
model = resnet_cifar_model.resnet56(classes=cifar_preprocessing.NUM_CLASSES)
model.compile(
loss='sparse_categorical_crossentropy',
optimizer=optimizer,
metrics=(['sparse_categorical_accuracy']
if flags_obj.report_accuracy_metrics else None),
run_eagerly=flags_obj.run_eagerly)
train_epochs = flags_obj.train_epochs
callbacks = common.get_callbacks(steps_per_epoch)
if not flags_obj.use_tensor_lr:
lr_callback = LearningRateBatchScheduler(
schedule=learning_rate_schedule,
batch_size=flags_obj.batch_size,
steps_per_epoch=steps_per_epoch)
callbacks.append(lr_callback)
# if mutliple epochs, ignore the train_steps flag.
if train_epochs <= 1 and flags_obj.train_steps:
steps_per_epoch = min(flags_obj.train_steps, steps_per_epoch)
train_epochs = 1
num_eval_steps = (cifar_preprocessing.NUM_IMAGES['validation'] //
flags_obj.batch_size)
validation_data = eval_input_dataset
if flags_obj.skip_eval:
if flags_obj.set_learning_phase_to_train:
# TODO(haoyuzhang): Understand slowdown of setting learning phase when
# not using distribution strategy.
tf.keras.backend.set_learning_phase(1)
num_eval_steps = None
validation_data = None
if not strategy and flags_obj.explicit_gpu_placement:
# TODO(b/135607227): Add device scope automatically in Keras training loop
# when not using distribition strategy.
no_dist_strat_device = tf.device('/device:GPU:0')
no_dist_strat_device.__enter__()
history = model.fit(train_input_dataset,
epochs=train_epochs,
steps_per_epoch=steps_per_epoch,
callbacks=callbacks,
validation_steps=num_eval_steps,
validation_data=validation_data,
validation_freq=flags_obj.epochs_between_evals,
verbose=2)
eval_output = None
if not flags_obj.skip_eval:
eval_output = model.evaluate(eval_input_dataset,
steps=num_eval_steps,
verbose=2)
if not strategy and flags_obj.explicit_gpu_placement:
no_dist_strat_device.__exit__()
stats = common.build_stats(history, eval_output, callbacks)
return stats
def define_cifar_flags():
common.define_keras_flags(dynamic_loss_scale=False)
flags_core.set_defaults(data_dir='/tmp/cifar10_data/cifar-10-batches-bin',
model_dir='/tmp/cifar10_model',
epochs_between_evals=10,
batch_size=128)
def main(_):
with logger.benchmark_context(flags.FLAGS):
return run(flags.FLAGS)
if __name__ == '__main__':
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)
define_cifar_flags()
app.run(main)
@@ -0,0 +1,262 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""ResNet56 model for Keras adapted from tf.keras.applications.ResNet50.
# Reference:
- [Deep Residual Learning for Image Recognition](
https://arxiv.org/abs/1512.03385)
Adapted from code contributed by BigMoyan.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import tensorflow as tf
from tensorflow.python.keras import backend
from tensorflow.python.keras import initializers
from tensorflow.python.keras import layers
from tensorflow.python.keras import regularizers
BATCH_NORM_DECAY = 0.997
BATCH_NORM_EPSILON = 1e-5
L2_WEIGHT_DECAY = 2e-4
def identity_building_block(input_tensor,
kernel_size,
filters,
stage,
block,
training=None):
"""The identity block is the block that has no conv layer at shortcut.
Arguments:
input_tensor: input tensor
kernel_size: default 3, the kernel size of
middle conv layer at main path
filters: list of integers, the filters of 3 conv layer at main path
stage: integer, current stage label, used for generating layer names
block: current block label, used for generating layer names
training: Only used if training keras model with Estimator. In other
scenarios it is handled automatically.
Returns:
Output tensor for the block.
"""
filters1, filters2 = filters
if backend.image_data_format() == 'channels_last':
bn_axis = 3
else:
bn_axis = 1
conv_name_base = 'res' + str(stage) + block + '_branch'
bn_name_base = 'bn' + str(stage) + block + '_branch'
x = layers.Conv2D(filters1, kernel_size,
padding='same', use_bias=False,
kernel_initializer='he_normal',
kernel_regularizer=regularizers.l2(L2_WEIGHT_DECAY),
name=conv_name_base + '2a')(input_tensor)
x = layers.BatchNormalization(
axis=bn_axis, momentum=BATCH_NORM_DECAY, epsilon=BATCH_NORM_EPSILON,
name=bn_name_base + '2a')(x, training=training)
x = layers.Activation('relu')(x)
x = layers.Conv2D(filters2, kernel_size,
padding='same', use_bias=False,
kernel_initializer='he_normal',
kernel_regularizer=regularizers.l2(L2_WEIGHT_DECAY),
name=conv_name_base + '2b')(x)
x = layers.BatchNormalization(
axis=bn_axis, momentum=BATCH_NORM_DECAY, epsilon=BATCH_NORM_EPSILON,
name=bn_name_base + '2b')(x, training=training)
x = layers.add([x, input_tensor])
x = layers.Activation('relu')(x)
return x
def conv_building_block(input_tensor,
kernel_size,
filters,
stage,
block,
strides=(2, 2),
training=None):
"""A block that has a conv layer at shortcut.
Arguments:
input_tensor: input tensor
kernel_size: default 3, the kernel size of
middle conv layer at main path
filters: list of integers, the filters of 3 conv layer at main path
stage: integer, current stage label, used for generating layer names
block: current block label, used for generating layer names
strides: Strides for the first conv layer in the block.
training: Only used if training keras model with Estimator. In other
scenarios it is handled automatically.
Returns:
Output tensor for the block.
Note that from stage 3,
the first conv layer at main path is with strides=(2, 2)
And the shortcut should have strides=(2, 2) as well
"""
filters1, filters2 = filters
if tf.keras.backend.image_data_format() == 'channels_last':
bn_axis = 3
else:
bn_axis = 1
conv_name_base = 'res' + str(stage) + block + '_branch'
bn_name_base = 'bn' + str(stage) + block + '_branch'
x = layers.Conv2D(filters1, kernel_size, strides=strides,
padding='same', use_bias=False,
kernel_initializer='he_normal',
kernel_regularizer=regularizers.l2(L2_WEIGHT_DECAY),
name=conv_name_base + '2a')(input_tensor)
x = layers.BatchNormalization(
axis=bn_axis, momentum=BATCH_NORM_DECAY, epsilon=BATCH_NORM_EPSILON,
name=bn_name_base + '2a')(x, training=training)
x = layers.Activation('relu')(x)
x = layers.Conv2D(filters2, kernel_size, padding='same', use_bias=False,
kernel_initializer='he_normal',
kernel_regularizer=regularizers.l2(L2_WEIGHT_DECAY),
name=conv_name_base + '2b')(x)
x = layers.BatchNormalization(
axis=bn_axis, momentum=BATCH_NORM_DECAY, epsilon=BATCH_NORM_EPSILON,
name=bn_name_base + '2b')(x, training=training)
shortcut = layers.Conv2D(filters2, (1, 1), strides=strides, use_bias=False,
kernel_initializer='he_normal',
kernel_regularizer=regularizers.l2(L2_WEIGHT_DECAY),
name=conv_name_base + '1')(input_tensor)
shortcut = layers.BatchNormalization(
axis=bn_axis, momentum=BATCH_NORM_DECAY, epsilon=BATCH_NORM_EPSILON,
name=bn_name_base + '1')(shortcut, training=training)
x = layers.add([x, shortcut])
x = layers.Activation('relu')(x)
return x
def resnet_block(input_tensor,
size,
kernel_size,
filters,
stage,
conv_strides=(2, 2),
training=None):
"""A block which applies conv followed by multiple identity blocks.
Arguments:
input_tensor: input tensor
size: integer, number of constituent conv/identity building blocks.
A conv block is applied once, followed by (size - 1) identity blocks.
kernel_size: default 3, the kernel size of
middle conv layer at main path
filters: list of integers, the filters of 3 conv layer at main path
stage: integer, current stage label, used for generating layer names
conv_strides: Strides for the first conv layer in the block.
training: Only used if training keras model with Estimator. In other
scenarios it is handled automatically.
Returns:
Output tensor after applying conv and identity blocks.
"""
x = conv_building_block(input_tensor, kernel_size, filters, stage=stage,
strides=conv_strides, block='block_0',
training=training)
for i in range(size - 1):
x = identity_building_block(x, kernel_size, filters, stage=stage,
block='block_%d' % (i + 1), training=training)
return x
def resnet(num_blocks, classes=10, training=None):
"""Instantiates the ResNet architecture.
Arguments:
num_blocks: integer, the number of conv/identity blocks in each block.
The ResNet contains 3 blocks with each block containing one conv block
followed by (layers_per_block - 1) number of idenity blocks. Each
conv/idenity block has 2 convolutional layers. With the input
convolutional layer and the pooling layer towards the end, this brings
the total size of the network to (6*num_blocks + 2)
classes: optional number of classes to classify images into
training: Only used if training keras model with Estimator. In other
scenarios it is handled automatically.
Returns:
A Keras model instance.
"""
input_shape = (32, 32, 3)
img_input = layers.Input(shape=input_shape)
if backend.image_data_format() == 'channels_first':
x = layers.Lambda(lambda x: backend.permute_dimensions(x, (0, 3, 1, 2)),
name='transpose')(img_input)
bn_axis = 1
else: # channel_last
x = img_input
bn_axis = 3
x = layers.ZeroPadding2D(padding=(1, 1), name='conv1_pad')(x)
x = layers.Conv2D(16, (3, 3),
strides=(1, 1),
padding='valid', use_bias=False,
kernel_initializer='he_normal',
kernel_regularizer=regularizers.l2(L2_WEIGHT_DECAY),
name='conv1')(x)
x = layers.BatchNormalization(axis=bn_axis,
momentum=BATCH_NORM_DECAY,
epsilon=BATCH_NORM_EPSILON,
name='bn_conv1',)(x, training=training)
x = layers.Activation('relu')(x)
x = resnet_block(x, size=num_blocks, kernel_size=3, filters=[16, 16],
stage=2, conv_strides=(1, 1), training=training)
x = resnet_block(x, size=num_blocks, kernel_size=3, filters=[32, 32],
stage=3, conv_strides=(2, 2), training=training)
x = resnet_block(x, size=num_blocks, kernel_size=3, filters=[64, 64],
stage=4, conv_strides=(2, 2), training=training)
rm_axes = [1, 2] if backend.image_data_format() == 'channels_last' else [2, 3]
x = layers.Lambda(lambda x: backend.mean(x, rm_axes), name='reduce_mean')(x)
x = layers.Dense(classes,
activation='softmax',
kernel_initializer=initializers.RandomNormal(stddev=0.01),
kernel_regularizer=regularizers.l2(L2_WEIGHT_DECAY),
bias_regularizer=regularizers.l2(L2_WEIGHT_DECAY),
name='fc10')(x)
inputs = img_input
# Create model.
model = tf.keras.models.Model(inputs, x, name='resnet56')
return model
resnet20 = functools.partial(resnet, num_blocks=3)
resnet32 = functools.partial(resnet, num_blocks=5)
resnet56 = functools.partial(resnet, num_blocks=9)
resnet10 = functools.partial(resnet, num_blocks=110)
@@ -0,0 +1,187 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test the keras ResNet model with Cifar data."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tempfile
import tensorflow as tf
from tensorflow.python.eager import context
from tensorflow.python.platform import googletest
from official.benchmark.models import resnet_cifar_main
from official.utils.misc import keras_utils
from official.utils.testing import integration
from official.vision.image_classification.resnet import cifar_preprocessing
class KerasCifarTest(googletest.TestCase):
"""Unit tests for Keras ResNet with Cifar."""
_extra_flags = [
"-batch_size", "4",
"-train_steps", "1",
"-use_synthetic_data", "true"
]
_tempdir = None
def get_temp_dir(self):
if not self._tempdir:
self._tempdir = tempfile.mkdtemp(dir=googletest.GetTempDir())
return self._tempdir
@classmethod
def setUpClass(cls): # pylint: disable=invalid-name
super(KerasCifarTest, cls).setUpClass()
resnet_cifar_main.define_cifar_flags()
def setUp(self):
super(KerasCifarTest, self).setUp()
cifar_preprocessing.NUM_IMAGES["validation"] = 4
def tearDown(self):
super(KerasCifarTest, self).tearDown()
tf.io.gfile.rmtree(self.get_temp_dir())
def test_end_to_end_no_dist_strat(self):
"""Test Keras model with 1 GPU, no distribution strategy."""
config = keras_utils.get_config_proto_v1()
tf.compat.v1.enable_eager_execution(config=config)
extra_flags = [
"-distribution_strategy", "off",
"-model_dir", "keras_cifar_no_dist_strat",
"-data_format", "channels_last",
]
extra_flags = extra_flags + self._extra_flags
integration.run_synthetic(
main=resnet_cifar_main.run,
tmp_root=self.get_temp_dir(),
extra_flags=extra_flags
)
def test_end_to_end_graph_no_dist_strat(self):
"""Test Keras model in legacy graph mode with 1 GPU, no dist strat."""
extra_flags = [
"-enable_eager", "false",
"-distribution_strategy", "off",
"-model_dir", "keras_cifar_graph_no_dist_strat",
"-data_format", "channels_last",
]
extra_flags = extra_flags + self._extra_flags
integration.run_synthetic(
main=resnet_cifar_main.run,
tmp_root=self.get_temp_dir(),
extra_flags=extra_flags
)
def test_end_to_end_1_gpu(self):
"""Test Keras model with 1 GPU."""
config = keras_utils.get_config_proto_v1()
tf.compat.v1.enable_eager_execution(config=config)
if context.num_gpus() < 1:
self.skipTest(
"{} GPUs are not available for this test. {} GPUs are available".
format(1, context.num_gpus()))
extra_flags = [
"-num_gpus", "1",
"-distribution_strategy", "mirrored",
"-model_dir", "keras_cifar_1_gpu",
"-data_format", "channels_last",
]
extra_flags = extra_flags + self._extra_flags
integration.run_synthetic(
main=resnet_cifar_main.run,
tmp_root=self.get_temp_dir(),
extra_flags=extra_flags
)
def test_end_to_end_graph_1_gpu(self):
"""Test Keras model in legacy graph mode with 1 GPU."""
if context.num_gpus() < 1:
self.skipTest(
"{} GPUs are not available for this test. {} GPUs are available".
format(1, context.num_gpus()))
extra_flags = [
"-num_gpus", "1",
"-noenable_eager",
"-distribution_strategy", "mirrored",
"-model_dir", "keras_cifar_graph_1_gpu",
"-data_format", "channels_last",
]
extra_flags = extra_flags + self._extra_flags
integration.run_synthetic(
main=resnet_cifar_main.run,
tmp_root=self.get_temp_dir(),
extra_flags=extra_flags
)
def test_end_to_end_2_gpu(self):
"""Test Keras model with 2 GPUs."""
config = keras_utils.get_config_proto_v1()
tf.compat.v1.enable_eager_execution(config=config)
if context.num_gpus() < 2:
self.skipTest(
"{} GPUs are not available for this test. {} GPUs are available".
format(2, context.num_gpus()))
extra_flags = [
"-num_gpus", "2",
"-distribution_strategy", "mirrored",
"-model_dir", "keras_cifar_2_gpu",
]
extra_flags = extra_flags + self._extra_flags
integration.run_synthetic(
main=resnet_cifar_main.run,
tmp_root=self.get_temp_dir(),
extra_flags=extra_flags
)
def test_end_to_end_graph_2_gpu(self):
"""Test Keras model in legacy graph mode with 2 GPUs."""
if context.num_gpus() < 2:
self.skipTest(
"{} GPUs are not available for this test. {} GPUs are available".
format(2, context.num_gpus()))
extra_flags = [
"-num_gpus", "2",
"-enable_eager", "false",
"-distribution_strategy", "mirrored",
"-model_dir", "keras_cifar_graph_2_gpu",
]
extra_flags = extra_flags + self._extra_flags
integration.run_synthetic(
main=resnet_cifar_main.run,
tmp_root=self.get_temp_dir(),
extra_flags=extra_flags
)
if __name__ == "__main__":
googletest.main()
@@ -0,0 +1,259 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Executes CTL benchmarks and accuracy tests."""
from __future__ import print_function
import os
import sys
import time
# import pydevd_pycharm
# pydevd_pycharm.settrace('90.253.17.223', port=8008, stdoutToServer=True, stderrToServer=True, suspend=False)
# pylint: disable=g-bad-import-order
from absl import flags
import tensorflow as tf
#sys.path.append(r"/home/wx933135/0708/ResNet50/tensorflow/code")
sys.path.append(os.path.join(os.path.abspath(os.path.dirname(__file__)),'../../../../../../utils/atlasboost'))
from official.r1.resnet import imagenet_main
from official.utils.testing.perfzero_benchmark import PerfZeroBenchmark
from official.utils.testing import benchmark_wrappers
from official.utils.flags import core as flags_core
from benchmark_log import hwlog
from benchmark_log.basic_utils import get_environment_info
from benchmark_log.basic_utils import get_model_parameter
MIN_TOP_1_ACCURACY = 0.76
MAX_TOP_1_ACCURACY = 0.77
flags.DEFINE_integer('iterations_per_loop', 1000,'iterations per loop')
flags.DEFINE_integer('save_checkpoints_steps', 115200,'save checkpoints steps')
FLAGS = flags.FLAGS
sys.path.append(os.path.join(os.path.abspath(os.path.dirname(__file__)),'../../../config'))
class CtlBenchmark(PerfZeroBenchmark):
"""Base benchmark class with methods to simplify testing."""
def __init__(self, output_dir=None, default_flags=None, flag_methods=None):
self.output_dir = output_dir
self.default_flags = default_flags or {}
self.flag_methods = flag_methods or {}
super(CtlBenchmark, self).__init__(
output_dir=self.output_dir,
default_flags=self.default_flags,
flag_methods=self.flag_methods)
def _report_benchmark(self,
stats,
wall_time_sec,
top_1_max=None,
top_1_min=None,
total_batch_size=None,
log_steps=None,
warmup=1):
"""Report benchmark results by writing to local protobuf file.
Args:
stats: dict returned from keras models with known entries.
wall_time_sec: the during of the benchmark execution in seconds
top_1_max: highest passing level for top_1 accuracy.
top_1_min: lowest passing level for top_1 accuracy.
total_batch_size: Global batch-size.
log_steps: How often the log was created for stats['step_timestamp_log'].
warmup: number of entries in stats['step_timestamp_log'] to ignore.
"""
metrics = []
if 'eval_acc' in stats:
metrics.append({
'name': 'accuracy_top_1',
'value': stats['eval_acc'],
'min_value': top_1_min,
'max_value': top_1_max
})
metrics.append({'name': 'eval_loss', 'value': stats['eval_loss']})
metrics.append({
'name': 'top_1_train_accuracy',
'value': stats['train_acc']
})
metrics.append({'name': 'train_loss', 'value': stats['train_loss']})
if (warmup and 'step_timestamp_log' in stats and
len(stats['step_timestamp_log']) > warmup + 1):
# first entry in the time_log is start of step 0. The rest of the
# entries are the end of each step recorded
time_log = stats['step_timestamp_log']
steps_elapsed = time_log[-1].batch_index - time_log[warmup].batch_index
time_elapsed = time_log[-1].timestamp - time_log[warmup].timestamp
examples_per_sec = total_batch_size * (steps_elapsed / time_elapsed)
metrics.append({'name': 'exp_per_second', 'value': examples_per_sec})
if 'avg_exp_per_second' in stats:
metrics.append({
'name': 'avg_exp_per_second',
'value': stats['avg_exp_per_second']
})
print("start flags_core.get_nondefault_flags_as_str")
flags_str = flags_core.get_nondefault_flags_as_str()
self.report_benchmark(
iters=-1,
wall_time=wall_time_sec,
metrics=metrics,
extras={'flags': flags_str})
class Resnet50CtlAccuracy(CtlBenchmark):
"""Benchmark accuracy tests for ResNet50 in CTL."""
def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
"""A benchmark class.
Args:
output_dir: directory where to output e.g. log files
root_data_dir: directory under which to look for dataset
**kwargs: arbitrary named arguments. This is needed to make the
constructor forward compatible in case PerfZero provides more named
arguments before updating the constructor.
"""
# flag_methods = [common.define_keras_flags]
self.data_dir = os.path.join(root_data_dir, 'imagenet')
super(Resnet50CtlAccuracy, self).__init__(
output_dir=output_dir, flag_methods=flags)
# @benchmark_wrappers.enable_runtime_flags
def _run_and_report_benchmark(self):
start_time_sec = time.time()
stats = imagenet_main.main(flags.FLAGS)
wall_time_sec = time.time() - start_time_sec
super(Resnet50CtlAccuracy, self)._report_benchmark(
stats,
wall_time_sec,
top_1_min=MIN_TOP_1_ACCURACY,
top_1_max=MAX_TOP_1_ACCURACY,
total_batch_size=FLAGS.batch_size,
log_steps=100)
def _get_model_dir(self, folder_name):
return os.path.join(self.output_dir, folder_name)
class Resnet50CtlBenchmarkBase(CtlBenchmark):
"""Resnet50 benchmarks."""
def __init__(self, output_dir=None, default_flags=None):
super(Resnet50CtlBenchmarkBase, self).__init__(
output_dir=output_dir,
flag_methods=flags,
default_flags=default_flags)
# @benchmark_wrappers.enable_runtime_flags
def _run_and_report_benchmark(self):
start_time_sec = time.time()
stats = imagenet_main.benchmark_main()
wall_time_sec = time.time() - start_time_sec
# Number of logged step time entries that are excluded in performance
# report. We keep results from last 100 batches in this case.
warmup = (FLAGS.train_steps - 100) // FLAGS.log_steps
super(Resnet50CtlBenchmarkBase, self)._report_benchmark(
stats,
wall_time_sec,
total_batch_size=FLAGS.batch_size,
log_steps=FLAGS.log_steps,
warmup=warmup)
def benchmark_1_npu_fp16(self, config_dict, cluster_device_id):
"""Test v1 model with 1 NPU with tf mixed precision."""
print("start benchmark_1_npu_fp16")
FLAGS.resnet_size = 50
FLAGS.resnet_version = 1
# FLAGS.max_train_steps = 1000 # this is not global step , only the step per epoch. default is according to train images
FLAGS.max_train_steps = config_dict.get('max_train_steps')
FLAGS.hooks = ['examplespersecondhook']
#FLAGS.data_dir = '/home/w00563133/data/resnet/imagenet_TF'
FLAGS.data_dir = config_dict.get('data_dir')
FLAGS.model_dir = os.getenv('MODEL_CKPT_PATH')
FLAGS.train_epochs = config_dict.get('train_epochs')
FLAGS.batch_size = config_dict.get('batch_size')
# FLAGS.epochs_between_evals = 1
FLAGS.epochs_between_evals = config_dict.get('epochs_between_evals')
FLAGS.iterations_per_loop = config_dict.get('iterations_per_loop')
FLAGS.save_checkpoints_steps = config_dict.get('save_checkpoints_steps')
FLAGS.stop_threshold = MIN_TOP_1_ACCURACY
self._run_and_report_benchmark()
class Resnet50CtlBenchmarkReal(Resnet50CtlBenchmarkBase):
"""Resnet50 real data benchmark tests."""
def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
def_flags = {}
# def_flags['skip_eval'] = True
# def_flags['data_dir'] = os.path.join(root_data_dir, 'imagenet')
# def_flags['train_steps'] = 110
# def_flags['steps_per_loop'] = 20
# def_flags['log_steps'] = 10
super(Resnet50CtlBenchmarkReal, self).__init__(
output_dir=output_dir, default_flags=def_flags)
if __name__ == '__main__':
hwlog.ROOT_DIR = os.path.split(os.path.abspath(__file__))[0]
cpu_info, npu_info, framework_info, os_info, benchmark_version = get_environment_info("tensorflow")
config_info = get_model_parameter("tensorflow_config")
initinal_data = {"base_lr": 0.128, "dataset": "imagenet1024", "optimizer": "SGD", "loss_scale": 512}
hwlog.remark_print(key=hwlog.CPU_INFO, value=cpu_info)
hwlog.remark_print(key=hwlog.NPU_INFO, value=npu_info)
hwlog.remark_print(key=hwlog.OS_INFO, value=os_info)
hwlog.remark_print(key=hwlog.FRAMEWORK_INFO, value=framework_info)
hwlog.remark_print(key=hwlog.BENCHMARK_VERSION, value=benchmark_version)
hwlog.remark_print(key=hwlog.CONFIG_INFO, value=config_info)
hwlog.remark_print(key=hwlog.BASE_LR, value=initinal_data.get("base_lr"))
hwlog.remark_print(key=hwlog.DATASET, value=initinal_data.get("dataset"))
hwlog.remark_print(key=hwlog.OPT_NAME, value=initinal_data.get("optimizer"))
hwlog.remark_print(key=hwlog.LOSS_SCALE, value=initinal_data.get("loss_scale"))
hwlog.remark_print(key=hwlog.INPUT_BATCH_SIZE, value=initinal_data.get("batchsize"))
cluster_device_id = None
rank_count = sys.argv[1]
if rank_count == "1":
from resnet_config_1p_npu import resnet50_config
elif rank_count == "2":
from resnet_config_2p_npu import resnet50_config
elif rank_count == "4":
from resnet_config_4p_npu import resnet50_config
elif rank_count == "16":
from resnet_config_16p_npu import resnet50_config
elif rank_count == "32":
from resnet_config_32p_npu import resnet50_config
else:
from resnet_config_8p_npu import resnet50_config
config_dict = resnet50_config()
print("config dict info is {}".format(config_dict))
imagenet_main.benchmark_pre()
test=Resnet50CtlBenchmarkReal("./result","./result")
test.benchmark_1_npu_fp16(config_dict, cluster_device_id)
@@ -0,0 +1,19 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Activations package definition."""
from official.modeling.activations.gelu import gelu
from official.modeling.activations.swish import hard_swish
from official.modeling.activations.swish import identity
from official.modeling.activations.swish import simple_swish
@@ -0,0 +1,40 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Gaussian error linear unit."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import tensorflow as tf
@tf.keras.utils.register_keras_serializable(package='Text')
def gelu(x):
"""Gaussian Error Linear Unit.
This is a smoother version of the RELU.
Original paper: https://arxiv.org/abs/1606.08415
Args:
x: float Tensor to perform activation.
Returns:
`x` with the GELU activation applied.
"""
cdf = 0.5 * (1.0 + tf.tanh(
(math.sqrt(2 / math.pi) * (x + 0.044715 * tf.pow(x, 3)))))
return x * cdf
@@ -0,0 +1,38 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for the Gaussian error linear unit."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow.python.keras import keras_parameterized # pylint: disable=g-direct-tensorflow-import
from official.modeling import activations
@keras_parameterized.run_all_keras_modes
class GeluTest(keras_parameterized.TestCase):
def test_gelu(self):
expected_data = [[0.14967535, 0., -0.10032465],
[-0.15880796, -0.04540223, 2.9963627]]
gelu_data = activations.gelu([[.25, 0, -.25], [-1, -2, 3]])
self.assertAllClose(expected_data, gelu_data)
if __name__ == '__main__':
tf.test.main()
@@ -0,0 +1,75 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Customized Swish activation."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
@tf.keras.utils.register_keras_serializable(package='Text')
def simple_swish(features):
"""Computes the Swish activation function.
The tf.nn.swish operation uses a custom gradient to reduce memory usage.
Since saving custom gradients in SavedModel is currently not supported, and
one would not be able to use an exported TF-Hub module for fine-tuning, we
provide this wrapper that can allow to select whether to use the native
TensorFlow swish operation, or whether to use a customized operation that
has uses default TensorFlow gradient computation.
Args:
features: A `Tensor` representing preactivation values.
Returns:
The activation value.
"""
features = tf.convert_to_tensor(features)
return features * tf.nn.sigmoid(features)
@tf.keras.utils.register_keras_serializable(package='Text')
def hard_swish(features):
"""Computes a hard version of the swish function.
This operation can be used to reduce computational cost and improve
quantization for edge devices.
Args:
features: A `Tensor` representing preactivation values.
Returns:
The activation value.
"""
features = tf.convert_to_tensor(features)
return features * tf.nn.relu6(features + tf.constant(3.)) * (1. / 6.)
@tf.keras.utils.register_keras_serializable(package='Text')
def identity(features):
"""Computes the identity function.
Useful for helping in quantization.
Args:
features: A `Tensor` representing preactivation values.
Returns:
The activation value.
"""
features = tf.convert_to_tensor(features)
return tf.identity(features)
@@ -0,0 +1,49 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for the customized Swish activation."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import tensorflow as tf
from tensorflow.python.keras import keras_parameterized # pylint: disable=g-direct-tensorflow-import
from official.modeling import activations
@keras_parameterized.run_all_keras_modes
class CustomizedSwishTest(keras_parameterized.TestCase):
def _hard_swish_np(self, x):
x = np.float32(x)
return x * np.clip(x + 3, 0, 6) / 6
def test_simple_swish(self):
features = [[.25, 0, -.25], [-1, -2, 3]]
customized_swish_data = activations.simple_swish(features)
swish_data = tf.nn.swish(features)
self.assertAllClose(customized_swish_data, swish_data)
def test_hard_swish(self):
features = [[.25, 0, -.25], [-1, -2, 3]]
customized_swish_data = activations.hard_swish(features)
swish_data = self._hard_swish_np(features)
self.assertAllClose(customized_swish_data, swish_data)
if __name__ == '__main__':
tf.test.main()
@@ -0,0 +1,318 @@
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Base configurations to standardize experiments."""
from __future__ import absolute_import
from __future__ import division
# from __future__ import google_type_annotations
from __future__ import print_function
import copy
import functools
from typing import Any, List, Mapping, Optional, Type
import dataclasses
import tensorflow as tf
import yaml
from official.modeling.hyperparams import params_dict
@dataclasses.dataclass
class Config(params_dict.ParamsDict):
"""The base configuration class that supports YAML/JSON based overrides.
* It recursively enforces a whitelist of basic types and container types, so
it avoids surprises with copy and reuse caused by unanticipated types.
* It converts dict to Config even within sequences,
e.g. for config = Config({'key': [([{'a': 42}],)]),
type(config.key[0][0][0]) is Config rather than dict.
"""
# It's safe to add bytes and other immutable types here.
IMMUTABLE_TYPES = (str, int, float, bool, type(None))
# It's safe to add set, frozenset and other collections here.
SEQUENCE_TYPES = (list, tuple)
default_params: dataclasses.InitVar[Optional[Mapping[str, Any]]] = None
restrictions: dataclasses.InitVar[Optional[List[str]]] = None
@classmethod
def _isvalidsequence(cls, v):
"""Check if the input values are valid sequences.
Args:
v: Input sequence.
Returns:
True if the sequence is valid. Valid sequence includes the sequence
type in cls.SEQUENCE_TYPES and element type is in cls.IMMUTABLE_TYPES or
is dict or ParamsDict.
"""
if not isinstance(v, cls.SEQUENCE_TYPES):
return False
return (all(isinstance(e, cls.IMMUTABLE_TYPES) for e in v) or
all(isinstance(e, dict) for e in v) or
all(isinstance(e, params_dict.ParamsDict) for e in v))
@classmethod
def _import_config(cls, v, subconfig_type):
"""Returns v with dicts converted to Configs, recursively."""
if not issubclass(subconfig_type, params_dict.ParamsDict):
raise TypeError(
'Subconfig_type should be subclass of ParamsDict, found {!r}'.format(
subconfig_type))
if isinstance(v, cls.IMMUTABLE_TYPES):
return v
elif isinstance(v, cls.SEQUENCE_TYPES):
# Only support one layer of sequence.
if not cls._isvalidsequence(v):
raise TypeError(
'Invalid sequence: only supports single level {!r} of {!r} or '
'dict or ParamsDict found: {!r}'.format(cls.SEQUENCE_TYPES,
cls.IMMUTABLE_TYPES, v))
import_fn = functools.partial(
cls._import_config, subconfig_type=subconfig_type)
return type(v)(map(import_fn, v))
elif isinstance(v, params_dict.ParamsDict):
# Deepcopy here is a temporary solution for preserving type in nested
# Config object.
return copy.deepcopy(v)
elif isinstance(v, dict):
return subconfig_type(v)
else:
raise TypeError('Unknown type: {!r}'.format(type(v)))
@classmethod
def _export_config(cls, v):
"""Returns v with Configs converted to dicts, recursively."""
if isinstance(v, cls.IMMUTABLE_TYPES):
return v
elif isinstance(v, cls.SEQUENCE_TYPES):
return type(v)(map(cls._export_config, v))
elif isinstance(v, params_dict.ParamsDict):
return v.as_dict()
elif isinstance(v, dict):
raise TypeError('dict value not supported in converting.')
else:
raise TypeError('Unknown type: {!r}'.format(type(v)))
@classmethod
def _get_subconfig_type(cls, k) -> Type[params_dict.ParamsDict]:
"""Get element type by the field name.
Args:
k: the key/name of the field.
Returns:
Config as default. If a type annotation is found for `k`,
1) returns the type of the annotation if it is subtype of ParamsDict;
2) returns the element type if the annotation of `k` is List[SubType]
or Tuple[SubType].
"""
subconfig_type = Config
if k in cls.__annotations__:
# Directly Config subtype.
type_annotation = cls.__annotations__[k]
if (isinstance(type_annotation, type) and
issubclass(type_annotation, Config)):
subconfig_type = cls.__annotations__[k]
else:
# Check if the field is a sequence of subtypes.
field_type = getattr(type_annotation, '__origin__', type(None))
if (isinstance(field_type, type) and
issubclass(field_type, cls.SEQUENCE_TYPES)):
element_type = getattr(type_annotation, '__args__', [type(None)])[0]
subconfig_type = (
element_type if issubclass(element_type, params_dict.ParamsDict)
else subconfig_type)
return subconfig_type
def __post_init__(self, default_params, restrictions, *args, **kwargs):
super().__init__(default_params=default_params,
restrictions=restrictions,
*args,
**kwargs)
def _set(self, k, v):
"""Overrides same method in ParamsDict.
Also called by ParamsDict methods.
Args:
k: key to set.
v: value.
Raises:
RuntimeError
"""
subconfig_type = self._get_subconfig_type(k)
if isinstance(v, dict):
if k not in self.__dict__ or not self.__dict__[k]:
# If the key not exist or the value is None, a new Config-family object
# sould be created for the key.
self.__dict__[k] = subconfig_type(v)
else:
self.__dict__[k].override(v)
else:
self.__dict__[k] = self._import_config(v, subconfig_type)
def __setattr__(self, k, v):
if k not in self.RESERVED_ATTR:
if getattr(self, '_locked', False):
raise ValueError('The Config has been locked. ' 'No change is allowed.')
self._set(k, v)
def _override(self, override_dict, is_strict=True):
"""Overrides same method in ParamsDict.
Also called by ParamsDict methods.
Args:
override_dict: dictionary to write to .
is_strict: If True, not allows to add new keys.
Raises:
KeyError: overriding reserved keys or keys not exist (is_strict=True).
"""
for k, v in sorted(override_dict.items()):
if k in self.RESERVED_ATTR:
raise KeyError('The key {!r} is internally reserved. '
'Can not be overridden.'.format(k))
if k not in self.__dict__:
if is_strict:
raise KeyError('The key {!r} does not exist in {!r}. '
'To extend the existing keys, use '
'`override` with `is_strict` = False.'.format(
k, type(self)))
else:
self._set(k, v)
else:
if isinstance(v, dict) and self.__dict__[k]:
self.__dict__[k]._override(v, is_strict) # pylint: disable=protected-access
elif isinstance(v, params_dict.ParamsDict) and self.__dict__[k]:
self.__dict__[k]._override(v.as_dict(), is_strict) # pylint: disable=protected-access
else:
self._set(k, v)
def as_dict(self):
"""Returns a dict representation of params_dict.ParamsDict.
For the nested params_dict.ParamsDict, a nested dict will be returned.
"""
return {
k: self._export_config(v)
for k, v in self.__dict__.items()
if k not in self.RESERVED_ATTR
}
def replace(self, **kwargs):
"""Like `override`, but returns a copy with the current config unchanged."""
params = self.__class__(self)
params.override(kwargs, is_strict=True)
return params
@classmethod
def from_yaml(cls, file_path: str):
# Note: This only works if the Config has all default values.
with tf.io.gfile.GFile(file_path, 'r') as f:
loaded = yaml.load(f)
config = cls()
config.override(loaded)
return config
@classmethod
def from_json(cls, file_path: str):
"""Wrapper for `from_yaml`."""
return cls.from_yaml(file_path)
@classmethod
def from_args(cls, *args, **kwargs):
"""Builds a config from the given list of arguments."""
attributes = list(cls.__annotations__.keys())
default_params = {a: p for a, p in zip(attributes, args)}
default_params.update(kwargs)
return cls(default_params)
@dataclasses.dataclass
class RuntimeConfig(Config):
"""High-level configurations for Runtime.
These include parameters that are not directly related to the experiment,
e.g. directories, accelerator type, etc.
Attributes:
distribution_strategy: e.g. 'mirrored', 'tpu', etc.
enable_eager: Whether or not to enable eager mode.
enable_xla: Whether or not to enable XLA.
per_gpu_thread_count: thread count per GPU.
gpu_threads_enabled: Whether or not GPU threads are enabled.
gpu_thread_mode: Whether and how the GPU device uses its own threadpool.
dataset_num_private_threads: Number of threads for a private threadpool
created for all datasets computation.
tpu: The address of the TPU to use, if any.
num_gpus: The number of GPUs to use, if any.
worker_hosts: comma-separated list of worker ip:port pairs for running
multi-worker models with DistributionStrategy.
task_index: If multi-worker training, the task index of this worker.
all_reduce_alg: Defines the algorithm for performing all-reduce.
num_packs: Sets `num_packs` in the cross device ops used in
MirroredStrategy. For details, see tf.distribute.NcclAllReduce.
"""
distribution_strategy: str = 'mirrored'
enable_eager: bool = False
enable_xla: bool = False
gpu_threads_enabled: bool = False
gpu_thread_mode: Optional[str] = None
dataset_num_private_threads: Optional[int] = None
per_gpu_thread_count: int = 0
tpu: Optional[str] = None
num_gpus: int = 0
worker_hosts: Optional[str] = None
task_index: int = -1
all_reduce_alg: Optional[str] = None
num_packs: int = 1
@dataclasses.dataclass
class TensorboardConfig(Config):
"""Configuration for Tensorboard.
Attributes:
track_lr: Whether or not to track the learning rate in Tensorboard. Defaults
to True.
write_model_weights: Whether or not to write the model weights as
images in Tensorboard. Defaults to False.
"""
track_lr: bool = True
write_model_weights: bool = False
@dataclasses.dataclass
class CallbacksConfig(Config):
"""Configuration for Callbacks.
Attributes:
enable_checkpoint_and_export: Whether or not to enable checkpoints as a
Callback. Defaults to True.
enable_tensorboard: Whether or not to enable Tensorboard as a Callback.
Defaults to True.
"""
enable_checkpoint_and_export: bool = True
enable_tensorboard: bool = True
@@ -0,0 +1,299 @@
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import pprint
from typing import List, Tuple
from absl.testing import parameterized
import dataclasses
import tensorflow as tf
from official.modeling.hyperparams import base_config
@dataclasses.dataclass
class DumpConfig1(base_config.Config):
a: int = 1
b: str = 'text'
@dataclasses.dataclass
class DumpConfig2(base_config.Config):
c: int = 2
d: str = 'text'
e: DumpConfig1 = DumpConfig1()
@dataclasses.dataclass
class DumpConfig3(DumpConfig2):
f: int = 2
g: str = 'text'
h: List[DumpConfig1] = dataclasses.field(
default_factory=lambda: [DumpConfig1(), DumpConfig1()])
g: Tuple[DumpConfig1, ...] = (DumpConfig1(),)
class BaseConfigTest(parameterized.TestCase, tf.test.TestCase):
def assertHasSameTypes(self, c, d, msg=''):
"""Checks if a Config has the same structure as a given dict.
Args:
c: the Config object to be check.
d: the reference dict object.
msg: The error message to show when type mismatched.
"""
# Make sure d is not a Config. Assume d is either
# dictionary or primitive type and c is the Config or primitive types.
self.assertNotIsInstance(d, base_config.Config)
if isinstance(d, base_config.Config.IMMUTABLE_TYPES):
self.assertEqual(pprint.pformat(c), pprint.pformat(d), msg=msg)
elif isinstance(d, base_config.Config.SEQUENCE_TYPES):
self.assertEqual(type(c), type(d), msg=msg)
for i, v in enumerate(d):
self.assertHasSameTypes(c[i], v, msg='{}[{!r}]'.format(msg, i))
elif isinstance(d, dict):
self.assertIsInstance(c, base_config.Config, msg=msg)
for k, v in sorted(d.items()):
self.assertHasSameTypes(getattr(c, k), v, msg='{}[{!r}]'.format(msg, k))
else:
raise TypeError('Unknown type: %r' % type(d))
def assertImportExport(self, v):
config = base_config.Config({'key': v})
back = config.as_dict()['key']
self.assertEqual(pprint.pformat(back), pprint.pformat(v))
self.assertHasSameTypes(config.key, v, msg='=%s v' % pprint.pformat(v))
def test_invalid_keys(self):
params = base_config.Config()
with self.assertRaises(AttributeError):
_ = params.a
def test_nested_config_types(self):
config = DumpConfig3()
self.assertIsInstance(config.e, DumpConfig1)
self.assertIsInstance(config.h[0], DumpConfig1)
self.assertIsInstance(config.h[1], DumpConfig1)
self.assertIsInstance(config.g[0], DumpConfig1)
config.override({'e': {'a': 2, 'b': 'new text'}})
self.assertIsInstance(config.e, DumpConfig1)
self.assertEqual(config.e.a, 2)
self.assertEqual(config.e.b, 'new text')
config.override({'h': [{'a': 3, 'b': 'new text 2'}]})
self.assertIsInstance(config.h[0], DumpConfig1)
self.assertLen(config.h, 1)
self.assertEqual(config.h[0].a, 3)
self.assertEqual(config.h[0].b, 'new text 2')
config.override({'g': [{'a': 4, 'b': 'new text 3'}]})
self.assertIsInstance(config.g[0], DumpConfig1)
self.assertLen(config.g, 1)
self.assertEqual(config.g[0].a, 4)
self.assertEqual(config.g[0].b, 'new text 3')
@parameterized.parameters(
('_locked', "The key '_locked' is internally reserved."),
('_restrictions', "The key '_restrictions' is internally reserved."),
('aa', "The key 'aa' does not exist."),
)
def test_key_error(self, key, msg):
params = base_config.Config()
with self.assertRaisesRegex(KeyError, msg):
params.override({key: True})
@parameterized.parameters(
('str data',),
(123,),
(1.23,),
(None,),
(['str', 1, 2.3, None],),
(('str', 1, 2.3, None),),
)
def test_import_export_immutable_types(self, v):
self.assertImportExport(v)
out = base_config.Config({'key': v})
self.assertEqual(pprint.pformat(v), pprint.pformat(out.key))
def test_override_is_strict_true(self):
params = base_config.Config({
'a': 'aa',
'b': 2,
'c': {
'c1': 'cc',
'c2': 20
}
})
params.override({'a': 2, 'c': {'c1': 'ccc'}}, is_strict=True)
self.assertEqual(params.a, 2)
self.assertEqual(params.c.c1, 'ccc')
with self.assertRaises(KeyError):
params.override({'d': 'ddd'}, is_strict=True)
with self.assertRaises(KeyError):
params.override({'c': {'c3': 30}}, is_strict=True)
config = base_config.Config({'key': [{'a': 42}]})
config.override({'key': [{'b': 43}]})
self.assertEqual(config.key[0].b, 43)
with self.assertRaisesRegex(AttributeError, 'The key `a` does not exist'):
_ = config.key[0].a
@parameterized.parameters(
(lambda x: x, 'Unknown type'),
(object(), 'Unknown type'),
(set(), 'Unknown type'),
(frozenset(), 'Unknown type'),
)
def test_import_unsupport_types(self, v, msg):
with self.assertRaisesRegex(TypeError, msg):
_ = base_config.Config({'key': v})
@parameterized.parameters(
({
'a': [{
'b': 2,
}, {
'c': 3,
}]
},),
({
'c': [{
'f': 1.1,
}, {
'h': [1, 2],
}]
},),
(({
'a': 'aa',
'b': 2,
'c': {
'c1': 10,
'c2': 20,
}
},),),
)
def test_import_export_nested_structure(self, d):
self.assertImportExport(d)
@parameterized.parameters(
([{
'a': 42,
'b': 'hello',
'c': 1.2
}],),
(({
'a': 42,
'b': 'hello',
'c': 1.2
},),),
)
def test_import_export_nested_sequences(self, v):
self.assertImportExport(v)
@parameterized.parameters(
([([{}],)],),
([['str', 1, 2.3, None]],),
((('str', 1, 2.3, None),),),
([
('str', 1, 2.3, None),
],),
([
('str', 1, 2.3, None),
],),
([[{
'a': 42,
'b': 'hello',
'c': 1.2
}]],),
([[[{
'a': 42,
'b': 'hello',
'c': 1.2
}]]],),
((({
'a': 42,
'b': 'hello',
'c': 1.2
},),),),
(((({
'a': 42,
'b': 'hello',
'c': 1.2
},),),),),
([({
'a': 42,
'b': 'hello',
'c': 1.2
},)],),
(([{
'a': 42,
'b': 'hello',
'c': 1.2
}],),),
)
def test_import_export_unsupport_sequence(self, v):
with self.assertRaisesRegex(TypeError,
'Invalid sequence: only supports single level'):
_ = base_config.Config({'key': v})
def test_construct_subtype(self):
pass
def test_import_config(self):
params = base_config.Config({'a': [{'b': 2}, {'c': {'d': 3}}]})
self.assertLen(params.a, 2)
self.assertEqual(params.a[0].b, 2)
self.assertEqual(type(params.a[0]), base_config.Config)
self.assertEqual(pprint.pformat(params.a[0].b), '2')
self.assertEqual(type(params.a[1]), base_config.Config)
self.assertEqual(type(params.a[1].c), base_config.Config)
self.assertEqual(pprint.pformat(params.a[1].c.d), '3')
def test_override(self):
params = base_config.Config({'a': [{'b': 2}, {'c': {'d': 3}}]})
params.override({'a': [{'b': 4}, {'c': {'d': 5}}]}, is_strict=False)
self.assertEqual(type(params.a), list)
self.assertEqual(type(params.a[0]), base_config.Config)
self.assertEqual(pprint.pformat(params.a[0].b), '4')
self.assertEqual(type(params.a[1]), base_config.Config)
self.assertEqual(type(params.a[1].c), base_config.Config)
self.assertEqual(pprint.pformat(params.a[1].c.d), '5')
@parameterized.parameters(
([{}],),
(({},),),
)
def test_config_vs_params_dict(self, v):
d = {'key': v}
self.assertEqual(type(base_config.Config(d).key[0]), base_config.Config)
self.assertEqual(type(base_config.params_dict.ParamsDict(d).key[0]), dict)
def test_ppformat(self):
self.assertEqual(
pprint.pformat([
's', 1, 1.0, True, None, {}, [], (), {
(2,): (3, [4], {
6: 7,
}),
8: 9,
}
]),
"['s', 1, 1.0, True, None, {}, [], (), {8: 9, (2,): (3, [4], {6: 7})}]")
if __name__ == '__main__':
tf.test.main()
@@ -0,0 +1,410 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A parameter dictionary class which supports the nest structure."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import copy
import re
import six
import tensorflow as tf
import yaml
# regex pattern that matches on key-value pairs in a comma-separated
# key-value pair string. It splits each k-v pair on the = sign, and
# matches on values that are within single quotes, double quotes, single
# values (e.g. floats, ints, etc.), and a lists within brackets.
_PARAM_RE = re.compile(r"""
(?P<name>[a-zA-Z][\w\.]*) # variable name: "var" or "x"
\s*=\s*
((?P<val>\'(.*?)\' # single quote
|
\"(.*?)\" # double quote
|
[^,\[]* # single value
|
\[[^\]]*\])) # list of values
($|,\s*)""", re.VERBOSE)
class ParamsDict(object):
"""A hyperparameter container class."""
RESERVED_ATTR = ['_locked', '_restrictions']
def __init__(self, default_params=None, restrictions=None):
"""Instantiate a ParamsDict.
Instantiate a ParamsDict given a set of default parameters and a list of
restrictions. Upon initialization, it validates itself by checking all the
defined restrictions, and raise error if it finds inconsistency.
Args:
default_params: a Python dict or another ParamsDict object including the
default parameters to initialize.
restrictions: a list of strings, which define a list of restrictions to
ensure the consistency of different parameters internally. Each
restriction string is defined as a binary relation with a set of
operators, including {'==', '!=', '<', '<=', '>', '>='}.
"""
self._locked = False
self._restrictions = []
if restrictions:
self._restrictions = restrictions
if default_params is None:
default_params = {}
self.override(default_params, is_strict=False)
self.validate()
def _set(self, k, v):
if isinstance(v, dict):
self.__dict__[k] = ParamsDict(v)
else:
self.__dict__[k] = copy.deepcopy(v)
def __setattr__(self, k, v):
"""Sets the value of the existing key.
Note that this does not allow directly defining a new key. Use the
`override` method with `is_strict=False` instead.
Args:
k: the key string.
v: the value to be used to set the key `k`.
Raises:
KeyError: if k is not defined in the ParamsDict.
"""
if k not in ParamsDict.RESERVED_ATTR:
if k not in self.__dict__.keys():
raise KeyError('The key `%{}` does not exist. '
'To extend the existing keys, use '
'`override` with `is_strict` = True.'.format(k))
if self._locked:
raise ValueError('The ParamsDict has been locked. '
'No change is allowed.')
self._set(k, v)
def __getattr__(self, k):
"""Gets the value of the existing key.
Args:
k: the key string.
Returns:
the value of the key.
Raises:
AttributeError: if k is not defined in the ParamsDict.
"""
if k not in self.__dict__.keys():
raise AttributeError('The key `{}` does not exist. '.format(k))
return self.__dict__[k]
def __contains__(self, key):
"""Implements the membership test operator."""
return key in self.__dict__
def get(self, key, value=None):
"""Accesses through built-in dictionary get method."""
return self.__dict__.get(key, value)
def override(self, override_params, is_strict=True):
"""Override the ParamsDict with a set of given params.
Args:
override_params: a dict or a ParamsDict specifying the parameters to
be overridden.
is_strict: a boolean specifying whether override is strict or not. If
True, keys in `override_params` must be present in the ParamsDict.
If False, keys in `override_params` can be different from what is
currently defined in the ParamsDict. In this case, the ParamsDict will
be extended to include the new keys.
"""
if self._locked:
raise ValueError('The ParamsDict has been locked. No change is allowed.')
if isinstance(override_params, ParamsDict):
override_params = override_params.as_dict()
self._override(override_params, is_strict) # pylint: disable=protected-access
def _override(self, override_dict, is_strict=True):
"""The implementation of `override`."""
for k, v in six.iteritems(override_dict):
if k in ParamsDict.RESERVED_ATTR:
raise KeyError('The key `%{}` is internally reserved. '
'Can not be overridden.')
if k not in self.__dict__.keys():
if is_strict:
raise KeyError('The key `{}` does not exist. '
'To extend the existing keys, use '
'`override` with `is_strict` = False.'.format(k))
else:
self._set(k, v)
else:
if isinstance(v, dict):
self.__dict__[k]._override(v, is_strict) # pylint: disable=protected-access
elif isinstance(v, ParamsDict):
self.__dict__[k]._override(v.as_dict(), is_strict) # pylint: disable=protected-access
else:
self.__dict__[k] = copy.deepcopy(v)
def lock(self):
"""Makes the ParamsDict immutable."""
self._locked = True
def as_dict(self):
"""Returns a dict representation of ParamsDict.
For the nested ParamsDict, a nested dict will be returned.
"""
params_dict = {}
for k, v in six.iteritems(self.__dict__):
if k not in ParamsDict.RESERVED_ATTR:
if isinstance(v, ParamsDict):
params_dict[k] = v.as_dict()
else:
params_dict[k] = copy.deepcopy(v)
return params_dict
def validate(self):
"""Validate the parameters consistency based on the restrictions.
This method validates the internal consistency using the pre-defined list of
restrictions. A restriction is defined as a string which specfiies a binary
operation. The supported binary operations are {'==', '!=', '<', '<=', '>',
'>='}. Note that the meaning of these operators are consistent with the
underlying Python immplementation. Users should make sure the define
restrictions on their type make sense.
For example, for a ParamsDict like the following
```
a:
a1: 1
a2: 2
b:
bb:
bb1: 10
bb2: 20
ccc:
a1: 1
a3: 3
```
one can define two restrictions like this
['a.a1 == b.ccc.a1', 'a.a2 <= b.bb.bb2']
What it enforces are:
- a.a1 = 1 == b.ccc.a1 = 2
- a.a2 = 2 <= b.bb.bb2 = 20
Raises:
KeyError: if any of the following happens
(1) any of parameters in any of restrictions is not defined in
ParamsDict,
(2) any inconsistency violating the restriction is found.
ValueError: if the restriction defined in the string is not supported.
"""
def _get_kv(dotted_string, params_dict):
tokenized_params = dotted_string.split('.')
v = params_dict
for t in tokenized_params:
v = v[t]
return tokenized_params[-1], v
def _get_kvs(tokens, params_dict):
if len(tokens) != 2:
raise ValueError('Only support binary relation in restriction.')
stripped_tokens = [t.strip() for t in tokens]
left_k, left_v = _get_kv(stripped_tokens[0], params_dict)
right_k, right_v = _get_kv(stripped_tokens[1], params_dict)
return left_k, left_v, right_k, right_v
params_dict = self.as_dict()
for restriction in self._restrictions:
if '==' in restriction:
tokens = restriction.split('==')
_, left_v, _, right_v = _get_kvs(tokens, params_dict)
if left_v != right_v:
raise KeyError('Found inconsistncy between key `{}` and key `{}`.'
.format(tokens[0], tokens[1]))
elif '!=' in restriction:
tokens = restriction.split('!=')
_, left_v, _, right_v = _get_kvs(tokens, params_dict)
if left_v == right_v:
raise KeyError('Found inconsistncy between key `{}` and key `{}`.'
.format(tokens[0], tokens[1]))
elif '<' in restriction:
tokens = restriction.split('<')
_, left_v, _, right_v = _get_kvs(tokens, params_dict)
if left_v >= right_v:
raise KeyError('Found inconsistncy between key `{}` and key `{}`.'
.format(tokens[0], tokens[1]))
elif '<=' in restriction:
tokens = restriction.split('<=')
_, left_v, _, right_v = _get_kvs(tokens, params_dict)
if left_v > right_v:
raise KeyError('Found inconsistncy between key `{}` and key `{}`.'
.format(tokens[0], tokens[1]))
elif '>' in restriction:
tokens = restriction.split('>')
_, left_v, _, right_v = _get_kvs(tokens, params_dict)
if left_v <= right_v:
raise KeyError('Found inconsistncy between key `{}` and key `{}`.'
.format(tokens[0], tokens[1]))
elif '>=' in restriction:
tokens = restriction.split('>=')
_, left_v, _, right_v = _get_kvs(tokens, params_dict)
if left_v < right_v:
raise KeyError('Found inconsistncy between key `{}` and key `{}`.'
.format(tokens[0], tokens[1]))
else:
raise ValueError('Unsupported relation in restriction.')
def read_yaml_to_params_dict(file_path):
"""Reads a YAML file to a ParamsDict."""
with tf.io.gfile.GFile(file_path, 'r') as f:
params_dict = yaml.load(f)
return ParamsDict(params_dict)
def save_params_dict_to_yaml(params, file_path):
"""Saves the input ParamsDict to a YAML file."""
with tf.io.gfile.GFile(file_path, 'w') as f:
def _my_list_rep(dumper, data):
# u'tag:yaml.org,2002:seq' is the YAML internal tag for sequence.
return dumper.represent_sequence(
u'tag:yaml.org,2002:seq', data, flow_style=True)
yaml.add_representer(list, _my_list_rep)
yaml.dump(params.as_dict(), f, default_flow_style=False)
def nested_csv_str_to_json_str(csv_str):
"""Converts a nested (using '.') comma-separated k=v string to a JSON string.
Converts a comma-separated string of key/value pairs that supports
nesting of keys to a JSON string. Nesting is implemented using
'.' between levels for a given key.
Spacing between commas and = is supported (e.g. there is no difference between
"a=1,b=2", "a = 1, b = 2", or "a=1, b=2") but there should be no spaces before
keys or after values (e.g. " a=1,b=2" and "a=1,b=2 " are not supported).
Note that this will only support values supported by CSV, meaning
values such as nested lists (e.g. "a=[[1,2,3],[4,5,6]]") are not
supported. Strings are supported as well, e.g. "a='hello'".
An example conversion would be:
"a=1, b=2, c.a=2, c.b=3, d.a.a=5"
to
"{ a: 1, b : 2, c: {a : 2, b : 3}, d: {a: {a : 5}}}"
Args:
csv_str: the comma separated string.
Returns:
the converted JSON string.
Raises:
ValueError: If csv_str is not in a comma separated string or
if the string is formatted incorrectly.
"""
if not csv_str:
return ''
formatted_entries = []
nested_map = collections.defaultdict(list)
pos = 0
while pos < len(csv_str):
m = _PARAM_RE.match(csv_str, pos)
if not m:
raise ValueError('Malformed hyperparameter value while parsing '
'CSV string: %s' % csv_str[pos:])
pos = m.end()
# Parse the values.
m_dict = m.groupdict()
name = m_dict['name']
v = m_dict['val']
# If a GCS path (e.g. gs://...) is provided, wrap this in quotes
# as yaml.load would otherwise throw an exception
if re.match(r'(?=[^\"\'])(?=[gs://])', v):
v = '\'{}\''.format(v)
name_nested = name.split('.')
if len(name_nested) > 1:
grouping = name_nested[0]
value = '.'.join(name_nested[1:]) + '=' + v
nested_map[grouping].append(value)
else:
formatted_entries.append('%s : %s' % (name, v))
for grouping, value in nested_map.items():
value = ','.join(value)
value = nested_csv_str_to_json_str(value)
formatted_entries.append('%s : %s' % (grouping, value))
return '{' + ', '.join(formatted_entries) + '}'
def override_params_dict(params, dict_or_string_or_yaml_file, is_strict):
"""Override a given ParamsDict using a dict, JSON/YAML/CSV string or YAML file.
The logic of the function is outlined below:
1. Test that the input is a dict. If not, proceed to 2.
2. Tests that the input is a string. If not, raise unknown ValueError
2.1. Test if the string is in a CSV format. If so, parse.
If not, proceed to 2.2.
2.2. Try loading the string as a YAML/JSON. If successful, parse to
dict and use it to override. If not, proceed to 2.3.
2.3. Try using the string as a file path and load the YAML file.
Args:
params: a ParamsDict object to be overridden.
dict_or_string_or_yaml_file: a Python dict, JSON/YAML/CSV string or
path to a YAML file specifying the parameters to be overridden.
is_strict: a boolean specifying whether override is strict or not.
Returns:
params: the overridden ParamsDict object.
Raises:
ValueError: if failed to override the parameters.
"""
if not dict_or_string_or_yaml_file:
return params
if isinstance(dict_or_string_or_yaml_file, dict):
params.override(dict_or_string_or_yaml_file, is_strict)
elif isinstance(dict_or_string_or_yaml_file, six.string_types):
try:
dict_or_string_or_yaml_file = (
nested_csv_str_to_json_str(dict_or_string_or_yaml_file))
except ValueError:
pass
params_dict = yaml.load(dict_or_string_or_yaml_file)
if isinstance(params_dict, dict):
params.override(params_dict, is_strict)
else:
with tf.io.gfile.GFile(dict_or_string_or_yaml_file) as f:
params.override(yaml.load(f), is_strict)
else:
raise ValueError('Unknown input type to parse.')
return params
@@ -0,0 +1,322 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for official.modeling.hyperparams.params_dict.py."""
import os
import tensorflow as tf
import yaml
from official.modeling.hyperparams import params_dict
class ParamsDictTest(tf.test.TestCase):
def test_init_from_an_empty_dict(self):
params = params_dict.ParamsDict()
with self.assertRaises(AttributeError):
_ = params.a
with self.assertRaises(KeyError):
params.a = 'aa'
def test_init_from_a_dict(self):
params = params_dict.ParamsDict({'a': 'aa', 'b': 2})
self.assertEqual(params.a, 'aa')
self.assertEqual(params.b, 2)
def test_init_from_a_param_dict(self):
params_init = params_dict.ParamsDict({'a': 'aa', 'b': 2})
params = params_dict.ParamsDict(params_init)
self.assertEqual(params.a, 'aa')
self.assertEqual(params.b, 2)
def test_lock(self):
params = params_dict.ParamsDict({'a': 1, 'b': 2})
params.lock()
with self.assertRaises(ValueError):
params.a = 10
with self.assertRaises(ValueError):
params.override({'b': 20})
def test_setattr(self):
params = params_dict.ParamsDict()
params.override(
{'a': 'aa', 'b': 2, 'c': None}, is_strict=False)
params.c = 'ccc'
self.assertEqual(params.a, 'aa')
self.assertEqual(params.b, 2)
self.assertEqual(params.c, 'ccc')
def test_getattr(self):
params = params_dict.ParamsDict()
params.override(
{'a': 'aa', 'b': 2, 'c': None}, is_strict=False)
self.assertEqual(params.a, 'aa')
self.assertEqual(params.b, 2)
self.assertEqual(params.c, None)
def test_contains(self):
params = params_dict.ParamsDict()
params.override(
{'a': 'aa'}, is_strict=False)
self.assertIn('a', params)
self.assertNotIn('b', params)
def test_get(self):
params = params_dict.ParamsDict()
params.override(
{'a': 'aa'}, is_strict=False)
self.assertEqual(params.get('a'), 'aa')
self.assertEqual(params.get('b', 2), 2)
self.assertEqual(params.get('b'), None)
def test_override_is_strict_true(self):
params = params_dict.ParamsDict(
{'a': 'aa', 'b': 2, 'c': {'c1': 'cc', 'c2': 20}})
params.override({'a': 2, 'c': {'c1': 'ccc'}}, is_strict=True)
self.assertEqual(params.a, 2)
self.assertEqual(params.c.c1, 'ccc')
with self.assertRaises(KeyError):
params.override({'d': 'ddd'}, is_strict=True)
with self.assertRaises(KeyError):
params.override({'c': {'c3': 30}}, is_strict=True)
def test_override_is_strict_false(self):
params = params_dict.ParamsDict(
{'a': 'aa', 'b': 2, 'c': {'c1': 10, 'c2': 20}})
params.override({'a': 2, 'c': {'c3': 3000}}, is_strict=False)
self.assertEqual(params.a, 2)
self.assertEqual(params.c.c3, 3000)
params.override({'d': 'ddd'}, is_strict=False)
self.assertEqual(params.d, 'ddd')
params.override({'c': {'c4': 4444}}, is_strict=False)
self.assertEqual(params.c.c4, 4444)
def test_as_dict(self):
params = params_dict.ParamsDict(
{'a': 'aa', 'b': 2, 'c': {'c1': 10, 'c2': 20}})
params_d = params.as_dict()
self.assertEqual(params_d['a'], 'aa')
self.assertEqual(params_d['b'], 2)
self.assertEqual(params_d['c']['c1'], 10)
self.assertEqual(params_d['c']['c2'], 20)
def test_validate(self):
# Raise error due to the unknown parameter.
with self.assertRaises(KeyError):
params = params_dict.ParamsDict(
{'a': 1, 'b': {'a': 11}}, ['a == c'])
# OK to check equality of two nested dicts.
params = params_dict.ParamsDict(
{'a': 1, 'b': {'a': 10}, 'c': {'a': 10}}, ['b == c'])
# Raise error due to inconsistency
with self.assertRaises(KeyError):
params = params_dict.ParamsDict(
{'a': 1, 'c': {'a': 10}}, ['a == c.a'])
# Valid rule.
params = params_dict.ParamsDict(
{'a': 1, 'c': {'a': 1}}, ['a == c.a'])
# Overridding violates the existing rule, raise error upon validate.
params.override({'a': 11})
with self.assertRaises(KeyError):
params.validate()
class ParamsDictIOTest(tf.test.TestCase):
def write_temp_file(self, filename, text):
temp_file = os.path.join(self.get_temp_dir(), filename)
with tf.io.gfile.GFile(temp_file, 'w') as writer:
writer.write(text)
return temp_file
def test_save_params_dict_to_yaml(self):
params = params_dict.ParamsDict(
{'a': 'aa', 'b': 2, 'c': {'c1': 10, 'c2': 20}})
output_yaml_file = os.path.join(self.get_temp_dir(), 'params.yaml')
params_dict.save_params_dict_to_yaml(params, output_yaml_file)
with tf.io.gfile.GFile(output_yaml_file, 'r') as f:
params_d = yaml.load(f)
self.assertEqual(params.a, params_d['a'])
self.assertEqual(params.b, params_d['b'])
self.assertEqual(params.c.c1, params_d['c']['c1'])
self.assertEqual(params.c.c2, params_d['c']['c2'])
def test_read_yaml_to_params_dict(self):
input_yaml_file = self.write_temp_file(
'params.yaml', r"""
a: 'aa'
b: 2
c:
c1: 10
c2: 20
""")
params = params_dict.read_yaml_to_params_dict(input_yaml_file)
self.assertEqual(params.a, 'aa')
self.assertEqual(params.b, 2)
self.assertEqual(params.c.c1, 10)
self.assertEqual(params.c.c2, 20)
def test_override_params_dict_using_dict(self):
params = params_dict.ParamsDict({
'a': 1, 'b': 2.5, 'c': [3, 4], 'd': 'hello', 'e': False})
override_dict = {'b': 5.2, 'c': [30, 40]}
params = params_dict.override_params_dict(
params, override_dict, is_strict=True)
self.assertEqual(1, params.a)
self.assertEqual(5.2, params.b)
self.assertEqual([30, 40], params.c)
self.assertEqual('hello', params.d)
self.assertEqual(False, params.e)
def test_override_params_dict_using_yaml_string(self):
params = params_dict.ParamsDict({
'a': 1, 'b': 2.5, 'c': [3, 4], 'd': 'hello', 'e': False})
override_yaml_string = "'b': 5.2\n'c': [30, 40]"
params = params_dict.override_params_dict(
params, override_yaml_string, is_strict=True)
self.assertEqual(1, params.a)
self.assertEqual(5.2, params.b)
self.assertEqual([30, 40], params.c)
self.assertEqual('hello', params.d)
self.assertEqual(False, params.e)
def test_override_params_dict_using_json_string(self):
params = params_dict.ParamsDict({
'a': 1, 'b': {'b1': 2, 'b2': [2, 3],},
'd': {'d1': {'d2': 'hello'}}, 'e': False})
override_json_string = "{ b: { b2: [3, 4] }, d: { d1: { d2: 'hi' } } }"
params = params_dict.override_params_dict(
params, override_json_string, is_strict=True)
self.assertEqual(1, params.a)
self.assertEqual(2, params.b.b1)
self.assertEqual([3, 4], params.b.b2)
self.assertEqual('hi', params.d.d1.d2)
self.assertEqual(False, params.e)
def test_override_params_dict_using_csv_string(self):
params = params_dict.ParamsDict({
'a': 1, 'b': {'b1': 2, 'b2': [2, 3],},
'd': {'d1': {'d2': 'hello'}}, 'e': False})
override_csv_string = "b.b2=[3,4], d.d1.d2='hi, world', e=gs://test"
params = params_dict.override_params_dict(
params, override_csv_string, is_strict=True)
self.assertEqual(1, params.a)
self.assertEqual(2, params.b.b1)
self.assertEqual([3, 4], params.b.b2)
self.assertEqual('hi, world', params.d.d1.d2)
self.assertEqual('gs://test', params.e)
def test_override_params_dict_using_yaml_file(self):
params = params_dict.ParamsDict({
'a': 1, 'b': 2.5, 'c': [3, 4], 'd': 'hello', 'e': False})
override_yaml_file = self.write_temp_file(
'params.yaml', r"""
b: 5.2
c: [30, 40]
""")
params = params_dict.override_params_dict(
params, override_yaml_file, is_strict=True)
self.assertEqual(1, params.a)
self.assertEqual(5.2, params.b)
self.assertEqual([30, 40], params.c)
self.assertEqual('hello', params.d)
self.assertEqual(False, params.e)
class IOTest(tf.test.TestCase):
def test_basic_csv_str_to_json_str(self):
csv_str = 'a=1,b=2,c=3'
json_str = '{a : 1, b : 2, c : 3}'
converted_csv_str = params_dict.nested_csv_str_to_json_str(csv_str)
self.assertEqual(converted_csv_str, json_str)
def test_basic_csv_str_load(self):
csv_str = 'a=1,b=2,c=3'
expected_output = {'a': 1, 'b': 2, 'c': 3}
converted_csv_str = params_dict.nested_csv_str_to_json_str(csv_str)
converted_dict = yaml.load(converted_csv_str)
self.assertDictEqual(converted_dict, expected_output)
def test_basic_nested_csv_str_to_json_str(self):
csv_str = 'a=1,b.b1=2'
json_str = '{a : 1, b : {b1 : 2}}'
converted_csv_str = params_dict.nested_csv_str_to_json_str(csv_str)
self.assertEqual(converted_csv_str, json_str)
def test_basic_nested_csv_str_load(self):
csv_str = 'a=1,b.b1=2,c.c1=3'
expected_output = {'a': 1, 'b': {'b1': 2}, 'c': {'c1': 3}}
converted_csv_str = params_dict.nested_csv_str_to_json_str(csv_str)
converted_dict = yaml.load(converted_csv_str)
self.assertDictEqual(converted_dict, expected_output)
def test_complex_nested_csv_str_to_json_str(self):
csv_str = 'a.aa.aaa.aaaaa.a=1'
json_str = '{a : {aa : {aaa : {aaaaa : {a : 1}}}}}'
converted_csv_str = params_dict.nested_csv_str_to_json_str(csv_str)
self.assertEqual(converted_csv_str, json_str)
def test_complex_nested_csv_str_load(self):
csv_str = 'a.aa.aaa.aaaaa.a=1,a.a=2'
expected_output = {'a': {'aa': {'aaa': {'aaaaa': {'a': 1}}}, 'a': 2}}
converted_csv_str = params_dict.nested_csv_str_to_json_str(csv_str)
converted_dict = yaml.load(converted_csv_str)
self.assertDictEqual(converted_dict, expected_output)
def test_csv_str_load_supported_datatypes(self):
csv_str = 'a=1,b=2.,c=[1,2,3],d=\'hello, there\',e=\"Hi.\"'
converted_csv_str = params_dict.nested_csv_str_to_json_str(csv_str)
converted_dict = yaml.load(converted_csv_str)
self.assertEqual(converted_dict['a'], 1)
self.assertEqual(converted_dict['b'], 2.)
self.assertEqual(converted_dict['c'], [1, 2, 3])
self.assertEqual(converted_dict['d'], 'hello, there')
self.assertEqual(converted_dict['e'], 'Hi.')
def test_csv_str_load_unsupported_datatypes(self):
csv_str = 'a=[[1,2,3],[4,5,6]]'
self.assertRaises(ValueError,
params_dict.nested_csv_str_to_json_str,
csv_str)
def test_csv_str_to_json_str_spacing(self):
csv_str1 = 'a=1,b=2,c=3'
csv_str2 = 'a = 1, b = 2, c = 3'
json_str = '{a : 1, b : 2, c : 3}'
converted_csv_str1 = params_dict.nested_csv_str_to_json_str(csv_str1)
converted_csv_str2 = params_dict.nested_csv_str_to_json_str(csv_str2)
self.assertEqual(converted_csv_str1, converted_csv_str2)
self.assertEqual(converted_csv_str1, json_str)
self.assertEqual(converted_csv_str2, json_str)
def test_gcs_added_quotes(self):
csv_str = 'a=gs://abc, b=gs://def'
expected_output = '{a : \'gs://abc\', b : \'gs://def\'}'
converted_csv_str = params_dict.nested_csv_str_to_json_str(csv_str)
self.assertEqual(converted_csv_str, expected_output)
if __name__ == '__main__':
tf.test.main()
@@ -0,0 +1,491 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A light weight utilities to train NLP models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import json
import os
import tempfile
from absl import logging
import tensorflow as tf
from official.staging.training import grad_utils
from official.utils.misc import distribution_utils
_SUMMARY_TXT = 'training_summary.txt'
_MIN_SUMMARY_STEPS = 10
def _should_export_checkpoint(strategy):
return (not strategy) or strategy.extended.should_checkpoint
def _should_export_summary(strategy):
return (not strategy) or strategy.extended.should_save_summary
def _save_checkpoint(strategy, checkpoint, model_dir, checkpoint_prefix):
"""Saves model to with provided checkpoint prefix."""
if _should_export_checkpoint(strategy):
checkpoint_path = os.path.join(model_dir, checkpoint_prefix)
saved_path = checkpoint.save(checkpoint_path)
logging.info('Saving model as TF checkpoint: %s', saved_path)
else:
# In multi worker training we need every worker to save checkpoint, because
# variables can trigger synchronization on read and synchronization needs
# all workers to participate. To avoid workers overriding each other we save
# to a temporary directory on non-chief workers.
tmp_dir = tempfile.mkdtemp()
checkpoint.save(os.path.join(tmp_dir, 'ckpt'))
tf.io.gfile.rmtree(tmp_dir)
return
def _get_input_iterator(input_fn, strategy):
"""Returns distributed dataset iterator."""
# When training with TPU pods, datasets needs to be cloned across
# workers. Since Dataset instance cannot be cloned in eager mode, we instead
# pass callable that returns a dataset.
if not callable(input_fn):
raise ValueError('`input_fn` should be a closure that returns a dataset.')
iterator = iter(
strategy.experimental_distribute_datasets_from_function(input_fn))
return iterator
def _float_metric_value(metric):
"""Gets the value of a float-value keras metric."""
return metric.result().numpy().astype(float)
def steps_to_run(current_step, steps_per_epoch, steps_per_loop):
"""Calculates steps to run on device."""
if steps_per_loop <= 0:
raise ValueError('steps_per_loop should be positive integer.')
if steps_per_loop == 1:
return steps_per_loop
remainder_in_epoch = current_step % steps_per_epoch
if remainder_in_epoch != 0:
return min(steps_per_epoch - remainder_in_epoch, steps_per_loop)
else:
return steps_per_loop
def write_txt_summary(training_summary, summary_dir):
"""Writes a summary text file to record stats."""
summary_path = os.path.join(summary_dir, _SUMMARY_TXT)
with tf.io.gfile.GFile(summary_path, 'wb') as f:
logging.info('Training Summary: \n%s', str(training_summary))
f.write(json.dumps(training_summary, indent=4))
def run_customized_training_loop(
# pylint: disable=invalid-name
_sentinel=None,
# pylint: enable=invalid-name
strategy=None,
model_fn=None,
loss_fn=None,
scale_loss=True,
model_dir=None,
train_input_fn=None,
steps_per_epoch=None,
steps_per_loop=1,
epochs=1,
eval_input_fn=None,
eval_steps=None,
metric_fn=None,
init_checkpoint=None,
custom_callbacks=None,
run_eagerly=False,
sub_model_export_name=None,
explicit_allreduce=False,
pre_allreduce_callbacks=None,
post_allreduce_callbacks=None):
"""Run BERT pretrain model training using low-level API.
Arguments:
_sentinel: Used to prevent positional parameters. Internal, do not use.
strategy: Distribution strategy on which to run low level training loop.
model_fn: Function that returns a tuple (model, sub_model). Caller of this
function should add optimizer to the `model` via calling
`model.compile()` API or manually setting `model.optimizer` attribute.
Second element of the returned tuple(sub_model) is an optional sub model
to be used for initial checkpoint -- if provided.
loss_fn: Function with signature func(labels, logits) and returns a loss
tensor.
scale_loss: Whether to divide the raw loss by number of replicas before
gradients calculation.
model_dir: Model directory used during training for restoring/saving model
weights.
train_input_fn: Function that returns a tf.data.Dataset used for training.
steps_per_epoch: Number of steps to run per epoch. At the end of each
epoch, model checkpoint will be saved and evaluation will be conducted
if evaluation dataset is provided.
steps_per_loop: Number of steps per graph-mode loop. In order to reduce
communication in eager context, training logs are printed every
steps_per_loop.
epochs: Number of epochs to train.
eval_input_fn: Function that returns evaluation dataset. If none,
evaluation is skipped.
eval_steps: Number of steps to run evaluation. Required if `eval_input_fn`
is not none.
metric_fn: A metrics function that returns a Keras Metric object to record
evaluation result using evaluation dataset or with training dataset
after every epoch.
init_checkpoint: Optional checkpoint to load to `sub_model` returned by
`model_fn`.
custom_callbacks: A list of Keras Callbacks objects to run during
training. More specifically, `on_batch_begin()`, `on_batch_end()`,
methods are invoked during training.
run_eagerly: Whether to run model training in pure eager execution. This
should be disable for TPUStrategy.
sub_model_export_name: If not None, will export `sub_model` returned by
`model_fn` into checkpoint files. The name of intermediate checkpoint
file is {sub_model_export_name}_step_{step}.ckpt and the last
checkpint's name is {sub_model_export_name}.ckpt;
if None, `sub_model` will not be exported as checkpoint.
explicit_allreduce: Whether to explicitly perform gradient allreduce,
instead of relying on implicit allreduce in optimizer.apply_gradients().
default is False. For now, if training using FP16 mixed precision,
explicit allreduce will aggregate gradients in FP16 format. For TPU and
GPU training using FP32, explicit allreduce will aggregate gradients in
FP32 format.
pre_allreduce_callbacks: A list of callback functions that takes gradients
and model variables pairs as input, manipulate them, and returns a new
gradients and model variables paris. The callback functions will be
invoked in the list order and before gradients are allreduced.
With mixed precision training, the pre_allreduce_allbacks will be
applied on scaled_gradients. Default is no callbacks.
Only used when explicit_allreduce=True.
post_allreduce_callbacks: A list of callback functions that takes
gradients and model variables pairs as input, manipulate them, and
returns a new gradients and model variables paris. The callback
functions will be invoked in the list order and right before gradients
are applied to variables for updates. Default is no callbacks. Only used
when explicit_allreduce=True.
Returns:
Trained model.
Raises:
ValueError: (1) When model returned by `model_fn` does not have optimizer
attribute or when required parameters are set to none. (2) eval args are
not specified correctly. (3) metric_fn must be a callable if specified.
(4) sub_model_checkpoint_name is specified, but `sub_model` returned
by `model_fn` is None.
"""
if _sentinel is not None:
raise ValueError('only call `run_customized_training_loop()` '
'with named arguments.')
required_arguments = [
strategy, model_fn, loss_fn, model_dir, steps_per_epoch, train_input_fn
]
if [arg for arg in required_arguments if arg is None]:
raise ValueError('`strategy`, `model_fn`, `loss_fn`, `model_dir`, '
'`steps_per_loop` and `steps_per_epoch` are required '
'parameters.')
if steps_per_loop > steps_per_epoch:
logging.error(
'steps_per_loop: %d is specified to be greater than '
' steps_per_epoch: %d, we will use steps_per_epoch as'
' steps_per_loop.', steps_per_loop, steps_per_epoch)
steps_per_loop = steps_per_epoch
assert tf.executing_eagerly()
if run_eagerly:
if isinstance(strategy, tf.distribute.experimental.TPUStrategy):
raise ValueError(
'TPUStrategy should not run eagerly as it heavily relies on graph'
' optimization for the distributed system.')
if eval_input_fn and (eval_steps is None or metric_fn is None):
raise ValueError(
'`eval_step` and `metric_fn` are required when `eval_input_fn ` '
'is not none.')
if metric_fn and not callable(metric_fn):
raise ValueError(
'if `metric_fn` is specified, metric_fn must be a callable.')
total_training_steps = steps_per_epoch * epochs
train_iterator = _get_input_iterator(train_input_fn, strategy)
with distribution_utils.get_strategy_scope(strategy):
# To correctly place the model weights on accelerators,
# model and optimizer should be created in scope.
model, sub_model = model_fn()
if not hasattr(model, 'optimizer'):
raise ValueError('User should set optimizer attribute to model '
'inside `model_fn`.')
if sub_model_export_name and sub_model is None:
raise ValueError('sub_model_export_name is specified as %s, but '
'sub_model is None.' % sub_model_export_name)
optimizer = model.optimizer
if init_checkpoint:
logging.info(
'Checkpoint file %s found and restoring from '
'initial checkpoint for core model.', init_checkpoint)
checkpoint = tf.train.Checkpoint(model=sub_model)
checkpoint.restore(init_checkpoint).assert_existing_objects_matched()
logging.info('Loading from checkpoint file completed')
train_loss_metric = tf.keras.metrics.Mean(
'training_loss', dtype=tf.float32)
eval_metrics = [metric_fn()] if metric_fn else []
# If evaluation is required, make a copy of metric as it will be used by
# both train and evaluation.
train_metrics = [
metric.__class__.from_config(metric.get_config())
for metric in eval_metrics
]
# Create summary writers
if _should_export_summary(strategy):
summary_dir = os.path.join(model_dir, 'summaries')
else:
# In multi worker training we need every worker to write summary, because
# variables can trigger synchronization on read and synchronization needs
# all workers to participate.
summary_dir = tempfile.mkdtemp()
eval_summary_writer = tf.summary.create_file_writer(
os.path.join(summary_dir, 'eval'))
if steps_per_loop >= _MIN_SUMMARY_STEPS:
# Only writes summary when the stats are collected sufficiently over
# enough steps.
train_summary_writer = tf.summary.create_file_writer(
os.path.join(summary_dir, 'train'))
else:
train_summary_writer = None
# Collects training variables.
training_vars = model.trainable_variables
def _replicated_step(inputs):
"""Replicated training step."""
inputs, labels = inputs
with tf.GradientTape() as tape:
model_outputs = model(inputs, training=True)
loss = loss_fn(labels, model_outputs)
# Raw loss is used for reporting in metrics/logs.
raw_loss = loss
if scale_loss:
# Scales down the loss for gradients to be invariant from replicas.
loss = loss / strategy.num_replicas_in_sync
if explicit_allreduce:
grad_utils.minimize_using_explicit_allreduce(tape, optimizer, loss,
training_vars,
pre_allreduce_callbacks,
post_allreduce_callbacks)
else:
if isinstance(optimizer,
tf.keras.mixed_precision.experimental.LossScaleOptimizer):
with tape:
scaled_loss = optimizer.get_scaled_loss(loss)
scaled_grads = tape.gradient(scaled_loss, training_vars)
grads = optimizer.get_unscaled_gradients(scaled_grads)
else:
grads = tape.gradient(loss, training_vars)
optimizer.apply_gradients(zip(grads, training_vars))
# For reporting, the metric takes the mean of losses.
train_loss_metric.update_state(raw_loss)
for metric in train_metrics:
metric.update_state(labels, model_outputs)
@tf.function
def train_steps(iterator, steps):
"""Performs distributed training steps in a loop.
Args:
iterator: the distributed iterator of training datasets.
steps: an tf.int32 integer tensor to specify number of steps to run
inside host training loop.
Raises:
ValueError: Any of the arguments or tensor shapes are invalid.
"""
if not isinstance(steps, tf.Tensor):
raise ValueError('steps should be an Tensor. Python object may cause '
'retracing.')
for _ in tf.range(steps):
strategy.run(_replicated_step, args=(next(iterator),))
def train_single_step(iterator):
"""Performs a distributed training step.
Args:
iterator: the distributed iterator of training datasets.
Raises:
ValueError: Any of the arguments or tensor shapes are invalid.
"""
strategy.run(_replicated_step, args=(next(iterator),))
def test_step(iterator):
"""Calculates evaluation metrics on distributed devices."""
def _test_step_fn(inputs):
"""Replicated accuracy calculation."""
inputs, labels = inputs
model_outputs = model(inputs, training=False)
for metric in eval_metrics:
metric.update_state(labels, model_outputs)
strategy.run(_test_step_fn, args=(next(iterator),))
if not run_eagerly:
train_single_step = tf.function(train_single_step)
test_step = tf.function(test_step)
def _run_evaluation(current_training_step, test_iterator):
"""Runs validation steps and aggregate metrics."""
for _ in range(eval_steps):
test_step(test_iterator)
with eval_summary_writer.as_default():
for metric in eval_metrics + model.metrics:
metric_value = _float_metric_value(metric)
logging.info('Step: [%d] Validation %s = %f', current_training_step,
metric.name, metric_value)
tf.summary.scalar(
metric.name, metric_value, step=current_training_step)
eval_summary_writer.flush()
def _run_callbacks_on_batch_begin(batch):
"""Runs custom callbacks at the start of every step."""
if not custom_callbacks:
return
for callback in custom_callbacks:
callback.on_batch_begin(batch)
def _run_callbacks_on_batch_end(batch, logs):
"""Runs custom callbacks at the end of every step."""
if not custom_callbacks:
return
for callback in custom_callbacks:
callback.on_batch_end(batch, logs)
# Training loop starts here.
checkpoint = tf.train.Checkpoint(model=model, optimizer=optimizer)
sub_model_checkpoint = tf.train.Checkpoint(
model=sub_model) if sub_model_export_name else None
latest_checkpoint_file = tf.train.latest_checkpoint(model_dir)
if latest_checkpoint_file:
logging.info(
'Checkpoint file %s found and restoring from '
'checkpoint', latest_checkpoint_file)
checkpoint.restore(latest_checkpoint_file)
logging.info('Loading from checkpoint file completed')
current_step = optimizer.iterations.numpy()
checkpoint_name = 'ctl_step_{step}.ckpt'
while current_step < total_training_steps:
# Training loss/metric are taking average over steps inside micro
# training loop. We reset the their values before each round.
train_loss_metric.reset_states()
for metric in train_metrics + model.metrics:
metric.reset_states()
_run_callbacks_on_batch_begin(current_step)
# Runs several steps in the host while loop.
steps = steps_to_run(current_step, steps_per_epoch, steps_per_loop)
if tf.test.is_built_with_cuda():
# TODO(zongweiz): merge with train_steps once tf.while_loop
# GPU performance bugs are fixed.
for _ in range(steps):
train_single_step(train_iterator)
else:
# Converts steps to a Tensor to avoid tf.function retracing.
train_steps(train_iterator,
tf.convert_to_tensor(steps, dtype=tf.int32))
train_loss = _float_metric_value(train_loss_metric)
current_step += steps
_run_callbacks_on_batch_end(current_step - 1, {'loss': train_loss})
# Updates training logging.
training_status = 'Train Step: %d/%d / loss = %s' % (
current_step, total_training_steps, train_loss)
if train_summary_writer:
with train_summary_writer.as_default():
tf.summary.scalar(
train_loss_metric.name, train_loss, step=current_step)
for metric in train_metrics + model.metrics:
metric_value = _float_metric_value(metric)
training_status += ' %s = %f' % (metric.name, metric_value)
tf.summary.scalar(metric.name, metric_value, step=current_step)
train_summary_writer.flush()
logging.info(training_status)
# Saves model checkpoints and run validation steps at every epoch end.
if current_step % steps_per_epoch == 0:
# To avoid repeated model saving, we do not save after the last
# step of training.
if current_step < total_training_steps:
_save_checkpoint(strategy, checkpoint, model_dir,
checkpoint_name.format(step=current_step))
if sub_model_export_name:
_save_checkpoint(
strategy, sub_model_checkpoint, model_dir,
'%s_step_%d.ckpt' % (sub_model_export_name, current_step))
if eval_input_fn:
logging.info('Running evaluation after step: %s.', current_step)
_run_evaluation(current_step,
_get_input_iterator(eval_input_fn, strategy))
# Re-initialize evaluation metric.
for metric in eval_metrics + model.metrics:
metric.reset_states()
_save_checkpoint(strategy, checkpoint, model_dir,
checkpoint_name.format(step=current_step))
if sub_model_export_name:
_save_checkpoint(strategy, sub_model_checkpoint, model_dir,
'%s.ckpt' % sub_model_export_name)
if eval_input_fn:
logging.info('Running final evaluation after training is complete.')
_run_evaluation(current_step,
_get_input_iterator(eval_input_fn, strategy))
training_summary = {
'total_training_steps': total_training_steps,
'train_loss': _float_metric_value(train_loss_metric),
}
if eval_metrics:
# TODO(hongkuny): Cleans up summary reporting in text.
training_summary['last_train_metrics'] = _float_metric_value(
train_metrics[0])
training_summary['eval_metrics'] = _float_metric_value(eval_metrics[0])
write_txt_summary(training_summary, summary_dir)
if not _should_export_summary(strategy):
tf.io.gfile.rmtree(summary_dir)
return model
@@ -0,0 +1,235 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for official.modeling.training.model_training_utils."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl.testing import parameterized
from absl.testing.absltest import mock
import numpy as np
import tensorflow as tf
from tensorflow.python.distribute import combinations
from tensorflow.python.distribute import strategy_combinations
from official.modeling import model_training_utils
def eager_strategy_combinations():
return combinations.combine(
distribution=[
strategy_combinations.default_strategy,
strategy_combinations.tpu_strategy,
strategy_combinations.one_device_strategy_gpu,
strategy_combinations.mirrored_strategy_with_gpu_and_cpu,
strategy_combinations.mirrored_strategy_with_two_gpus,
],
mode='eager',
)
def eager_gpu_strategy_combinations():
return combinations.combine(
distribution=[
strategy_combinations.default_strategy,
strategy_combinations.one_device_strategy_gpu,
strategy_combinations.mirrored_strategy_with_gpu_and_cpu,
strategy_combinations.mirrored_strategy_with_two_gpus,
],
mode='eager',
)
def create_fake_data_input_fn(batch_size, features_shape, num_classes):
"""Creates a dummy input function with the given feature and label shapes.
Args:
batch_size: integer.
features_shape: list[int]. Feature shape for an individual example.
num_classes: integer. Number of labels.
Returns:
An input function that is usable in the executor.
"""
def _dataset_fn(input_context=None):
"""An input function for generating fake data."""
local_batch_size = input_context.get_per_replica_batch_size(batch_size)
features = np.random.rand(64, *features_shape)
labels = np.random.randint(2, size=[64, num_classes])
# Convert the inputs to a Dataset.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shard(input_context.num_input_pipelines,
input_context.input_pipeline_id)
def _assign_dtype(features, labels):
features = tf.cast(features, tf.float32)
labels = tf.cast(labels, tf.float32)
return features, labels
# Shuffle, repeat, and batch the examples.
dataset = dataset.map(_assign_dtype)
dataset = dataset.shuffle(64).repeat()
dataset = dataset.batch(local_batch_size, drop_remainder=True)
dataset = dataset.prefetch(buffer_size=64)
return dataset
return _dataset_fn
def create_model_fn(input_shape, num_classes, use_float16=False):
def _model_fn():
"""A one-layer softmax model suitable for testing."""
input_layer = tf.keras.layers.Input(shape=input_shape)
x = tf.keras.layers.Dense(num_classes, activation='relu')(input_layer)
output_layer = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
sub_model = tf.keras.models.Model(input_layer, x, name='sub_model')
model = tf.keras.models.Model(input_layer, output_layer, name='model')
model.add_metric(
tf.reduce_mean(input_layer), name='mean_input', aggregation='mean')
model.optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
if use_float16:
model.optimizer = (
tf.keras.mixed_precision.experimental.LossScaleOptimizer(
model.optimizer, loss_scale='dynamic'))
return model, sub_model
return _model_fn
def metric_fn():
"""Gets a tf.keras metric object."""
return tf.keras.metrics.CategoricalAccuracy(name='accuracy', dtype=tf.float32)
def summaries_with_matching_keyword(keyword, summary_dir):
"""Yields summary protos matching given keyword from event file."""
event_paths = tf.io.gfile.glob(os.path.join(summary_dir, 'events*'))
for event in tf.compat.v1.train.summary_iterator(event_paths[-1]):
if event.summary is not None:
for value in event.summary.value:
if keyword in value.tag:
tf.compat.v1.logging.error(event)
yield event.summary
def check_eventfile_for_keyword(keyword, summary_dir):
"""Checks event files for the keyword."""
return any(summaries_with_matching_keyword(keyword, summary_dir))
class ModelTrainingUtilsTest(tf.test.TestCase, parameterized.TestCase):
def setUp(self):
super(ModelTrainingUtilsTest, self).setUp()
self._model_fn = create_model_fn(input_shape=[128], num_classes=3)
def run_training(self, strategy, model_dir, steps_per_loop, run_eagerly):
input_fn = create_fake_data_input_fn(
batch_size=8, features_shape=[128], num_classes=3)
model_training_utils.run_customized_training_loop(
strategy=strategy,
model_fn=self._model_fn,
loss_fn=tf.keras.losses.categorical_crossentropy,
model_dir=model_dir,
steps_per_epoch=20,
steps_per_loop=steps_per_loop,
epochs=2,
train_input_fn=input_fn,
eval_input_fn=input_fn,
eval_steps=10,
init_checkpoint=None,
metric_fn=metric_fn,
custom_callbacks=None,
run_eagerly=run_eagerly)
@combinations.generate(eager_strategy_combinations())
def test_train_eager_single_step(self, distribution):
model_dir = self.get_temp_dir()
if isinstance(distribution, tf.distribute.experimental.TPUStrategy):
with self.assertRaises(ValueError):
self.run_training(
distribution, model_dir, steps_per_loop=1, run_eagerly=True)
else:
self.run_training(
distribution, model_dir, steps_per_loop=1, run_eagerly=True)
@combinations.generate(eager_gpu_strategy_combinations())
def test_train_eager_mixed_precision(self, distribution):
model_dir = self.get_temp_dir()
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)
self._model_fn = create_model_fn(
input_shape=[128], num_classes=3, use_float16=True)
self.run_training(
distribution, model_dir, steps_per_loop=1, run_eagerly=True)
@combinations.generate(eager_strategy_combinations())
def test_train_check_artifacts(self, distribution):
model_dir = self.get_temp_dir()
self.run_training(
distribution, model_dir, steps_per_loop=10, run_eagerly=False)
# Two checkpoints should be saved after two epochs.
self.assertNotEmpty(tf.io.gfile.glob(os.path.join(model_dir, 'ctl_step_*')))
self.assertNotEmpty(
tf.io.gfile.glob(
os.path.join(model_dir, 'summaries/training_summary*')))
# Loss and accuracy values should be written into summaries.
self.assertTrue(
check_eventfile_for_keyword('loss',
os.path.join(model_dir, 'summaries/train')))
self.assertTrue(
check_eventfile_for_keyword('accuracy',
os.path.join(model_dir, 'summaries/train')))
self.assertTrue(
check_eventfile_for_keyword('mean_input',
os.path.join(model_dir, 'summaries/train')))
self.assertTrue(
check_eventfile_for_keyword('accuracy',
os.path.join(model_dir, 'summaries/eval')))
self.assertTrue(
check_eventfile_for_keyword('mean_input',
os.path.join(model_dir, 'summaries/eval')))
@combinations.generate(
combinations.combine(
distribution=[
strategy_combinations.one_device_strategy_gpu,
],
mode='eager',
))
def test_train_check_artifacts_non_chief(self, distribution):
# We shouldn't export artifacts on non-chief workers. Since there's no easy
# way to test with real MultiWorkerMirroredStrategy, we patch the strategy
# to make it as if it's MultiWorkerMirroredStrategy on non-chief workers.
extended = distribution.extended
with mock.patch.object(extended.__class__, 'should_checkpoint',
new_callable=mock.PropertyMock, return_value=False), \
mock.patch.object(extended.__class__, 'should_save_summary',
new_callable=mock.PropertyMock, return_value=False):
model_dir = self.get_temp_dir()
self.run_training(
distribution, model_dir, steps_per_loop=10, run_eagerly=False)
self.assertEmpty(tf.io.gfile.listdir(model_dir))
if __name__ == '__main__':
tf.test.main()
@@ -0,0 +1,56 @@
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Functions and classes related to training performance."""
import tensorflow as tf
def configure_optimizer(optimizer,
use_float16=False,
use_graph_rewrite=False,
loss_scale="dynamic"):
"""Configures optimizer object with performance options."""
if use_float16:
# Wraps optimizer with a LossScaleOptimizer. This is done automatically
# in compile() with the "mixed_float16" policy, but since we do not call
# compile(), we must wrap the optimizer manually.
optimizer = (
tf.keras.mixed_precision.experimental.LossScaleOptimizer(
optimizer, loss_scale=loss_scale))
if use_graph_rewrite:
# Note: the model dtype must be 'float32', which will ensure
# tf.ckeras.mixed_precision and
# tf.train.experimental.enable_mixed_precision_graph_rewrite do not double
# up.
optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(
optimizer)
return optimizer
def set_mixed_precision_policy(dtype, loss_scale=None):
"""Sets mix precision policy."""
if dtype == tf.float16:
policy = tf.keras.mixed_precision.experimental.Policy(
'mixed_float16', loss_scale=loss_scale)
tf.keras.mixed_precision.experimental.set_policy(policy)
elif dtype == tf.bfloat16:
policy = tf.keras.mixed_precision.experimental.Policy(
'mixed_bfloat16')
tf.keras.mixed_precision.experimental.set_policy(policy)
elif dtype == tf.float32:
tf.keras.mixed_precision.experimental.set_policy('float32')
else:
raise ValueError("Unexpected dtype: %s" % dtype)
@@ -0,0 +1,175 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Common TF utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import tensorflow as tf
from tensorflow.python.util import deprecation
from official.modeling import activations
@deprecation.deprecated(
None,
"tf.keras.layers.Layer supports multiple positional args and kwargs as "
"input tensors. pack/unpack inputs to override __call__ is no longer "
"needed."
)
def pack_inputs(inputs):
"""Pack a list of `inputs` tensors to a tuple.
Args:
inputs: a list of tensors.
Returns:
a tuple of tensors. if any input is None, replace it with a special constant
tensor.
"""
inputs = tf.nest.flatten(inputs)
outputs = []
for x in inputs:
if x is None:
outputs.append(tf.constant(0, shape=[], dtype=tf.int32))
else:
outputs.append(x)
return tuple(outputs)
@deprecation.deprecated(
None,
"tf.keras.layers.Layer supports multiple positional args and kwargs as "
"input tensors. pack/unpack inputs to override __call__ is no longer "
"needed."
)
def unpack_inputs(inputs):
"""unpack a tuple of `inputs` tensors to a tuple.
Args:
inputs: a list of tensors.
Returns:
a tuple of tensors. if any input is a special constant tensor, replace it
with None.
"""
inputs = tf.nest.flatten(inputs)
outputs = []
for x in inputs:
if is_special_none_tensor(x):
outputs.append(None)
else:
outputs.append(x)
x = tuple(outputs)
# To trick the very pointless 'unbalanced-tuple-unpacking' pylint check
# from triggering.
if len(x) == 1:
return x[0]
return tuple(outputs)
def is_special_none_tensor(tensor):
"""Checks if a tensor is a special None Tensor."""
return tensor.shape.ndims == 0 and tensor.dtype == tf.int32
# TODO(hongkuny): consider moving custom string-map lookup to keras api.
def get_activation(identifier):
"""Maps a identifier to a Python function, e.g., "relu" => `tf.nn.relu`.
It checks string first and if it is one of customized activation not in TF,
the corresponding activation will be returned. For non-customized activation
names and callable identifiers, always fallback to tf.keras.activations.get.
Args:
identifier: String name of the activation function or callable.
Returns:
A Python function corresponding to the activation function.
"""
if isinstance(identifier, six.string_types):
name_to_fn = {
"gelu": activations.gelu,
"simple_swish": activations.simple_swish,
"hard_swish": activations.hard_swish,
"identity": activations.identity,
}
identifier = str(identifier).lower()
if identifier in name_to_fn:
return tf.keras.activations.get(name_to_fn[identifier])
return tf.keras.activations.get(identifier)
def get_shape_list(tensor, expected_rank=None, name=None):
"""Returns a list of the shape of tensor, preferring static dimensions.
Args:
tensor: A tf.Tensor object to find the shape of.
expected_rank: (optional) int. The expected rank of `tensor`. If this is
specified and the `tensor` has a different rank, and exception will be
thrown.
name: Optional name of the tensor for the error message.
Returns:
A list of dimensions of the shape of tensor. All static dimensions will
be returned as python integers, and dynamic dimensions will be returned
as tf.Tensor scalars.
"""
if expected_rank is not None:
assert_rank(tensor, expected_rank, name)
shape = tensor.shape.as_list()
non_static_indexes = []
for (index, dim) in enumerate(shape):
if dim is None:
non_static_indexes.append(index)
if not non_static_indexes:
return shape
dyn_shape = tf.shape(tensor)
for index in non_static_indexes:
shape[index] = dyn_shape[index]
return shape
def assert_rank(tensor, expected_rank, name=None):
"""Raises an exception if the tensor rank is not of the expected rank.
Args:
tensor: A tf.Tensor to check the rank of.
expected_rank: Python integer or list of integers, expected rank.
name: Optional name of the tensor for the error message.
Raises:
ValueError: If the expected shape doesn't match the actual shape.
"""
expected_rank_dict = {}
if isinstance(expected_rank, six.integer_types):
expected_rank_dict[expected_rank] = True
else:
for x in expected_rank:
expected_rank_dict[x] = True
actual_rank = tensor.shape.ndims
if actual_rank not in expected_rank_dict:
raise ValueError(
"For the tensor `%s`, the actual tensor rank `%d` (shape = %s) is not "
"equal to the expected tensor rank `%s`" %
(name, actual_rank, str(tensor.shape), str(expected_rank)))
@@ -0,0 +1,735 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Custom training loop for running TensorFlow 2.0 models."""
from __future__ import absolute_import
from __future__ import division
# from __future__ import google_type_annotations
from __future__ import print_function
import json
import os
from absl import flags
from absl import logging
import numpy as np
import tensorflow as tf
# pylint: disable=unused-import,g-import-not-at-top,redefined-outer-name,reimported
from typing import Optional, Dict, List, Text, Callable, Union, Iterator, Any
from official.modeling.hyperparams import params_dict
from official.utils.misc import distribution_utils
from official.utils import hyperparams_flags
FLAGS = flags.FLAGS
strategy_flags_dict = hyperparams_flags.strategy_flags_dict
hparam_flags_dict = hyperparams_flags.hparam_flags_dict
def _save_checkpoint(checkpoint, model_dir, checkpoint_prefix):
"""Saves model to model_dir with provided checkpoint prefix."""
checkpoint_path = os.path.join(model_dir, checkpoint_prefix)
saved_path = checkpoint.save(checkpoint_path)
logging.info('Saving model as TF checkpoint: %s', saved_path)
def _steps_to_run(current_step, total_steps, steps_per_loop):
"""Calculates steps to run on device."""
if steps_per_loop <= 0:
raise ValueError('steps_per_loop should be positive integer.')
return min(total_steps - current_step, steps_per_loop)
def _no_metric():
return None
class SummaryWriter(object):
"""Simple SummaryWriter for writing dictionary of metrics.
Attributes:
writer: The tf.SummaryWriter.
"""
def __init__(self, model_dir: Text, name: Text):
"""Inits SummaryWriter with paths.
Arguments:
model_dir: the model folder path.
name: the summary subfolder name.
"""
self.writer = tf.summary.create_file_writer(os.path.join(model_dir, name))
def __call__(self, metrics: Union[Dict[Text, float], float], step: int):
"""Write metrics to summary with the given writer.
Args:
metrics: a dictionary of metrics values. Prefer dictionary.
step: integer. The training step.
"""
if not isinstance(metrics, dict):
# Support scalar metric without name.
logging.warning('Warning: summary writer prefer metrics as dictionary.')
metrics = {'metric': metrics}
with self.writer.as_default():
for k, v in metrics.items():
tf.summary.scalar(k, v, step=step)
self.writer.flush()
class DistributedExecutor(object):
"""Interface to train and eval models with tf.distribute.Strategy.
Arguments:
strategy: an instance of tf.distribute.Strategy.
params: Model configuration needed to run distribution strategy.
model_fn: Keras model function. Signature:
(params: ParamsDict) -> tf.keras.models.Model.
loss_fn: loss function. Signature:
(y_true: Tensor, y_pred: Tensor) -> Tensor
metric_fn: metric function. Signature: () -> tf.keras.metrics.Metric.
is_multi_host: Set to True when using multi hosts for training, like multi
worker GPU or TPU pod (slice). Otherwise, False.
"""
def __init__(self,
strategy,
params,
model_fn,
loss_fn,
is_multi_host=False):
self._params = params
self._model_fn = model_fn
self._loss_fn = loss_fn
self._strategy = strategy
self._checkpoint_name = 'ctl_step_{step}.ckpt'
self._is_multi_host = is_multi_host
self.train_summary_writer = None
self.eval_summary_writer = None
self.global_train_step = None
@property
def checkpoint_name(self):
"""Returns default checkpoint name."""
return self._checkpoint_name
@checkpoint_name.setter
def checkpoint_name(self, name):
"""Sets default summary writer for the current thread."""
self._checkpoint_name = name
def loss_fn(self):
return self._loss_fn()
def model_fn(self, params):
return self._model_fn(params)
def _save_config(self, model_dir):
"""Save parameters to config files if model_dir is defined."""
logging.info('Save config to model_dir %s.', model_dir)
if model_dir:
if not tf.io.gfile.exists(model_dir):
tf.io.gfile.makedirs(model_dir)
self._params.lock()
params_dict.save_params_dict_to_yaml(self._params,
model_dir + '/params.yaml')
else:
logging.warning('model_dir is empty, so skip the save config.')
def _get_input_iterator(
self, input_fn: Callable[..., tf.data.Dataset],
strategy: tf.distribute.Strategy) -> Optional[Iterator[Any]]:
"""Returns distributed dataset iterator.
Args:
input_fn: (params: dict) -> tf.data.Dataset.
strategy: an instance of tf.distribute.Strategy.
Returns:
An iterator that yields input tensors.
"""
if input_fn is None:
return None
# When training with multiple TPU workers, datasets needs to be cloned
# across workers. Since Dataset instance cannot be cloned in eager mode,
# we instead pass callable that returns a dataset.
if self._is_multi_host:
return iter(
strategy.experimental_distribute_datasets_from_function(input_fn))
else:
input_data = input_fn()
return iter(strategy.experimental_distribute_dataset(input_data))
def _create_replicated_step(self,
strategy,
model,
loss_fn,
optimizer,
metric=None):
def _replicated_step(inputs):
"""Replicated training step."""
inputs, labels = inputs
with tf.GradientTape() as tape:
outputs = model(inputs, training=True)
prediction_loss = loss_fn(labels, outputs)
loss = tf.reduce_mean(prediction_loss)
loss = loss / strategy.num_replicas_in_sync
if isinstance(metric, tf.keras.metrics.Metric):
metric.update_state(labels, outputs)
else:
logging.error('train metric is not an instance of '
'tf.keras.metrics.Metric.')
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
return loss
return _replicated_step
def _create_train_step(self,
strategy,
model,
loss_fn,
optimizer,
metric=None):
"""Creates a distributed training step.
Args:
strategy: an instance of tf.distribute.Strategy.
model: (Tensor, bool) -> Tensor. model function.
loss_fn: (y_true: Tensor, y_pred: Tensor) -> Tensor.
optimizer: tf.keras.optimizers.Optimizer.
iterator: an iterator that yields input tensors.
metric: tf.keras.metrics.Metric subclass.
Returns:
The training step callable.
"""
_replicated_step = self._create_replicated_step(strategy, model, loss_fn,
optimizer, metric)
@tf.function
def train_step(iterator, num_steps):
"""Performs a distributed training step.
Args:
iterator: an iterator that yields input tensors.
Returns:
The loss tensor.
"""
if not isinstance(num_steps, tf.Tensor):
raise ValueError('steps should be an Tensor. Python object may cause '
'retracing.')
per_replica_losses = strategy.run(
_replicated_step, args=(next(iterator),))
for _ in tf.range(num_steps - 1):
per_replica_losses = strategy.run(
_replicated_step, args=(next(iterator),))
# For reporting, we returns the mean of losses.
losses = tf.nest.map_structure(
lambda x: strategy.reduce(tf.distribute.ReduceOp.MEAN, x, axis=None),
per_replica_losses)
return losses
return train_step
def _create_test_step(self, strategy, model, metric):
"""Creates a distributed test step."""
@tf.function
def test_step(iterator):
"""Calculates evaluation metrics on distributed devices."""
if not metric:
logging.info('Skip test_step because metric is None (%s)', metric)
return None, None
if not isinstance(metric, tf.keras.metrics.Metric):
raise ValueError(
'Metric must be an instance of tf.keras.metrics.Metric '
'for running in test_step. Actual {}'.format(metric))
def _test_step_fn(inputs):
"""Replicated accuracy calculation."""
inputs, labels = inputs
model_outputs = model(inputs, training=False)
metric.update_state(labels, model_outputs)
return labels, model_outputs
return strategy.run(_test_step_fn, args=(next(iterator),))
return test_step
def train(self,
train_input_fn: Callable[[params_dict.ParamsDict], tf.data.Dataset],
eval_input_fn: Callable[[params_dict.ParamsDict],
tf.data.Dataset] = None,
model_dir: Text = None,
total_steps: int = 1,
iterations_per_loop: int = 1,
train_metric_fn: Callable[[], Any] = None,
eval_metric_fn: Callable[[], Any] = None,
summary_writer_fn: Callable[[Text, Text],
SummaryWriter] = SummaryWriter,
init_checkpoint: Callable[[tf.keras.Model], Any] = None,
custom_callbacks: List[tf.keras.callbacks.Callback] = None,
save_config: bool = True):
"""Runs distributed training.
Args:
train_input_fn: (params: dict) -> tf.data.Dataset training data input
function.
eval_input_fn: (Optional) same type as train_input_fn. If not None, will
trigger evaluting metric on eval data. If None, will not run eval step.
model_dir: the folder path for model checkpoints.
total_steps: total training steps.
iterations_per_loop: train steps per loop. After each loop, this job will
update metrics like loss and save checkpoint.
train_metric_fn: metric_fn for evaluation in train_step.
eval_metric_fn: metric_fn for evaluation in test_step.
summary_writer_fn: function to create summary writer.
init_checkpoint: function to load checkpoint.
custom_callbacks: A list of Keras Callbacks objects to run during
training. More specifically, `on_batch_begin()`, `on_batch_end()`,
methods are invoked during training.
save_config: bool. Whether to save params to model_dir.
Returns:
The training loss and eval metrics.
"""
assert train_input_fn is not None
if train_metric_fn and not callable(train_metric_fn):
raise ValueError('if `train_metric_fn` is specified, '
'train_metric_fn must be a callable.')
if eval_metric_fn and not callable(eval_metric_fn):
raise ValueError('if `eval_metric_fn` is specified, '
'eval_metric_fn must be a callable.')
train_metric_fn = train_metric_fn or _no_metric
eval_metric_fn = eval_metric_fn or _no_metric
if custom_callbacks and iterations_per_loop != 1:
logging.error(
'It is sematically wrong to run callbacks when '
'iterations_per_loop is not one (%s)', iterations_per_loop)
def _run_callbacks_on_batch_begin(batch):
"""Runs custom callbacks at the start of every step."""
if not custom_callbacks:
return
for callback in custom_callbacks:
if callback:
callback.on_batch_begin(batch)
def _run_callbacks_on_batch_end(batch):
"""Runs custom callbacks at the end of every step."""
if not custom_callbacks:
return
for callback in custom_callbacks:
if callback:
callback.on_batch_end(batch)
if save_config:
self._save_config(model_dir)
if FLAGS.save_checkpoint_freq:
save_freq = FLAGS.save_checkpoint_freq
else:
save_freq = iterations_per_loop
params = self._params
strategy = self._strategy
# To reduce unnecessary send/receive input pipeline operation, we place
# input pipeline ops in worker task.
train_iterator = self._get_input_iterator(train_input_fn, strategy)
train_loss = None
eval_metric_result = None
with strategy.scope():
# To correctly place the model weights on accelerators,
# model and optimizer should be created in scope.
model = self.model_fn(params.as_dict())
if not hasattr(model, 'optimizer'):
raise ValueError('User should set optimizer attribute to model '
'inside `model_fn`.')
optimizer = model.optimizer
# Training loop starts here.
checkpoint = tf.train.Checkpoint(model=model, optimizer=optimizer)
latest_checkpoint_file = tf.train.latest_checkpoint(model_dir)
initial_step = 0
if latest_checkpoint_file:
logging.info(
'Checkpoint file %s found and restoring from '
'checkpoint', latest_checkpoint_file)
checkpoint.restore(latest_checkpoint_file)
initial_step = optimizer.iterations.numpy()
logging.info('Loading from checkpoint file completed. Init step %d',
initial_step)
elif init_checkpoint:
logging.info('Restoring from init checkpoint function')
init_checkpoint(model)
logging.info('Loading from init checkpoint file completed')
current_step = optimizer.iterations.numpy()
checkpoint_name = self.checkpoint_name
eval_metric = eval_metric_fn()
train_metric = train_metric_fn()
train_summary_writer = summary_writer_fn(model_dir, 'eval_train')
self.train_summary_writer = train_summary_writer.writer
test_summary_writer = summary_writer_fn(model_dir, 'eval_test')
self.eval_summary_writer = test_summary_writer.writer
# Continue training loop.
train_step = self._create_train_step(
strategy=strategy,
model=model,
loss_fn=self.loss_fn(),
optimizer=optimizer,
metric=train_metric)
test_step = None
if eval_input_fn and eval_metric:
self.global_train_step = model.optimizer.iterations
test_step = self._create_test_step(strategy, model, metric=eval_metric)
logging.info('Training started')
last_save_checkpoint_step = current_step
while current_step < total_steps:
num_steps = _steps_to_run(current_step, total_steps, iterations_per_loop)
_run_callbacks_on_batch_begin(current_step)
train_loss = train_step(train_iterator,
tf.convert_to_tensor(num_steps, dtype=tf.int32))
_run_callbacks_on_batch_end(current_step)
current_step += num_steps
train_loss = tf.nest.map_structure(lambda x: x.numpy().astype(float),
train_loss)
if not isinstance(train_loss, dict):
train_loss = {'total_loss': train_loss}
if np.isnan(train_loss['total_loss']):
raise ValueError('total loss is NaN.')
if train_metric:
train_metric_result = train_metric.result()
if isinstance(train_metric, tf.keras.metrics.Metric):
train_metric_result = tf.nest.map_structure(
lambda x: x.numpy().astype(float), train_metric_result)
if not isinstance(train_metric_result, dict):
train_metric_result = {'metric': train_metric_result}
train_metric_result.update(train_loss)
else:
train_metric_result = train_loss
if callable(optimizer.lr):
train_metric_result.update(
{'learning_rate': optimizer.lr(current_step).numpy()})
else:
train_metric_result.update({'learning_rate': optimizer.lr.numpy()})
logging.info('Train Step: %d/%d / loss = %s / training metric = %s',
current_step, total_steps, train_loss,
train_metric_result)
train_summary_writer(
metrics=train_metric_result, step=optimizer.iterations)
# Saves model checkpoints and run validation steps at every
# iterations_per_loop steps.
# To avoid repeated model saving, we do not save after the last
# step of training.
if save_freq > 0 and current_step < total_steps and (
current_step - last_save_checkpoint_step) >= save_freq:
_save_checkpoint(checkpoint, model_dir,
checkpoint_name.format(step=current_step))
last_save_checkpoint_step = current_step
if test_step:
eval_iterator = self._get_input_iterator(eval_input_fn, strategy)
eval_metric_result = self._run_evaluation(test_step, current_step,
eval_metric, eval_iterator)
logging.info('Step: %s evalation metric = %s.', current_step,
eval_metric_result)
test_summary_writer(
metrics=eval_metric_result, step=optimizer.iterations)
# Re-initialize evaluation metric, except the last step.
if eval_metric and current_step < total_steps:
eval_metric.reset_states()
if train_metric and current_step < total_steps:
train_metric.reset_states()
# Reaches the end of training and saves the last checkpoint.
if last_save_checkpoint_step < total_steps:
_save_checkpoint(checkpoint, model_dir,
checkpoint_name.format(step=current_step))
if test_step:
logging.info('Running final evaluation after training is complete.')
eval_iterator = self._get_input_iterator(eval_input_fn, strategy)
eval_metric_result = self._run_evaluation(test_step, current_step,
eval_metric, eval_iterator)
logging.info('Final evaluation metric = %s.', eval_metric_result)
test_summary_writer(
metrics=eval_metric_result, step=optimizer.iterations)
return train_loss, eval_metric_result
def _run_evaluation(self, test_step, current_training_step, metric,
test_iterator):
"""Runs validation steps and aggregate metrics."""
if not test_iterator or not metric:
logging.warning(
'Both test_iterator (%s) and metrics (%s) must not be None.',
test_iterator, metric)
return None
logging.info('Running evaluation after step: %s.', current_training_step)
while True:
try:
test_step(test_iterator)
except (StopIteration, tf.errors.OutOfRangeError):
break
metric_result = metric.result()
if isinstance(metric, tf.keras.metrics.Metric):
metric_result = metric_result.numpy().astype(float)
logging.info('Step: [%d] Validation metric = %f', current_training_step,
metric_result)
return metric_result
def evaluate_from_model_dir(
self,
model_dir: Text,
eval_input_fn: Callable[[params_dict.ParamsDict], tf.data.Dataset],
eval_metric_fn: Callable[[], Any],
total_steps: int = -1,
eval_timeout: int = None,
min_eval_interval: int = 180,
summary_writer_fn: Callable[[Text, Text], SummaryWriter] = SummaryWriter):
"""Runs distributed evaluation on model folder.
Args:
eval_input_fn: (Optional) same type as train_input_fn. If not None, will
trigger evaluting metric on eval data. If None, will not run eval step.
eval_metric_fn: metric_fn for evaluation in test_step.
model_dir: the folder for storing model checkpoints.
total_steps: total training steps. If the current step reaches the
total_steps, the evaluation loop will stop.
eval_timeout: The maximum number of seconds to wait between checkpoints.
If left as None, then the process will wait indefinitely. Used by
tf.train.checkpoints_iterator.
min_eval_interval: The minimum number of seconds between yielding
checkpoints. Used by tf.train.checkpoints_iterator.
summary_writer_fn: function to create summary writer.
Returns:
Eval metrics dictionary of the last checkpoint.
"""
if not model_dir:
raise ValueError('model_dir must be set.')
def terminate_eval():
tf.logging.info('Terminating eval after %d seconds of no checkpoints' %
eval_timeout)
return True
summary_writer = summary_writer_fn(model_dir, 'eval')
self.eval_summary_writer = summary_writer.writer
# Read checkpoints from the given model directory
# until `eval_timeout` seconds elapses.
for checkpoint_path in tf.train.checkpoints_iterator(
model_dir,
min_interval_secs=min_eval_interval,
timeout=eval_timeout,
timeout_fn=terminate_eval):
eval_metric_result, current_step = self.evaluate_checkpoint(
checkpoint_path=checkpoint_path,
eval_input_fn=eval_input_fn,
eval_metric_fn=eval_metric_fn,
summary_writer=summary_writer)
if total_steps > 0 and current_step >= total_steps:
logging.info('Evaluation finished after training step %d', current_step)
break
return eval_metric_result
def evaluate_checkpoint(self,
checkpoint_path: Text,
eval_input_fn: Callable[[params_dict.ParamsDict],
tf.data.Dataset],
eval_metric_fn: Callable[[], Any],
summary_writer: SummaryWriter = None):
"""Runs distributed evaluation on the one checkpoint.
Args:
eval_input_fn: (Optional) same type as train_input_fn. If not None, will
trigger evaluting metric on eval data. If None, will not run eval step.
eval_metric_fn: metric_fn for evaluation in test_step.
checkpoint_path: the checkpoint to evaluate.
summary_writer_fn: function to create summary writer.
Returns:
Eval metrics dictionary of the last checkpoint.
"""
if not callable(eval_metric_fn):
raise ValueError('if `eval_metric_fn` is specified, '
'eval_metric_fn must be a callable.')
params = self._params
strategy = self._strategy
# To reduce unnecessary send/receive input pipeline operation, we place
# input pipeline ops in worker task.
with strategy.scope():
# To correctly place the model weights on accelerators,
# model and optimizer should be created in scope.
model = self.model_fn(params.as_dict())
checkpoint = tf.train.Checkpoint(model=model)
eval_metric = eval_metric_fn()
assert eval_metric, 'eval_metric does not exist'
test_step = self._create_test_step(strategy, model, metric=eval_metric)
logging.info('Starting to evaluate.')
if not checkpoint_path:
raise ValueError('checkpoint path is empty')
reader = tf.compat.v1.train.NewCheckpointReader(checkpoint_path)
current_step = reader.get_tensor(
'optimizer/iter/.ATTRIBUTES/VARIABLE_VALUE')
logging.info(
'Checkpoint file %s found and restoring from '
'checkpoint', checkpoint_path)
checkpoint.restore(checkpoint_path)
self.global_train_step = model.optimizer.iterations
eval_iterator = self._get_input_iterator(eval_input_fn, strategy)
eval_metric_result = self._run_evaluation(test_step, current_step,
eval_metric, eval_iterator)
logging.info('Step: %s evalation metric = %s.', current_step,
eval_metric_result)
summary_writer(metrics=eval_metric_result, step=current_step)
eval_metric.reset_states()
return eval_metric_result, current_step
def predict(self):
return NotImplementedError('Unimplmented function.')
class ExecutorBuilder(object):
"""Builder of DistributedExecutor.
Example 1: Builds an executor with supported Strategy.
builder = ExecutorBuilder(
strategy_type='tpu',
strategy_config={'tpu': '/bns/xxx'})
dist_executor = builder.build_executor(
params=params,
model_fn=my_model_fn,
loss_fn=my_loss_fn,
metric_fn=my_metric_fn)
Example 2: Builds an executor with customized Strategy.
builder = ExecutorBuilder()
builder.strategy = <some customized Strategy>
dist_executor = builder.build_executor(
params=params,
model_fn=my_model_fn,
loss_fn=my_loss_fn,
metric_fn=my_metric_fn)
Example 3: Builds a customized executor with customized Strategy.
class MyDistributedExecutor(DistributedExecutor):
# implementation ...
builder = ExecutorBuilder()
builder.strategy = <some customized Strategy>
dist_executor = builder.build_executor(
class_ctor=MyDistributedExecutor,
params=params,
model_fn=my_model_fn,
loss_fn=my_loss_fn,
metric_fn=my_metric_fn)
Args:
strategy_type: string. One of 'tpu', 'mirrored', 'multi_worker_mirrored'. If
None. User is responsible to set the strategy before calling
build_executor(...).
strategy_config: necessary config for constructing the proper Strategy.
Check strategy_flags_dict() for examples of the structure.
"""
def __init__(self, strategy_type=None, strategy_config=None):
_ = distribution_utils.configure_cluster(
strategy_config.worker_hosts, strategy_config.task_index)
self._strategy = distribution_utils.get_distribution_strategy(
distribution_strategy=strategy_type,
num_gpus=strategy_config.num_gpus,
all_reduce_alg=strategy_config.all_reduce_alg,
num_packs=strategy_config.num_packs,
tpu_address=strategy_config.tpu)
@property
def strategy(self):
"""Returns default checkpoint name."""
return self._strategy
@strategy.setter
def strategy(self, new_strategy):
"""Sets default summary writer for the current thread."""
self._strategy = new_strategy
def build_executor(self,
class_ctor=DistributedExecutor,
params=None,
model_fn=None,
loss_fn=None,
**kwargs):
"""Creates an executor according to strategy type.
See doc string of the DistributedExecutor.__init__ for more information of
the
input arguments.
Args:
class_ctor: A constructor of executor (default: DistributedExecutor).
params: ParamsDict, all the model parameters and runtime parameters.
model_fn: Keras model function.
loss_fn: loss function.
**kwargs: other arguments to the executor constructor.
Returns:
An instance of DistributedExecutor or its subclass.
"""
if self._strategy is None:
raise ValueError('`strategy` should not be None. You need to specify '
'`strategy_type` in the builder contructor or directly '
'set the `strategy` property of the builder.')
return class_ctor(
strategy=self._strategy,
params=params,
model_fn=model_fn,
loss_fn=loss_fn,
**kwargs)
@@ -0,0 +1,88 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Sets up TensorFlow Official Models."""
import datetime
import os
import sys
from setuptools import find_packages
from setuptools import setup
version = '2.2.0'
project_name = 'tf-models-official'
long_description = """The TensorFlow official models are a collection of
models that use TensorFlow's high-level APIs.
They are intended to be well-maintained, tested, and kept up to date with the
latest TensorFlow API. They should also be reasonably optimized for fast
performance while still being easy to read."""
if '--project_name' in sys.argv:
project_name_idx = sys.argv.index('--project_name')
project_name = sys.argv[project_name_idx + 1]
sys.argv.remove('--project_name')
sys.argv.pop(project_name_idx)
def _get_requirements():
"""Parses requirements.txt file."""
install_requires_tmp = []
dependency_links_tmp = []
with open(
os.path.join(os.path.dirname(__file__), '../requirements.txt'), 'r') as f:
for line in f:
package_name = line.strip()
if package_name.startswith('-e '):
dependency_links_tmp.append(package_name[3:].strip())
else:
install_requires_tmp.append(package_name)
return install_requires_tmp, dependency_links_tmp
install_requires, dependency_links = _get_requirements()
if project_name == 'tf-models-nightly':
version += '.dev' + datetime.datetime.now().strftime('%Y%m%d')
install_requires.append('tf-nightly')
else:
install_requires.append('tensorflow>=2.1.0')
print('install_requires: ', install_requires)
print('dependency_links: ', dependency_links)
setup(
name=project_name,
version=version,
description='TensorFlow Official Models',
long_description=long_description,
author='Google Inc.',
author_email='no-reply@google.com',
url='https://github.com/tensorflow/models',
license='Apache 2.0',
packages=find_packages(exclude=[
'research*',
'tutorials*',
'samples*',
'official.r1*',
'official.pip_package*',
'official.benchmark*',
]),
exclude_package_data={
'': ['*_test.py',],
},
install_requires=install_requires,
dependency_links=dependency_links,
python_requires='>=3.6',
)
@@ -0,0 +1,23 @@
# Legacy Models Collection
The R1 folder contains legacy model implmentation and models that will not
update to TensorFlow 2.x. They do not have solid performance tracking.
**Note: models will be removed from the master branch by 2020/06.**
After removal, you can still access to these legacy models in the previous
released tags, e.g. [v2.1.0](https://github.com/tensorflow/models/releases/tag/v2.1.0).
## Legacy model implmentation
Transformer and MNIST implementation uses pure TF 1.x TF-Estimator.
Users should follow the corresponding TF 2.x implmentation inside the
official model garden.
## Models that will not update to TensorFlow 2.x
* [boosted_trees](boosted_trees): A Gradient Boosted Trees model to
classify higgs boson process from HIGGS Data Set.
* [wide_deep](wide_deep): A model that combines a wide model and deep
network to classify census income data.
@@ -0,0 +1,112 @@
# Classifying Higgs boson processes in the HIGGS Data Set
## Overview
The [HIGGS Data Set](https://archive.ics.uci.edu/ml/datasets/HIGGS) contains 11 million samples with 28 features, and is for the classification problem to distinguish between a signal process which produces Higgs bosons and a background process which does not.
We use Gradient Boosted Trees algorithm to distinguish the two classes.
---
The code sample uses the high level `tf.estimator.Estimator` and `tf.data.Dataset`. These APIs are great for fast iteration and quickly adapting models to your own datasets without major code overhauls. It allows you to move from single-worker training to distributed training, and makes it easy to export model binaries for prediction. Here, for further simplicity and faster execution, we use a utility function `tf.contrib.estimator.boosted_trees_classifier_train_in_memory`. This utility function is especially effective when the input is provided as in-memory data sets like numpy arrays.
An input function for the `Estimator` typically uses `tf.data.Dataset` API, which can handle various data control like streaming, batching, transform and shuffling. However `boosted_trees_classifier_train_in_memory()` utility function requires that the entire data is provided as a single batch (i.e. without using `batch()` API). Thus in this practice, simply `Dataset.from_tensors()` is used to convert numpy arrays into structured tensors, and `Dataset.zip()` is used to put features and label together.
For further references of `Dataset`, [Read more here](https://www.tensorflow.org/guide/datasets).
## Running the code
First make sure you've [added the models folder to your Python path](/official/#running-the-models); otherwise you may encounter an error like `ImportError: No module named official.boosted_trees`.
### Setup
The [HIGGS Data Set](https://archive.ics.uci.edu/ml/datasets/HIGGS) that this sample uses for training is hosted by the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/). We have provided a script that downloads and cleans the necessary files.
```
python data_download.py
```
This will download a file and store the processed file under the directory designated by `--data_dir` (defaults to `/tmp/higgs_data/`). To change the target directory, set the `--data_dir` flag. The directory could be network storages that Tensorflow supports (like Google Cloud Storage, `gs://<bucket>/<path>/`).
The file downloaded to the local temporary folder is about 2.8 GB, and the processed file is about 0.8 GB, so there should be enough storage to handle them.
### Training
This example uses about 3 GB of RAM during training.
You can run the code locally as follows:
```
python train_higgs.py
```
The model is by default saved to `/tmp/higgs_model`, which can be changed using the `--model_dir` flag.
Note that the model_dir is cleaned up before every time training starts.
Model parameters can be adjusted by flags, like `--n_trees`, `--max_depth`, `--learning_rate` and so on. Check out the code for details.
The final accuracy will be around 74% and loss will be around 0.516 over the eval set, when trained with the default parameters.
By default, the first 1 million examples among 11 millions are used for training, and the last 1 million examples are used for evaluation.
The training/evaluation data can be selected as index ranges by flags `--train_start`, `--train_count`, `--eval_start`, `--eval_count`, etc.
### TensorBoard
Run TensorBoard to inspect the details about the graph and training progression.
```
tensorboard --logdir=/tmp/higgs_model # set logdir as --model_dir set during training.
```
## Inference with SavedModel
You can export the model into Tensorflow [SavedModel](https://www.tensorflow.org/guide/saved_model) format by using the argument `--export_dir`:
```
python train_higgs.py --export_dir /tmp/higgs_boosted_trees_saved_model
```
After the model finishes training, use [`saved_model_cli`](https://www.tensorflow.org/guide/saved_model#cli_to_inspect_and_execute_savedmodel) to inspect and execute the SavedModel.
Try the following commands to inspect the SavedModel:
**Replace `${TIMESTAMP}` with the folder produced (e.g. 1524249124)**
```
# List possible tag_sets. Only one metagraph is saved, so there will be one option.
saved_model_cli show --dir /tmp/higgs_boosted_trees_saved_model/${TIMESTAMP}/
# Show SignatureDefs for tag_set=serve. SignatureDefs define the outputs to show.
saved_model_cli show --dir /tmp/higgs_boosted_trees_saved_model/${TIMESTAMP}/ \
--tag_set serve --all
```
### Inference
Let's use the model to predict the income group of two examples.
Note that this model exports SavedModel with the custom parsing module that accepts csv lines as features. (Each line is an example with 28 columns; be careful to not add a label column, unlike in the training data.)
```
saved_model_cli run --dir /tmp/boosted_trees_higgs_saved_model/${TIMESTAMP}/ \
--tag_set serve --signature_def="predict" \
--input_exprs='inputs=["0.869293,-0.635082,0.225690,0.327470,-0.689993,0.754202,-0.248573,-1.092064,0.0,1.374992,-0.653674,0.930349,1.107436,1.138904,-1.578198,-1.046985,0.0,0.657930,-0.010455,-0.045767,3.101961,1.353760,0.979563,0.978076,0.920005,0.721657,0.988751,0.876678", "1.595839,-0.607811,0.007075,1.818450,-0.111906,0.847550,-0.566437,1.581239,2.173076,0.755421,0.643110,1.426367,0.0,0.921661,-1.190432,-1.615589,0.0,0.651114,-0.654227,-1.274345,3.101961,0.823761,0.938191,0.971758,0.789176,0.430553,0.961357,0.957818"]'
```
This will print out the predicted classes and class probabilities. Something like:
```
Result for output key class_ids:
[[1]
[0]]
Result for output key classes:
[['1']
['0']]
Result for output key logistic:
[[0.6440273 ]
[0.10902369]]
Result for output key logits:
[[ 0.59288704]
[-2.1007526 ]]
Result for output key probabilities:
[[0.3559727 0.6440273]
[0.8909763 0.1090237]]
```
Please note that "predict" signature_def gives out different (more detailed) results than "classification" or "serving_default".
## Additional Links
If you are interested in distributed training, take a look at [Distributed TensorFlow](https://www.tensorflow.org/deploy/distributed).
You can also [train models on Cloud ML Engine](https://cloud.google.com/ml-engine/docs/getting-started-training-prediction), which provides [hyperparameter tuning](https://cloud.google.com/ml-engine/docs/getting-started-training-prediction#hyperparameter_tuning) to maximize your model's results and enables [deploying your model for prediction](https://cloud.google.com/ml-engine/docs/getting-started-training-prediction#deploy_a_model_to_support_prediction).
@@ -0,0 +1,97 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Downloads the UCI HIGGS Dataset and prepares train data.
The details on the dataset are in https://archive.ics.uci.edu/ml/datasets/HIGGS
It takes a while as it needs to download 2.8 GB over the network, process, then
store it into the specified location as a compressed numpy file.
Usage:
$ python data_download.py --data_dir=/tmp/higgs_data
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import gzip
import os
import tempfile
# pylint: disable=g-bad-import-order
import numpy as np
import pandas as pd
from six.moves import urllib
from absl import app as absl_app
from absl import flags
import tensorflow as tf
from official.utils.flags import core as flags_core
URL_ROOT = "https://archive.ics.uci.edu/ml/machine-learning-databases/00280"
INPUT_FILE = "HIGGS.csv.gz"
NPZ_FILE = "HIGGS.csv.gz.npz" # numpy compressed file to contain "data" array.
def _download_higgs_data_and_save_npz(data_dir):
"""Download higgs data and store as a numpy compressed file."""
input_url = URL_ROOT + "/" + INPUT_FILE
np_filename = os.path.join(data_dir, NPZ_FILE)
if tf.gfile.Exists(np_filename):
raise ValueError("data_dir already has the processed data file: {}".format(
np_filename))
if not tf.gfile.Exists(data_dir):
tf.gfile.MkDir(data_dir)
# 2.8 GB to download.
try:
tf.logging.info("Data downloading...")
temp_filename, _ = urllib.request.urlretrieve(input_url)
# Reading and parsing 11 million csv lines takes 2~3 minutes.
tf.logging.info("Data processing... taking multiple minutes...")
with gzip.open(temp_filename, "rb") as csv_file:
data = pd.read_csv(
csv_file,
dtype=np.float32,
names=["c%02d" % i for i in range(29)] # label + 28 features.
).as_matrix()
finally:
tf.gfile.Remove(temp_filename)
# Writing to temporary location then copy to the data_dir (0.8 GB).
f = tempfile.NamedTemporaryFile()
np.savez_compressed(f, data=data)
tf.gfile.Copy(f.name, np_filename)
tf.logging.info("Data saved to: {}".format(np_filename))
def main(unused_argv):
if not tf.gfile.Exists(FLAGS.data_dir):
tf.gfile.MkDir(FLAGS.data_dir)
_download_higgs_data_and_save_npz(FLAGS.data_dir)
def define_data_download_flags():
"""Add flags specifying data download arguments."""
flags.DEFINE_string(
name="data_dir", default="/tmp/higgs_data",
help=flags_core.help_wrap(
"Directory to download higgs dataset and store training/eval data."))
if __name__ == "__main__":
tf.logging.set_verbosity(tf.logging.INFO)
define_data_download_flags()
FLAGS = flags.FLAGS
absl_app.run(main)
@@ -0,0 +1,297 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""A script that builds boosted trees over higgs data.
If you haven't, please run data_download.py beforehand to prepare the data.
For some more details on this example, please refer to README.md as well.
Note that the model_dir is cleaned up before starting the training.
Usage:
$ python train_higgs.py --n_trees=100 --max_depth=6 --learning_rate=0.1 \
--model_dir=/tmp/higgs_model
Note that BoostedTreesClassifier is available since Tensorflow 1.8.0.
So you need to install recent enough version of Tensorflow to use this example.
The training data is by default the first million examples out of 11M examples,
and eval data is by default the last million examples.
They are controlled by --train_start, --train_count, --eval_start, --eval_count.
e.g. to train over the first 10 million examples instead of 1 million:
$ python train_higgs.py --n_trees=100 --max_depth=6 --learning_rate=0.1 \
--model_dir=/tmp/higgs_model --train_count=10000000
Training history and metrics can be inspected using tensorboard.
Set --logdir as the --model_dir set by flag when training
(or the default /tmp/higgs_model).
$ tensorboard --logdir=/tmp/higgs_model
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# pylint: disable=g-bad-import-order
import numpy as np
from absl import app as absl_app
from absl import flags
import tensorflow as tf
# pylint: enable=g-bad-import-order
from official.utils.flags import core as flags_core
from official.utils.flags._conventions import help_wrap
from official.utils.logs import logger
NPZ_FILE = "HIGGS.csv.gz.npz" # numpy compressed file containing "data" array
def read_higgs_data(data_dir, train_start, train_count, eval_start, eval_count):
"""Reads higgs data from csv and returns train and eval data.
Args:
data_dir: A string, the directory of higgs dataset.
train_start: An integer, the start index of train examples within the data.
train_count: An integer, the number of train examples within the data.
eval_start: An integer, the start index of eval examples within the data.
eval_count: An integer, the number of eval examples within the data.
Returns:
Numpy array of train data and eval data.
"""
npz_filename = os.path.join(data_dir, NPZ_FILE)
try:
# gfile allows numpy to read data from network data sources as well.
with tf.gfile.Open(npz_filename, "rb") as npz_file:
with np.load(npz_file) as npz:
data = npz["data"]
except tf.errors.NotFoundError as e:
raise RuntimeError(
"Error loading data; use data_download.py to prepare the data.\n{}: {}"
.format(type(e).__name__, e))
return (data[train_start:train_start+train_count],
data[eval_start:eval_start+eval_count])
# This showcases how to make input_fn when the input data is available in the
# form of numpy arrays.
def make_inputs_from_np_arrays(features_np, label_np):
"""Makes and returns input_fn and feature_columns from numpy arrays.
The generated input_fn will return tf.data.Dataset of feature dictionary and a
label, and feature_columns will consist of the list of
tf.feature_column.BucketizedColumn.
Note, for in-memory training, tf.data.Dataset should contain the whole data
as a single tensor. Don't use batch.
Args:
features_np: A numpy ndarray (shape=[batch_size, num_features]) for
float32 features.
label_np: A numpy ndarray (shape=[batch_size, 1]) for labels.
Returns:
input_fn: A function returning a Dataset of feature dict and label.
feature_names: A list of feature names.
feature_column: A list of tf.feature_column.BucketizedColumn.
"""
num_features = features_np.shape[1]
features_np_list = np.split(features_np, num_features, axis=1)
# 1-based feature names.
feature_names = ["feature_%02d" % (i + 1) for i in range(num_features)]
# Create source feature_columns and bucketized_columns.
def get_bucket_boundaries(feature):
"""Returns bucket boundaries for feature by percentiles."""
return np.unique(np.percentile(feature, range(0, 100))).tolist()
source_columns = [
tf.feature_column.numeric_column(
feature_name, dtype=tf.float32,
# Although higgs data have no missing values, in general, default
# could be set as 0 or some reasonable value for missing values.
default_value=0.0)
for feature_name in feature_names
]
bucketized_columns = [
tf.feature_column.bucketized_column(
source_columns[i],
boundaries=get_bucket_boundaries(features_np_list[i]))
for i in range(num_features)
]
# Make an input_fn that extracts source features.
def input_fn():
"""Returns features as a dictionary of numpy arrays, and a label."""
features = {
feature_name: tf.constant(features_np_list[i])
for i, feature_name in enumerate(feature_names)
}
return tf.data.Dataset.zip((tf.data.Dataset.from_tensors(features),
tf.data.Dataset.from_tensors(label_np),))
return input_fn, feature_names, bucketized_columns
def make_eval_inputs_from_np_arrays(features_np, label_np):
"""Makes eval input as streaming batches."""
num_features = features_np.shape[1]
features_np_list = np.split(features_np, num_features, axis=1)
# 1-based feature names.
feature_names = ["feature_%02d" % (i + 1) for i in range(num_features)]
def input_fn():
features = {
feature_name: tf.constant(features_np_list[i])
for i, feature_name in enumerate(feature_names)
}
return tf.data.Dataset.zip((
tf.data.Dataset.from_tensor_slices(features),
tf.data.Dataset.from_tensor_slices(label_np),)).batch(1000)
return input_fn
def _make_csv_serving_input_receiver_fn(column_names, column_defaults):
"""Returns serving_input_receiver_fn for csv.
The input arguments are relevant to `tf.decode_csv()`.
Args:
column_names: a list of column names in the order within input csv.
column_defaults: a list of default values with the same size of
column_names. Each entity must be either a list of one scalar, or an
empty list to denote the corresponding column is required.
e.g. [[""], [2.5], []] indicates the third column is required while
the first column must be string and the second must be float/double.
Returns:
a serving_input_receiver_fn that handles csv for serving.
"""
def serving_input_receiver_fn():
csv = tf.placeholder(dtype=tf.string, shape=[None], name="csv")
features = dict(zip(column_names, tf.decode_csv(csv, column_defaults)))
receiver_tensors = {"inputs": csv}
return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
return serving_input_receiver_fn
def train_boosted_trees(flags_obj):
"""Train boosted_trees estimator on HIGGS data.
Args:
flags_obj: An object containing parsed flag values.
"""
# Clean up the model directory if present.
if tf.gfile.Exists(flags_obj.model_dir):
tf.gfile.DeleteRecursively(flags_obj.model_dir)
tf.logging.info("## Data loading...")
train_data, eval_data = read_higgs_data(
flags_obj.data_dir, flags_obj.train_start, flags_obj.train_count,
flags_obj.eval_start, flags_obj.eval_count)
tf.logging.info("## Data loaded; train: {}{}, eval: {}{}".format(
train_data.dtype, train_data.shape, eval_data.dtype, eval_data.shape))
# Data consists of one label column followed by 28 feature columns.
train_input_fn, feature_names, feature_columns = make_inputs_from_np_arrays(
features_np=train_data[:, 1:], label_np=train_data[:, 0:1])
eval_input_fn = make_eval_inputs_from_np_arrays(
features_np=eval_data[:, 1:], label_np=eval_data[:, 0:1])
tf.logging.info("## Features prepared. Training starts...")
# Create benchmark logger to log info about the training and metric values
run_params = {
"train_start": flags_obj.train_start,
"train_count": flags_obj.train_count,
"eval_start": flags_obj.eval_start,
"eval_count": flags_obj.eval_count,
"n_trees": flags_obj.n_trees,
"max_depth": flags_obj.max_depth,
}
benchmark_logger = logger.config_benchmark_logger(flags_obj)
benchmark_logger.log_run_info(
model_name="boosted_trees",
dataset_name="higgs",
run_params=run_params,
test_id=flags_obj.benchmark_test_id)
# Though BoostedTreesClassifier is under tf.estimator, faster in-memory
# training is yet provided as a contrib library.
from tensorflow.contrib import estimator as contrib_estimator # pylint: disable=g-import-not-at-top
classifier = contrib_estimator.boosted_trees_classifier_train_in_memory(
train_input_fn,
feature_columns,
model_dir=flags_obj.model_dir or None,
n_trees=flags_obj.n_trees,
max_depth=flags_obj.max_depth,
learning_rate=flags_obj.learning_rate)
# Evaluation.
eval_results = classifier.evaluate(eval_input_fn)
# Benchmark the evaluation results
benchmark_logger.log_evaluation_result(eval_results)
# Exporting the savedmodel with csv parsing.
if flags_obj.export_dir is not None:
classifier.export_savedmodel(
flags_obj.export_dir,
_make_csv_serving_input_receiver_fn(
column_names=feature_names,
# columns are all floats.
column_defaults=[[0.0]] * len(feature_names)),
strip_default_attrs=True)
def main(_):
train_boosted_trees(flags.FLAGS)
def define_train_higgs_flags():
"""Add tree related flags as well as training/eval configuration."""
flags_core.define_base(clean=False, stop_threshold=False, batch_size=False,
num_gpu=False, export_dir=True)
flags_core.define_benchmark()
flags.adopt_module_key_flags(flags_core)
flags.DEFINE_integer(
name="train_start", default=0,
help=help_wrap("Start index of train examples within the data."))
flags.DEFINE_integer(
name="train_count", default=1000000,
help=help_wrap("Number of train examples within the data."))
flags.DEFINE_integer(
name="eval_start", default=10000000,
help=help_wrap("Start index of eval examples within the data."))
flags.DEFINE_integer(
name="eval_count", default=1000000,
help=help_wrap("Number of eval examples within the data."))
flags.DEFINE_integer(
"n_trees", default=100, help=help_wrap("Number of trees to build."))
flags.DEFINE_integer(
"max_depth", default=6, help=help_wrap("Maximum depths of each tree."))
flags.DEFINE_float(
"learning_rate", default=0.1,
help=help_wrap("The learning rate."))
flags_core.set_defaults(data_dir="/tmp/higgs_data",
model_dir="/tmp/higgs_model")
if __name__ == "__main__":
# Training progress and eval results are shown as logging.INFO; so enables it.
tf.logging.set_verbosity(tf.logging.INFO)
define_train_higgs_flags()
absl_app.run(main)
@@ -0,0 +1,152 @@
# ResNet in TensorFlow
Deep residual networks, or ResNets for short, provided the breakthrough idea of
identity mappings in order to enable training of very deep convolutional neural
networks. This folder contains an implementation of ResNet for the ImageNet
dataset written in TensorFlow.
See the following papers for more background:
[1] [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.
[2] [Identity Mappings in Deep Residual Networks](https://arxiv.org/pdf/1603.05027.pdf) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Jul 2016.
In code, v1 refers to the ResNet defined in [1] but where a stride 2 is used on
the 3x3 conv rather than the first 1x1 in the bottleneck. This change results
in higher and more stable accuracy with less epochs than the original v1 and has
shown to scale to higher batch sizes with minimal degradation in accuracy.
There is no originating paper. The first mention we are aware of was in the
torch version of [ResNetv1](https://github.com/facebook/fb.resnet.torch). Most
popular v1 implementations are this implementation which we call ResNetv1.5.
In testing we found v1.5 requires ~12% more compute to train and has 6% reduced
throughput for inference compared to ResNetv1. CIFAR-10 ResNet does not use the
bottleneck and is thus the same for v1 as v1.5.
v2 refers to [2]. The principle difference between the two versions is that v1
applies batch normalization and activation after convolution, while v2 applies
batch normalization, then activation, and finally convolution. A schematic
comparison is presented in Figure 1 (left) of [2].
Please proceed according to which dataset you would like to train/evaluate on:
## CIFAR-10
### Setup
You need to have the latest version of TensorFlow installed.
First, make sure [the models folder is in your Python path](/official/#running-the-models); otherwise you may encounter `ImportError: No module named official.resnet`.
Then, download and extract the CIFAR-10 data from Alex's website, specifying the location with the `--data_dir` flag. Run the following:
```bash
python cifar10_download_and_extract.py --data_dir <DATA_DIR>
```
Then, to train the model:
```bash
python cifar10_main.py --data_dir <DATA_DIR>/cifar-10-batches-bin --model_dir <MODEL_DIR>
```
Use `--data_dir` to specify the location of the CIFAR-10 data used in the previous step. There are more flag options as described in `cifar10_main.py`.
To export a `SavedModel` from the trained checkpoint:
```bash
python cifar10_main.py --data_dir <DATA_DIR>/cifar-10-batches-bin --model_dir <MODEL_DIR> --eval_only --export_dir <EXPORT_DIR>
```
Note: The `<EXPORT_DIR>` must be present. You might want to run `mkdir <EXPORT_DIR>` beforehand.
The `SavedModel` can then be [loaded](https://www.tensorflow.org/guide/saved_model#loading_a_savedmodel_in_python) in order to use the ResNet for prediction.
## ImageNet
### Setup
To begin, you will need to download the ImageNet dataset and convert it to
TFRecord format. The following [script](https://github.com/tensorflow/tpu/blob/master/tools/datasets/imagenet_to_gcs.py)
and [README](https://github.com/tensorflow/tpu/tree/master/tools/datasets#imagenet_to_gcspy)
provide a few options.
Once your dataset is ready, you can begin training the model as follows:
```bash
python imagenet_main.py --data_dir=/path/to/imagenet
```
The model will begin training and will automatically evaluate itself on the
validation data roughly once per epoch.
Note that there are a number of other options you can specify, including
`--model_dir` to choose where to store the model and `--resnet_size` to choose
the model size (options include ResNet-18 through ResNet-200). See
[`resnet_run_loop.py`](resnet_run_loop.py) for the full list of options.
## Compute Devices
Training is accomplished using the DistributionStrategies API. (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/distribute/README.md)
The appropriate distribution strategy is chosen based on the `--num_gpus` flag.
By default this flag is one if TensorFlow is compiled with CUDA, and zero
otherwise.
num_gpus:
+ 0: Use OneDeviceStrategy and train on CPU.
+ 1: Use OneDeviceStrategy and train on GPU.
+ 2+: Use MirroredStrategy (data parallelism) to distribute a batch between devices.
### Pre-trained model
You can download pre-trained versions of ResNet-50. Reported accuracies are top-1 single-crop accuracy for the ImageNet validation set.
Models are reported as both checkpoints produced by Estimator during training, and as SavedModels which are more portable. Checkpoints are fragile,
and these are not guaranteed to work with future versions of the code. Both ResNet v1
and ResNet v2 have been trained in both fp16 and fp32 precision. (Here v1 refers to "v1.5". See the note above.) Furthermore, SavedModels
are generated to accept either tensor or JPG inputs, and with channels_first (NCHW) and channels_last (NHWC) convolutions. NCHW is generally
better for GPUs, while NHWC is generally better for CPUs. See the TensorFlow [performance guide](https://www.tensorflow.org/performance/performance_guide#data_formats)
for more details.
ResNet-50 v2 (fp32, Accuracy 76.47%):
* [Checkpoint](http://download.tensorflow.org/models/official/20181001_resnet/checkpoints/resnet_imagenet_v2_fp32_20181001.tar.gz)
* SavedModel [(NCHW)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NCHW.tar.gz),
[(NCHW, JPG)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NCHW_jpg.tar.gz),
[(NHWC)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC.tar.gz),
[(NHWC, JPG)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz)
ResNet-50 v2 (fp16, Accuracy 76.56%):
* [Checkpoint](http://download.tensorflow.org/models/official/20181001_resnet/checkpoints/resnet_imagenet_v2_fp16_20180928.tar.gz)
* SavedModel [(NCHW)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp16_savedmodel_NCHW.tar.gz),
[(NCHW, JPG)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp16_savedmodel_NCHW_jpg.tar.gz),
[(NHWC)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp16_savedmodel_NHWC.tar.gz),
[(NHWC, JPG)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp16_savedmodel_NHWC_jpg.tar.gz)
ResNet-50 v1 (fp32, Accuracy 76.53%):
* [Checkpoint](http://download.tensorflow.org/models/official/20181001_resnet/checkpoints/resnet_imagenet_v1_fp32_20181001.tar.gz)
* SavedModel [(NCHW)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp32_savedmodel_NCHW.tar.gz),
[(NCHW, JPG)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp32_savedmodel_NCHW_jpg.tar.gz),
[(NHWC)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp32_savedmodel_NHWC.tar.gz),
[(NHWC, JPG)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp32_savedmodel_NHWC_jpg.tar.gz)
ResNet-50 v1 (fp16, Accuracy 76.18%):
* [Checkpoint](http://download.tensorflow.org/models/official/20181001_resnet/checkpoints/resnet_imagenet_v1_fp16_20181001.tar.gz)
* SavedModel [(NCHW)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp16_savedmodel_NCHW.tar.gz),
[(NCHW, JPG)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp16_savedmodel_NCHW_jpg.tar.gz),
[(NHWC)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp16_savedmodel_NHWC.tar.gz),
[(NHWC, JPG)](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp16_savedmodel_NHWC_jpg.tar.gz)
### Transfer Learning
You can use a pretrained model to initialize a training process. In addition you are able to freeze all but the final fully connected layers to fine tune your model. Transfer Learning is useful when training on your own small datasets. For a brief look at transfer learning in the context of convolutional neural networks, we recommend reading these [short notes](http://cs231n.github.io/transfer-learning/).
To fine tune a pretrained resnet you must make three changes to your training procedure:
1) Build the exact same model as previously except we change the number of labels in the final classification layer.
2) Restore all weights from the pre-trained resnet except for the final classification layer; this will get randomly initialized instead.
3) Freeze earlier layers of the network
We can perform these three operations by specifying two flags: ```--pretrained_model_checkpoint_path``` and ```--fine_tune```. The first flag is a string that points to the path of a pre-trained resnet model. If this flag is specified, it will load all but the final classification layer. A key thing to note: if both ```--pretrained_model_checkpoint_path``` and a non empty ```model_dir``` directory are passed, the tensorflow estimator will load only the ```model_dir```. For more on this please see [WarmStartSettings](https://www.tensorflow.org/versions/master/api_docs/python/tf/estimator/WarmStartSettings) and [Estimators](https://www.tensorflow.org/guide/estimators).
The second flag ```--fine_tune``` is a boolean that indicates whether earlier layers of the network should be frozen. You may set this flag to false if you wish to continue training a pre-trained model from a checkpoint. If you set this flag to true, you can train a new classification layer from scratch.
@@ -0,0 +1,499 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Executes Estimator benchmarks and accuracy tests."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
from absl import flags
from absl.testing import flagsaver
import tensorflow as tf # pylint: disable=g-bad-import-order
from official.r1.resnet import cifar10_main as cifar_main
from official.r1.resnet import imagenet_main
from official.utils.flags import core as flags_core
from official.utils.logs import hooks
IMAGENET_DATA_DIR_NAME = 'imagenet'
CIFAR_DATA_DIR_NAME = 'cifar-10-batches-bin'
FLAGS = flags.FLAGS
class EstimatorBenchmark(tf.test.Benchmark):
"""Base class to hold methods common to test classes in the module.
Code under test for Estimator models (ResNet50 and 56) report mostly the
same data and require the same FLAG setup.
"""
local_flags = None
def __init__(self, output_dir=None, default_flags=None, flag_methods=None):
if not output_dir:
output_dir = '/tmp'
self.output_dir = output_dir
self.default_flags = default_flags or {}
self.flag_methods = flag_methods or {}
def _get_model_dir(self, folder_name):
"""Returns directory to store info, e.g. saved model and event log."""
return os.path.join(self.output_dir, folder_name)
def _setup(self):
"""Sets up and resets flags before each test."""
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)
if EstimatorBenchmark.local_flags is None:
for flag_method in self.flag_methods:
flag_method()
# Loads flags to get defaults to then override. List cannot be empty.
flags.FLAGS(['foo'])
# Overrides flag values with defaults for the class of tests.
for k, v in self.default_flags.items():
setattr(FLAGS, k, v)
saved_flag_values = flagsaver.save_flag_values()
EstimatorBenchmark.local_flags = saved_flag_values
else:
flagsaver.restore_flag_values(EstimatorBenchmark.local_flags)
def _report_benchmark(self,
stats,
wall_time_sec,
top_1_max=None,
top_1_min=None):
"""Report benchmark results by writing to local protobuf file.
Args:
stats: dict returned from estimator models with known entries.
wall_time_sec: the during of the benchmark execution in seconds
top_1_max: highest passing level for top_1 accuracy.
top_1_min: lowest passing level for top_1 accuracy.
"""
examples_per_sec_hook = None
for hook in stats['train_hooks']:
if isinstance(hook, hooks.ExamplesPerSecondHook):
examples_per_sec_hook = hook
break
eval_results = stats['eval_results']
metrics = []
if 'accuracy' in eval_results:
metrics.append({'name': 'accuracy_top_1',
'value': float(eval_results['accuracy']),
'min_value': top_1_min,
'max_value': top_1_max})
if 'accuracy_top_5' in eval_results:
metrics.append({'name': 'accuracy_top_5',
'value': float(eval_results['accuracy_top_5'])})
if examples_per_sec_hook:
exp_per_second_list = examples_per_sec_hook.current_examples_per_sec_list
# ExamplesPerSecondHook skips the first 10 steps.
exp_per_sec = sum(exp_per_second_list) / (len(exp_per_second_list))
metrics.append({'name': 'exp_per_second',
'value': exp_per_sec})
flags_str = flags_core.get_nondefault_flags_as_str()
self.report_benchmark(
iters=eval_results.get('global_step', None),
wall_time=wall_time_sec,
metrics=metrics,
extras={'flags': flags_str})
class Resnet50EstimatorAccuracy(EstimatorBenchmark):
"""Benchmark accuracy tests for ResNet50 w/ Estimator."""
def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
"""Benchmark accuracy tests for ResNet50 w/ Estimator.
Args:
output_dir: directory where to output e.g. log files
root_data_dir: directory under which to look for dataset
**kwargs: arbitrary named arguments. This is needed to make the
constructor forward compatible in case PerfZero provides more
named arguments before updating the constructor.
"""
flag_methods = [imagenet_main.define_imagenet_flags]
self.data_dir = os.path.join(root_data_dir, IMAGENET_DATA_DIR_NAME)
super(Resnet50EstimatorAccuracy, self).__init__(
output_dir=output_dir, flag_methods=flag_methods)
def benchmark_graph_8_gpu(self):
"""Test 8 GPUs graph mode."""
self._setup()
FLAGS.num_gpus = 8
FLAGS.data_dir = self.data_dir
FLAGS.batch_size = 128 * 8
FLAGS.train_epochs = 90
FLAGS.epochs_between_evals = 10
FLAGS.model_dir = self._get_model_dir('benchmark_graph_8_gpu')
FLAGS.dtype = 'fp32'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_fp16_8_gpu(self):
"""Test FP16 8 GPUs graph mode."""
self._setup()
FLAGS.num_gpus = 8
FLAGS.data_dir = self.data_dir
FLAGS.batch_size = 256 * 8
FLAGS.train_epochs = 90
FLAGS.epochs_between_evals = 10
FLAGS.model_dir = self._get_model_dir('benchmark_graph_fp16_8_gpu')
FLAGS.dtype = 'fp16'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_fp16_graph_rewrite_8_gpu(self):
"""Test FP16 graph rewrite 8 GPUs graph mode."""
self._setup()
FLAGS.num_gpus = 8
FLAGS.data_dir = self.data_dir
FLAGS.batch_size = 256 * 8
FLAGS.train_epochs = 90
FLAGS.epochs_between_evals = 10
FLAGS.model_dir = self._get_model_dir(
'benchmark_graph_fp16_graph_rewrite_8_gpu')
FLAGS.dtype = 'fp16'
FLAGS.fp16_implementation = 'graph_rewrite'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def _run_and_report_benchmark(self):
start_time_sec = time.time()
stats = imagenet_main.run_imagenet(flags.FLAGS)
wall_time_sec = time.time() - start_time_sec
self._report_benchmark(stats,
wall_time_sec,
top_1_min=0.762,
top_1_max=0.766)
class Resnet50EstimatorBenchmarkBase(EstimatorBenchmark):
"""Base class for benchmarks for ResNet50 using Estimator."""
local_flags = None
def __init__(self, output_dir=None, default_flags=None):
flag_methods = [imagenet_main.define_imagenet_flags]
super(Resnet50EstimatorBenchmarkBase, self).__init__(
output_dir=output_dir,
default_flags=default_flags,
flag_methods=flag_methods)
def _run_and_report_benchmark(self):
start_time_sec = time.time()
stats = imagenet_main.run_imagenet(FLAGS)
wall_time_sec = time.time() - start_time_sec
print(stats)
# Remove values to skip triggering accuracy check.
stats['eval_results'].pop('accuracy', None)
stats['eval_results'].pop('accuracy_top_5', None)
self._report_benchmark(stats, wall_time_sec)
class Resnet50EstimatorBenchmark(Resnet50EstimatorBenchmarkBase):
"""Benchmarks for ResNet50 using Estimator with 1 worker."""
def __init__(self, output_dir=None, default_flags=None):
super(Resnet50EstimatorBenchmark, self).__init__(
output_dir=output_dir,
default_flags=default_flags)
def benchmark_graph_fp16_1_gpu(self):
"""Benchmarks graph fp16 1 gpu."""
self._setup()
FLAGS.num_gpus = 1
FLAGS.model_dir = self._get_model_dir('benchmark_graph_fp16_1_gpu')
FLAGS.batch_size = 128
FLAGS.dtype = 'fp16'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_fp16_1_gpu_tweaked(self):
"""Benchmarks graph fp16 1 gpu tweaked."""
self._setup()
FLAGS.num_gpus = 1
FLAGS.tf_gpu_thread_mode = 'gpu_private'
FLAGS.intra_op_parallelism_threads = 1
FLAGS.model_dir = self._get_model_dir('benchmark_graph_fp16_1_gpu_tweaked')
FLAGS.batch_size = 256
FLAGS.dtype = 'fp16'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_fp16_graph_rewrite_1_gpu_tweaked(self):
"""Benchmarks graph fp16 graph rewrite 1 gpu tweaked."""
self._setup()
FLAGS.num_gpus = 1
FLAGS.tf_gpu_thread_mode = 'gpu_private'
FLAGS.intra_op_parallelism_threads = 1
FLAGS.model_dir = self._get_model_dir(
'benchmark_graph_fp16_graph_rewrite_1_gpu_tweaked')
FLAGS.batch_size = 256
FLAGS.dtype = 'fp16'
FLAGS.fp16_implementation = 'graph_rewrite'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_1_gpu(self):
"""Benchmarks graph 1 gpu."""
self._setup()
FLAGS.num_gpus = 1
FLAGS.model_dir = self._get_model_dir('benchmark_graph_1_gpu')
FLAGS.batch_size = 128
FLAGS.dtype = 'fp32'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_8_gpu(self):
"""Benchmarks graph 8 gpus."""
self._setup()
FLAGS.num_gpus = 8
FLAGS.model_dir = self._get_model_dir('benchmark_graph_8_gpu')
FLAGS.batch_size = 128*8
FLAGS.dtype = 'fp32'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_fp16_8_gpu(self):
"""Benchmarks graph fp16 8 gpus."""
self._setup()
FLAGS.num_gpus = 8
FLAGS.model_dir = self._get_model_dir('benchmark_graph_fp16_8_gpu')
FLAGS.batch_size = 256*8
FLAGS.dtype = 'fp16'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_fp16_8_gpu_tweaked(self):
"""Benchmarks graph fp16 8 gpus tweaked."""
self._setup()
FLAGS.num_gpus = 8
FLAGS.tf_gpu_thread_mode = 'gpu_private'
FLAGS.intra_op_parallelism_threads = 1
FLAGS.model_dir = self._get_model_dir('benchmark_graph_fp16_8_gpu_tweaked')
FLAGS.batch_size = 256*8
FLAGS.dtype = 'fp16'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_fp16_graph_rewrite_8_gpu_tweaked(self):
"""Benchmarks graph fp16 graph rewrite 8 gpus tweaked."""
self._setup()
FLAGS.num_gpus = 8
FLAGS.tf_gpu_thread_mode = 'gpu_private'
FLAGS.intra_op_parallelism_threads = 1
FLAGS.model_dir = self._get_model_dir(
'benchmark_graph_fp16_graph_rewrite_8_gpu_tweaked')
FLAGS.batch_size = 256*8
FLAGS.dtype = 'fp16'
FLAGS.fp16_implementation = 'graph_rewrite'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
class Resnet50EstimatorBenchmarkSynth(Resnet50EstimatorBenchmark):
"""Resnet50 synthetic benchmark tests."""
def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
def_flags = {}
def_flags['use_synthetic_data'] = True
def_flags['max_train_steps'] = 110
def_flags['train_epochs'] = 1
super(Resnet50EstimatorBenchmarkSynth, self).__init__(
output_dir=output_dir, default_flags=def_flags)
class Resnet50EstimatorBenchmarkReal(Resnet50EstimatorBenchmark):
"""Resnet50 real data benchmark tests."""
def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
def_flags = {}
def_flags['data_dir'] = os.path.join(root_data_dir, IMAGENET_DATA_DIR_NAME)
def_flags['max_train_steps'] = 110
def_flags['train_epochs'] = 1
super(Resnet50EstimatorBenchmarkReal, self).__init__(
output_dir=output_dir, default_flags=def_flags)
class Resnet50MultiWorkerEstimatorBenchmark(Resnet50EstimatorBenchmarkBase):
"""Benchmarks for ResNet50 using Estimator with multiple workers."""
def __init__(self, output_dir=None, default_flags=None):
super(Resnet50MultiWorkerEstimatorBenchmark, self).__init__(
output_dir=output_dir,
default_flags=default_flags)
def benchmark_graph_fp16_8_gpu_ring_tweaked(self):
"""Benchmarks graph fp16 8 gpus with ring collective tweaked."""
self._setup()
FLAGS.num_gpus = 8
FLAGS.distribution_strategy = 'multi_worker_mirrored'
FLAGS.all_reduce_alg = 'ring'
FLAGS.tf_gpu_thread_mode = 'gpu_private'
FLAGS.intra_op_parallelism_threads = 1
FLAGS.datasets_num_private_threads = 32
FLAGS.model_dir = self._get_model_dir(
folder_name='benchmark_graph_fp16_8_gpu_ring_tweaked')
FLAGS.batch_size = 256*8
FLAGS.dtype = 'fp16'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_fp16_8_gpu_nccl_tweaked(self):
"""Benchmarks graph fp16 8 gpus with nccl collective tweaked."""
self._setup()
FLAGS.num_gpus = 8
FLAGS.distribution_strategy = 'multi_worker_mirrored'
FLAGS.all_reduce_alg = 'nccl'
FLAGS.tf_gpu_thread_mode = 'gpu_private'
FLAGS.intra_op_parallelism_threads = 1
FLAGS.datasets_num_private_threads = 32
FLAGS.model_dir = self._get_model_dir(
folder_name='benchmark_graph_fp16_8_gpu_nccl_tweaked')
FLAGS.batch_size = 256*8
FLAGS.dtype = 'fp16'
FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
class Resnet50MultiWorkerEstimatorBenchmarkSynth(
Resnet50MultiWorkerEstimatorBenchmark):
"""ResNet50, multi-worker, Estimator, synthetic data."""
def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
def_flags = {}
def_flags['use_synthetic_data'] = True
def_flags['max_train_steps'] = 110
def_flags['train_epochs'] = 1
super(Resnet50MultiWorkerEstimatorBenchmarkSynth, self).__init__(
output_dir=output_dir, default_flags=def_flags)
class Resnet56EstimatorAccuracy(EstimatorBenchmark):
"""Accuracy tests for Estimator ResNet56."""
local_flags = None
def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
"""A benchmark class.
Args:
output_dir: directory where to output e.g. log files
root_data_dir: directory under which to look for dataset
**kwargs: arbitrary named arguments. This is needed to make the
constructor forward compatible in case PerfZero provides more
named arguments before updating the constructor.
"""
flag_methods = [cifar_main.define_cifar_flags]
self.data_dir = os.path.join(root_data_dir, CIFAR_DATA_DIR_NAME)
super(Resnet56EstimatorAccuracy, self).__init__(
output_dir=output_dir, flag_methods=flag_methods)
def benchmark_graph_1_gpu(self):
"""Test layers model with Estimator and distribution strategies."""
self._setup()
flags.FLAGS.num_gpus = 1
flags.FLAGS.data_dir = self.data_dir
flags.FLAGS.batch_size = 128
flags.FLAGS.train_epochs = 182
flags.FLAGS.model_dir = self._get_model_dir('benchmark_graph_1_gpu')
flags.FLAGS.resnet_size = 56
flags.FLAGS.dtype = 'fp32'
flags.FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_fp16_1_gpu(self):
"""Test layers FP16 model with Estimator and distribution strategies."""
self._setup()
flags.FLAGS.num_gpus = 1
flags.FLAGS.data_dir = self.data_dir
flags.FLAGS.batch_size = 128
flags.FLAGS.train_epochs = 182
flags.FLAGS.model_dir = self._get_model_dir('benchmark_graph_fp16_1_gpu')
flags.FLAGS.resnet_size = 56
flags.FLAGS.dtype = 'fp16'
flags.FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_2_gpu(self):
"""Test layers model with Estimator and dist_strat. 2 GPUs."""
self._setup()
flags.FLAGS.num_gpus = 2
flags.FLAGS.data_dir = self.data_dir
flags.FLAGS.batch_size = 128
flags.FLAGS.train_epochs = 182
flags.FLAGS.model_dir = self._get_model_dir('benchmark_graph_2_gpu')
flags.FLAGS.resnet_size = 56
flags.FLAGS.dtype = 'fp32'
flags.FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def benchmark_graph_fp16_2_gpu(self):
"""Test layers FP16 model with Estimator and dist_strat. 2 GPUs."""
self._setup()
flags.FLAGS.num_gpus = 2
flags.FLAGS.data_dir = self.data_dir
flags.FLAGS.batch_size = 128
flags.FLAGS.train_epochs = 182
flags.FLAGS.model_dir = self._get_model_dir('benchmark_graph_fp16_2_gpu')
flags.FLAGS.resnet_size = 56
flags.FLAGS.dtype = 'fp16'
flags.FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def unit_test(self):
"""A lightweight test that can finish quickly."""
self._setup()
flags.FLAGS.num_gpus = 1
flags.FLAGS.data_dir = self.data_dir
flags.FLAGS.batch_size = 128
flags.FLAGS.train_epochs = 1
flags.FLAGS.model_dir = self._get_model_dir('unit_test')
flags.FLAGS.resnet_size = 8
flags.FLAGS.dtype = 'fp32'
flags.FLAGS.hooks = ['ExamplesPerSecondHook']
self._run_and_report_benchmark()
def _run_and_report_benchmark(self):
"""Executes benchmark and reports result."""
start_time_sec = time.time()
stats = cifar_main.run_cifar(flags.FLAGS)
wall_time_sec = time.time() - start_time_sec
self._report_benchmark(stats,
wall_time_sec,
top_1_min=0.926,
top_1_max=0.938)
@@ -0,0 +1,433 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Runs a ResNet model on the ImageNet dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import datetime
import time
sys.path.append(os.path.dirname(os.path.realpath(__file__)) + '/../../../')
# import pydevd_pycharm
# pydevd_pycharm.settrace('10.174.181.209', port=8008, stdoutToServer=True, stderrToServer=True)
from absl import app as absl_app
from absl import flags
from six.moves import range
import tensorflow as tf
from official.r1.resnet import imagenet_preprocessing
from official.r1.resnet import resnet_model
from official.r1.resnet import resnet_run_loop
from official.utils.flags import core as flags_core
from official.utils.logs import logger
import logging
############## npu modify begin #############
from npu_bridge.estimator import npu_ops
from hccl.manage.api import get_local_rank_id
from hccl.manage.api import get_rank_size
from hccl.manage.api import get_rank_id
from tensorflow.core.protobuf import rewriter_config_pb2
tf.compat.v1.logging.set_verbosity(tf.logging.INFO)
############## npu modify end ###############
DEFAULT_IMAGE_SIZE = 224
NUM_CHANNELS = 3
NUM_CLASSES = 1001
NUM_IMAGES = {
'train': 1281167,
'validation': 50000,
}
_NUM_TRAIN_FILES = 1024
_SHUFFLE_BUFFER = 10000
DATASET_NAME = 'ImageNet'
#log_file1 = 'result/logger_resnet50.log'
#log_file1 = os.path.join(os.path.abspath(os.path.dirname(__file__)),'../../../../result/logger_resnet50.log')
###############################################################################
# Data processing
###############################################################################
def get_filenames(is_training, data_dir):
"""Return filenames for dataset."""
if is_training:
return [
os.path.join(data_dir, 'train-%05d-of-01024' % i)
for i in range(_NUM_TRAIN_FILES)]
else:
return [
os.path.join(data_dir, 'validation-%05d-of-00128' % i)
for i in range(128)]
def _parse_example_proto(example_serialized):
"""Parses an Example proto containing a training example of an image.
The output of the build_image_data.py image preprocessing script is a dataset
containing serialized Example protocol buffers. Each Example proto contains
the following fields (values are included as examples):
image/height: 462
image/width: 581
image/colorspace: 'RGB'
image/channels: 3
image/class/label: 615
image/class/synset: 'n03623198'
image/class/text: 'knee pad'
image/object/bbox/xmin: 0.1
image/object/bbox/xmax: 0.9
image/object/bbox/ymin: 0.2
image/object/bbox/ymax: 0.6
image/object/bbox/label: 615
image/format: 'JPEG'
image/filename: 'ILSVRC2012_val_00041207.JPEG'
image/encoded: <JPEG encoded string>
Args:
example_serialized: scalar Tensor tf.string containing a serialized
Example protocol buffer.
Returns:
image_buffer: Tensor tf.string containing the contents of a JPEG file.
label: Tensor tf.int32 containing the label.
bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords]
where each coordinate is [0, 1) and the coordinates are arranged as
[ymin, xmin, ymax, xmax].
"""
# Dense features in Example proto.
feature_map = {
'image/encoded': tf.io.FixedLenFeature([], dtype=tf.string,
default_value=''),
'image/class/label': tf.io.FixedLenFeature([], dtype=tf.int64,
default_value=-1),
'image/class/text': tf.io.FixedLenFeature([], dtype=tf.string,
default_value=''),
}
sparse_float32 = tf.io.VarLenFeature(dtype=tf.float32)
# Sparse features in Example proto.
feature_map.update(
{k: sparse_float32 for k in ['image/object/bbox/xmin',
'image/object/bbox/ymin',
'image/object/bbox/xmax',
'image/object/bbox/ymax']})
features = tf.io.parse_single_example(serialized=example_serialized,
features=feature_map)
label = tf.cast(features['image/class/label'], dtype=tf.int32)
xmin = tf.expand_dims(features['image/object/bbox/xmin'].values, 0)
ymin = tf.expand_dims(features['image/object/bbox/ymin'].values, 0)
xmax = tf.expand_dims(features['image/object/bbox/xmax'].values, 0)
ymax = tf.expand_dims(features['image/object/bbox/ymax'].values, 0)
# Note that we impose an ordering of (y, x) just to make life difficult.
bbox = tf.concat([ymin, xmin, ymax, xmax], 0)
# Force the variable number of bounding boxes into the shape
# [1, num_boxes, coords].
bbox = tf.expand_dims(bbox, 0)
bbox = tf.transpose(a=bbox, perm=[0, 2, 1])
return features['image/encoded'], label, bbox
def parse_record(raw_record, is_training, dtype):
"""Parses a record containing a training example of an image.
The input record is parsed into a label and image, and the image is passed
through preprocessing steps (cropping, flipping, and so on).
Args:
raw_record: scalar Tensor tf.string containing a serialized
Example protocol buffer.
is_training: A boolean denoting whether the input is for training.
dtype: data type to use for images/features.
Returns:
Tuple with processed image tensor and one-hot-encoded label tensor.
"""
image_buffer, label, bbox = _parse_example_proto(raw_record)
#work_num, root_dir, datatime, resnet_logger = hwlog.env(log_file1)
# add 预处理
#resnet_logger.info("namespace:%s,time_ts:%s, event_type:pre_process_event" % (work_num, date_time))
#resnet_logger.info("namespace:%s,time_ts:%s,event_type:init_start" % (work_num, date_time))
image = imagenet_preprocessing.preprocess_image(
image_buffer=image_buffer,
bbox=bbox,
output_height=DEFAULT_IMAGE_SIZE,
output_width=DEFAULT_IMAGE_SIZE,
num_channels=NUM_CHANNELS,
is_training=is_training)
# resnet_logger.info("namespace:%s,time_ts:%d,event_type:init_end, root_dir:%s" % (work_num, datatime, root_dir))
image = tf.cast(image, dtype)
return image, label
def input_fn(is_training,
data_dir,
batch_size,
num_epochs=1,
dtype=tf.float32,
datasets_num_private_threads=None,
parse_record_fn=parse_record,
input_context=None,
drop_remainder=False,
tf_data_experimental_slack=False):
"""Input function which provides batches for train or eval.
Args:
is_training: A boolean denoting whether the input is for training.
data_dir: The directory containing the input data.
batch_size: The number of samples per batch.
num_epochs: The number of epochs to repeat the dataset.
dtype: Data type to use for images/features
datasets_num_private_threads: Number of private threads for tf.data.
parse_record_fn: Function to use for parsing the records.
input_context: A `tf.distribute.InputContext` object passed in by
`tf.distribute.Strategy`.
drop_remainder: A boolean indicates whether to drop the remainder of the
batches. If True, the batch dimension will be static.
tf_data_experimental_slack: Whether to enable tf.data's
`experimental_slack` option.
Returns:
A dataset that can be used for iteration.
"""
filenames = get_filenames(is_training, data_dir)
dataset = tf.data.Dataset.from_tensor_slices(filenames)
if input_context:
############## npu modify begin #############
dataset = dataset.shard(get_rank_size(),
get_rank_id())
############## npu modify end ###############
if is_training:
# Shuffle the input files
dataset = dataset.shuffle(buffer_size=_NUM_TRAIN_FILES)
# Convert to individual records.
# cycle_length = 10 means that up to 10 files will be read and deserialized in
# parallel. You may want to increase this number if you have a large number of
# CPU cores.
dataset = dataset.interleave(
tf.data.TFRecordDataset,
cycle_length=10,
num_parallel_calls=tf.data.experimental.AUTOTUNE)
return resnet_run_loop.process_record_dataset(
dataset=dataset,
is_training=is_training,
batch_size=batch_size,
shuffle_buffer=_SHUFFLE_BUFFER,
parse_record_fn=parse_record_fn,
num_epochs=num_epochs,
dtype=dtype,
datasets_num_private_threads=datasets_num_private_threads,
drop_remainder=drop_remainder,
tf_data_experimental_slack=tf_data_experimental_slack,
)
def get_synth_input_fn(dtype):
return resnet_run_loop.get_synth_input_fn(
DEFAULT_IMAGE_SIZE, DEFAULT_IMAGE_SIZE, NUM_CHANNELS, NUM_CLASSES,
dtype=dtype)
###############################################################################
# Running the model
###############################################################################
class ImagenetModel(resnet_model.Model):
"""Model class with appropriate defaults for Imagenet data."""
def __init__(self, resnet_size, data_format=None, num_classes=NUM_CLASSES,
resnet_version=resnet_model.DEFAULT_VERSION,
dtype=resnet_model.DEFAULT_DTYPE):
"""These are the parameters that work for Imagenet data.
Args:
resnet_size: The number of convolutional layers needed in the model.
data_format: Either 'channels_first' or 'channels_last', specifying which
data format to use when setting up the model.
num_classes: The number of output classes needed from the model. This
enables users to extend the same model to their own datasets.
resnet_version: Integer representing which version of the ResNet network
to use. See README for details. Valid values: [1, 2]
dtype: The TensorFlow dtype to use for calculations.
"""
# For bigger models, we want to use "bottleneck" layers
if resnet_size < 50:
bottleneck = False
else:
bottleneck = True
super(ImagenetModel, self).__init__(
resnet_size=resnet_size,
bottleneck=bottleneck,
num_classes=num_classes,
num_filters=64,
kernel_size=7,
conv_stride=2,
first_pool_size=3,
first_pool_stride=2,
block_sizes=_get_block_sizes(resnet_size),
block_strides=[1, 2, 2, 2],
resnet_version=resnet_version,
data_format=data_format,
dtype=dtype
)
def _get_block_sizes(resnet_size):
"""Retrieve the size of each block_layer in the ResNet model.
The number of block layers used for the Resnet model varies according
to the size of the model. This helper grabs the layer set we want, throwing
an error if a non-standard size has been selected.
Args:
resnet_size: The number of convolutional layers needed in the model.
Returns:
A list of block sizes to use in building the model.
Raises:
KeyError: if invalid resnet_size is received.
"""
choices = {
18: [2, 2, 2, 2],
34: [3, 4, 6, 3],
50: [3, 4, 6, 3],
101: [3, 4, 23, 3],
152: [3, 8, 36, 3],
200: [3, 24, 36, 3]
}
try:
return choices[resnet_size]
except KeyError:
err = ('Could not find layers for selected Resnet size.\n'
'Size received: {}; sizes allowed: {}.'.format(
resnet_size, list(choices.keys())))
raise ValueError(err)
def imagenet_model_fn(features, labels, mode, params):
"""Our model_fn for ResNet to be used with our Estimator."""
# Warmup and higher lr may not be valid for fine tuning with small batches
# and smaller numbers of training images.
if params['fine_tune']:
warmup = False
base_lr = .1
else:
warmup = True
base_lr = .128
learning_rate_fn = resnet_run_loop.learning_rate_with_decay(
batch_size=params['num_gpus']*params['batch_size'],
batch_denom=256, num_images=NUM_IMAGES['train'],
boundary_epochs=[30, 60, 80, 90], decay_rates=[1, 0.1, 0.01, 0.001, 1e-4],
warmup=warmup, base_lr=base_lr)
return resnet_run_loop.resnet_model_fn(
features=features,
labels=labels,
mode=mode,
model_class=ImagenetModel,
resnet_size=params['resnet_size'],
weight_decay=flags.FLAGS.weight_decay,
learning_rate_fn=learning_rate_fn,
momentum=0.9,
data_format=params['data_format'],
resnet_version=params['resnet_version'],
loss_scale=params['loss_scale'],
loss_filter_fn=None,
dtype=params['dtype'],
fine_tune=params['fine_tune'],
label_smoothing=flags.FLAGS.label_smoothing
)
def define_imagenet_flags():
resnet_run_loop.define_resnet_flags(
resnet_size_choices=['18', '34', '50', '101', '152', '200'],
dynamic_loss_scale=True,
fp16_implementation=True)
flags.adopt_module_key_flags(resnet_run_loop)
flags_core.set_defaults(train_epochs=90)
#Loss scale is defautt used because Davinci core supports mixed precision naturally
flags_core.set_defaults(loss_scale='512')
def run_imagenet(flags_obj):
"""Run ResNet ImageNet training and eval loop.
Args:
flags_obj: An object containing parsed flag values.
Returns:
Dict of results of the run. Contains the keys `eval_results` and
`train_hooks`. `eval_results` contains accuracy (top_1) and
accuracy_top_5. `train_hooks` is a list the instances of hooks used during
training.
"""
input_function = (flags_obj.use_synthetic_data and
get_synth_input_fn(flags_core.get_tf_dtype(flags_obj)) or
input_fn)
result = resnet_run_loop.resnet_main(
flags_obj, imagenet_model_fn, input_function, DATASET_NAME,NUM_IMAGES,
shape=[DEFAULT_IMAGE_SIZE, DEFAULT_IMAGE_SIZE, NUM_CHANNELS],)
return result
def main(flags_obj):
############## npu modify begin #############
# Init NPU ,then can call HCCL Interface
init_sess,npu_init=resnet_run_loop.init_npu()
init_sess.run(npu_init)
# i=1
# while(1):
# i+=1
############## npu modify end ###############
with logger.benchmark_context(flags.FLAGS):
run_imagenet(flags.FLAGS)
def benchmark_main():
absl_app.run(main)
def benchmark_pre():
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)
define_imagenet_flags()
if __name__ == '__main__':
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)
define_imagenet_flags()
absl_app.run(main)
@@ -0,0 +1,262 @@
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Provides utilities to preprocess images.
Training images are sampled using the provided bounding boxes, and subsequently
cropped to the sampled bounding box. Images are additionally flipped randomly,
then resized to the target output size (without aspect-ratio preservation).
Images used during evaluation are resized (with aspect-ratio preservation) and
centrally cropped.
All images undergo mean color subtraction.
Note that these steps are colloquially referred to as "ResNet preprocessing,"
and they differ from "VGG preprocessing," which does not use bounding boxes
and instead does an aspect-preserving resize followed by random crop during
training. (These both differ from "Inception preprocessing," which introduces
color distortion steps.)
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
_R_MEAN = 123.68
_G_MEAN = 116.78
_B_MEAN = 103.94
_CHANNEL_MEANS = [_R_MEAN, _G_MEAN, _B_MEAN]
# The lower bound for the smallest side of the image for aspect-preserving
# resizing. For example, if an image is 500 x 1000, it will be resized to
# _RESIZE_MIN x (_RESIZE_MIN * 2).
_RESIZE_MIN = 256
def _decode_crop_and_flip(image_buffer, bbox, num_channels):
"""Crops the given image to a random part of the image, and randomly flips.
We use the fused decode_and_crop op, which performs better than the two ops
used separately in series, but note that this requires that the image be
passed in as an un-decoded string Tensor.
Args:
image_buffer: scalar string Tensor representing the raw JPEG image buffer.
bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords]
where each coordinate is [0, 1) and the coordinates are arranged as
[ymin, xmin, ymax, xmax].
num_channels: Integer depth of the image buffer for decoding.
Returns:
3-D tensor with cropped image.
"""
# A large fraction of image datasets contain a human-annotated bounding box
# delineating the region of the image containing the object of interest. We
# choose to create a new bounding box for the object which is a randomly
# distorted version of the human-annotated bounding box that obeys an
# allowed range of aspect ratios, sizes and overlap with the human-annotated
# bounding box. If no box is supplied, then we assume the bounding box is
# the entire image.
sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box(
tf.image.extract_jpeg_shape(image_buffer),
bounding_boxes=bbox,
min_object_covered=0.1,
aspect_ratio_range=[0.75, 1.33],
area_range=[0.05, 1.0],
max_attempts=100,
use_image_if_no_bounding_boxes=True)
bbox_begin, bbox_size, _ = sample_distorted_bounding_box
# Reassemble the bounding box in the format the crop op requires.
offset_y, offset_x, _ = tf.unstack(bbox_begin)
target_height, target_width, _ = tf.unstack(bbox_size)
crop_window = tf.stack([offset_y, offset_x, target_height, target_width])
# Use the fused decode and crop op here, which is faster than each in series.
cropped = tf.image.decode_and_crop_jpeg(
image_buffer, crop_window, channels=num_channels)
# Flip to add a little more random distortion in.
cropped = tf.image.random_flip_left_right(cropped)
return cropped
def _central_crop(image, crop_height, crop_width):
"""Performs central crops of the given image list.
Args:
image: a 3-D image tensor
crop_height: the height of the image following the crop.
crop_width: the width of the image following the crop.
Returns:
3-D tensor with cropped image.
"""
shape = tf.shape(input=image)
height, width = shape[0], shape[1]
amount_to_be_cropped_h = (height - crop_height)
crop_top = amount_to_be_cropped_h // 2
amount_to_be_cropped_w = (width - crop_width)
crop_left = amount_to_be_cropped_w // 2
return tf.slice(
image, [crop_top, crop_left, 0], [crop_height, crop_width, -1])
def _mean_image_subtraction(image, means, num_channels):
"""Subtracts the given means from each image channel.
For example:
means = [123.68, 116.779, 103.939]
image = _mean_image_subtraction(image, means)
Note that the rank of `image` must be known.
Args:
image: a tensor of size [height, width, C].
means: a C-vector of values to subtract from each channel.
num_channels: number of color channels in the image that will be distorted.
Returns:
the centered image.
Raises:
ValueError: If the rank of `image` is unknown, if `image` has a rank other
than three or if the number of channels in `image` doesn't match the
number of values in `means`.
"""
if image.get_shape().ndims != 3:
raise ValueError('Input must be of size [height, width, C>0]')
if len(means) != num_channels:
raise ValueError('len(means) must match the number of channels')
# We have a 1-D tensor of means; convert to 3-D.
# Note(b/130245863): we explicitly call `broadcast` instead of simply
# expanding dimensions for better performance.
means = tf.broadcast_to(means, tf.shape(image))
return image - means
def _smallest_size_at_least(height, width, resize_min):
"""Computes new shape with the smallest side equal to `smallest_side`.
Computes new shape with the smallest side equal to `smallest_side` while
preserving the original aspect ratio.
Args:
height: an int32 scalar tensor indicating the current height.
width: an int32 scalar tensor indicating the current width.
resize_min: A python integer or scalar `Tensor` indicating the size of
the smallest side after resize.
Returns:
new_height: an int32 scalar tensor indicating the new height.
new_width: an int32 scalar tensor indicating the new width.
"""
resize_min = tf.cast(resize_min, tf.float32)
# Convert to floats to make subsequent calculations go smoothly.
height, width = tf.cast(height, tf.float32), tf.cast(width, tf.float32)
smaller_dim = tf.minimum(height, width)
scale_ratio = resize_min / smaller_dim
# Convert back to ints to make heights and widths that TF ops will accept.
new_height = tf.cast(height * scale_ratio, tf.int32)
new_width = tf.cast(width * scale_ratio, tf.int32)
return new_height, new_width
def _aspect_preserving_resize(image, resize_min):
"""Resize images preserving the original aspect ratio.
Args:
image: A 3-D image `Tensor`.
resize_min: A python integer or scalar `Tensor` indicating the size of
the smallest side after resize.
Returns:
resized_image: A 3-D tensor containing the resized image.
"""
shape = tf.shape(input=image)
height, width = shape[0], shape[1]
new_height, new_width = _smallest_size_at_least(height, width, resize_min)
return _resize_image(image, new_height, new_width)
def _resize_image(image, height, width):
"""Simple wrapper around tf.resize_images.
This is primarily to make sure we use the same `ResizeMethod` and other
details each time.
Args:
image: A 3-D image `Tensor`.
height: The target height for the resized image.
width: The target width for the resized image.
Returns:
resized_image: A 3-D tensor containing the resized image. The first two
dimensions have the shape [height, width].
"""
return tf.compat.v1.image.resize(
image, [height, width], method=tf.image.ResizeMethod.BILINEAR,
align_corners=False)
def preprocess_image(image_buffer, bbox, output_height, output_width,
num_channels, is_training=False):
"""Preprocesses the given image.
Preprocessing includes decoding, cropping, and resizing for both training
and eval images. Training preprocessing, however, introduces some random
distortion of the image to improve accuracy.
Args:
image_buffer: scalar string Tensor representing the raw JPEG image buffer.
bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords]
where each coordinate is [0, 1) and the coordinates are arranged as
[ymin, xmin, ymax, xmax].
output_height: The height of the image after preprocessing.
output_width: The width of the image after preprocessing.
num_channels: Integer depth of the image buffer for decoding.
is_training: `True` if we're preprocessing the image for training and
`False` otherwise.
Returns:
A preprocessed image.
"""
if is_training:
# For training, we want to randomize some of the distortions.
image = _decode_crop_and_flip(image_buffer, bbox, num_channels)
image = _resize_image(image, output_height, output_width)
else:
# For validation, we want to decode, resize, then just crop the middle.
image = tf.image.decode_jpeg(image_buffer, channels=num_channels)
image = _aspect_preserving_resize(image, _RESIZE_MIN)
image = _central_crop(image, output_height, output_width)
image.set_shape([output_height, output_width, num_channels])
return _mean_image_subtraction(image, _CHANNEL_MEANS, num_channels)
@@ -0,0 +1,326 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import unittest
import tensorflow as tf # pylint: disable=g-bad-import-order
from official.r1.resnet import imagenet_main
from official.utils.misc import keras_utils
from official.utils.testing import integration
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
_BATCH_SIZE = 32
_LABEL_CLASSES = 1001
class BaseTest(tf.test.TestCase):
_num_validation_images = None
@classmethod
def setUpClass(cls): # pylint: disable=invalid-name
super(BaseTest, cls).setUpClass()
imagenet_main.define_imagenet_flags()
def setUp(self):
super(BaseTest, self).setUp()
if keras_utils.is_v2_0:
tf.compat.v1.disable_eager_execution()
self._num_validation_images = imagenet_main.NUM_IMAGES['validation']
imagenet_main.NUM_IMAGES['validation'] = 4
def tearDown(self):
super(BaseTest, self).tearDown()
tf.io.gfile.rmtree(self.get_temp_dir())
imagenet_main.NUM_IMAGES['validation'] = self._num_validation_images
def _tensor_shapes_helper(self, resnet_size, resnet_version, dtype, with_gpu):
"""Checks the tensor shapes after each phase of the ResNet model."""
def reshape(shape):
"""Returns the expected dimensions depending on if a GPU is being used."""
# If a GPU is used for the test, the shape is returned (already in NCHW
# form). When GPU is not used, the shape is converted to NHWC.
if with_gpu:
return shape
return shape[0], shape[2], shape[3], shape[1]
graph = tf.Graph()
with graph.as_default(), self.test_session(
graph=graph, use_gpu=with_gpu, force_gpu=with_gpu):
model = imagenet_main.ImagenetModel(
resnet_size=resnet_size,
data_format='channels_first' if with_gpu else 'channels_last',
resnet_version=resnet_version,
dtype=dtype
)
inputs = tf.random.uniform([1, 224, 224, 3])
output = model(inputs, training=True)
initial_conv = graph.get_tensor_by_name('resnet_model/initial_conv:0')
max_pool = graph.get_tensor_by_name('resnet_model/initial_max_pool:0')
block_layer1 = graph.get_tensor_by_name('resnet_model/block_layer1:0')
block_layer2 = graph.get_tensor_by_name('resnet_model/block_layer2:0')
block_layer3 = graph.get_tensor_by_name('resnet_model/block_layer3:0')
block_layer4 = graph.get_tensor_by_name('resnet_model/block_layer4:0')
reduce_mean = graph.get_tensor_by_name('resnet_model/final_reduce_mean:0')
dense = graph.get_tensor_by_name('resnet_model/final_dense:0')
self.assertAllEqual(initial_conv.shape, reshape((1, 64, 112, 112)))
self.assertAllEqual(max_pool.shape, reshape((1, 64, 56, 56)))
# The number of channels after each block depends on whether we're
# using the building_block or the bottleneck_block.
if resnet_size < 50:
self.assertAllEqual(block_layer1.shape, reshape((1, 64, 56, 56)))
self.assertAllEqual(block_layer2.shape, reshape((1, 128, 28, 28)))
self.assertAllEqual(block_layer3.shape, reshape((1, 256, 14, 14)))
self.assertAllEqual(block_layer4.shape, reshape((1, 512, 7, 7)))
self.assertAllEqual(reduce_mean.shape, reshape((1, 512, 1, 1)))
else:
self.assertAllEqual(block_layer1.shape, reshape((1, 256, 56, 56)))
self.assertAllEqual(block_layer2.shape, reshape((1, 512, 28, 28)))
self.assertAllEqual(block_layer3.shape, reshape((1, 1024, 14, 14)))
self.assertAllEqual(block_layer4.shape, reshape((1, 2048, 7, 7)))
self.assertAllEqual(reduce_mean.shape, reshape((1, 2048, 1, 1)))
self.assertAllEqual(dense.shape, (1, _LABEL_CLASSES))
self.assertAllEqual(output.shape, (1, _LABEL_CLASSES))
def tensor_shapes_helper(self, resnet_size, resnet_version, with_gpu=False):
self._tensor_shapes_helper(resnet_size=resnet_size,
resnet_version=resnet_version,
dtype=tf.float32, with_gpu=with_gpu)
self._tensor_shapes_helper(resnet_size=resnet_size,
resnet_version=resnet_version,
dtype=tf.float16, with_gpu=with_gpu)
def test_tensor_shapes_resnet_18_v1(self):
self.tensor_shapes_helper(18, resnet_version=1)
def test_tensor_shapes_resnet_18_v2(self):
self.tensor_shapes_helper(18, resnet_version=2)
def test_tensor_shapes_resnet_34_v1(self):
self.tensor_shapes_helper(34, resnet_version=1)
def test_tensor_shapes_resnet_34_v2(self):
self.tensor_shapes_helper(34, resnet_version=2)
def test_tensor_shapes_resnet_50_v1(self):
self.tensor_shapes_helper(50, resnet_version=1)
def test_tensor_shapes_resnet_50_v2(self):
self.tensor_shapes_helper(50, resnet_version=2)
def test_tensor_shapes_resnet_101_v1(self):
self.tensor_shapes_helper(101, resnet_version=1)
def test_tensor_shapes_resnet_101_v2(self):
self.tensor_shapes_helper(101, resnet_version=2)
def test_tensor_shapes_resnet_152_v1(self):
self.tensor_shapes_helper(152, resnet_version=1)
def test_tensor_shapes_resnet_152_v2(self):
self.tensor_shapes_helper(152, resnet_version=2)
def test_tensor_shapes_resnet_200_v1(self):
self.tensor_shapes_helper(200, resnet_version=1)
def test_tensor_shapes_resnet_200_v2(self):
self.tensor_shapes_helper(200, resnet_version=2)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_18_with_gpu_v1(self):
self.tensor_shapes_helper(18, resnet_version=1, with_gpu=True)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_18_with_gpu_v2(self):
self.tensor_shapes_helper(18, resnet_version=2, with_gpu=True)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_34_with_gpu_v1(self):
self.tensor_shapes_helper(34, resnet_version=1, with_gpu=True)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_34_with_gpu_v2(self):
self.tensor_shapes_helper(34, resnet_version=2, with_gpu=True)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_50_with_gpu_v1(self):
self.tensor_shapes_helper(50, resnet_version=1, with_gpu=True)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_50_with_gpu_v2(self):
self.tensor_shapes_helper(50, resnet_version=2, with_gpu=True)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_101_with_gpu_v1(self):
self.tensor_shapes_helper(101, resnet_version=1, with_gpu=True)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_101_with_gpu_v2(self):
self.tensor_shapes_helper(101, resnet_version=2, with_gpu=True)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_152_with_gpu_v1(self):
self.tensor_shapes_helper(152, resnet_version=1, with_gpu=True)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_152_with_gpu_v2(self):
self.tensor_shapes_helper(152, resnet_version=2, with_gpu=True)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_200_with_gpu_v1(self):
self.tensor_shapes_helper(200, resnet_version=1, with_gpu=True)
@unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU')
def test_tensor_shapes_resnet_200_with_gpu_v2(self):
self.tensor_shapes_helper(200, resnet_version=2, with_gpu=True)
def resnet_model_fn_helper(self, mode, resnet_version, dtype):
"""Tests that the EstimatorSpec is given the appropriate arguments."""
tf.compat.v1.train.create_global_step()
input_fn = imagenet_main.get_synth_input_fn(dtype)
dataset = input_fn(True, '', _BATCH_SIZE)
iterator = tf.compat.v1.data.make_initializable_iterator(dataset)
features, labels = iterator.get_next()
spec = imagenet_main.imagenet_model_fn(
features, labels, mode, {
'dtype': dtype,
'resnet_size': 50,
'data_format': 'channels_last',
'batch_size': _BATCH_SIZE,
'resnet_version': resnet_version,
'loss_scale': 128 if dtype == tf.float16 else 1,
'fine_tune': False,
})
predictions = spec.predictions
self.assertAllEqual(predictions['probabilities'].shape,
(_BATCH_SIZE, _LABEL_CLASSES))
self.assertEqual(predictions['probabilities'].dtype, tf.float32)
self.assertAllEqual(predictions['classes'].shape, (_BATCH_SIZE,))
self.assertEqual(predictions['classes'].dtype, tf.int64)
if mode != tf.estimator.ModeKeys.PREDICT:
loss = spec.loss
self.assertAllEqual(loss.shape, ())
self.assertEqual(loss.dtype, tf.float32)
if mode == tf.estimator.ModeKeys.EVAL:
eval_metric_ops = spec.eval_metric_ops
self.assertAllEqual(eval_metric_ops['accuracy'][0].shape, ())
self.assertAllEqual(eval_metric_ops['accuracy'][1].shape, ())
self.assertEqual(eval_metric_ops['accuracy'][0].dtype, tf.float32)
self.assertEqual(eval_metric_ops['accuracy'][1].dtype, tf.float32)
def test_resnet_model_fn_train_mode_v1(self):
self.resnet_model_fn_helper(tf.estimator.ModeKeys.TRAIN, resnet_version=1,
dtype=tf.float32)
def test_resnet_model_fn_train_mode_v2(self):
self.resnet_model_fn_helper(tf.estimator.ModeKeys.TRAIN, resnet_version=2,
dtype=tf.float32)
def test_resnet_model_fn_eval_mode_v1(self):
self.resnet_model_fn_helper(tf.estimator.ModeKeys.EVAL, resnet_version=1,
dtype=tf.float32)
def test_resnet_model_fn_eval_mode_v2(self):
self.resnet_model_fn_helper(tf.estimator.ModeKeys.EVAL, resnet_version=2,
dtype=tf.float32)
def test_resnet_model_fn_predict_mode_v1(self):
self.resnet_model_fn_helper(tf.estimator.ModeKeys.PREDICT, resnet_version=1,
dtype=tf.float32)
def test_resnet_model_fn_predict_mode_v2(self):
self.resnet_model_fn_helper(tf.estimator.ModeKeys.PREDICT, resnet_version=2,
dtype=tf.float32)
def _test_imagenetmodel_shape(self, resnet_version):
batch_size = 135
num_classes = 246
model = imagenet_main.ImagenetModel(
50, data_format='channels_last', num_classes=num_classes,
resnet_version=resnet_version)
fake_input = tf.random.uniform([batch_size, 224, 224, 3])
output = model(fake_input, training=True)
self.assertAllEqual(output.shape, (batch_size, num_classes))
def test_imagenetmodel_shape_v1(self):
self._test_imagenetmodel_shape(resnet_version=1)
def test_imagenetmodel_shape_v2(self):
self._test_imagenetmodel_shape(resnet_version=2)
def test_imagenet_end_to_end_synthetic_v1(self):
integration.run_synthetic(
main=imagenet_main.run_imagenet, tmp_root=self.get_temp_dir(),
extra_flags=['-resnet_version', '1', '-batch_size', '4',
'--max_train_steps', '1']
)
def test_imagenet_end_to_end_synthetic_v2(self):
integration.run_synthetic(
main=imagenet_main.run_imagenet, tmp_root=self.get_temp_dir(),
extra_flags=['-resnet_version', '2', '-batch_size', '4',
'--max_train_steps', '1']
)
def test_imagenet_end_to_end_synthetic_v1_tiny(self):
integration.run_synthetic(
main=imagenet_main.run_imagenet, tmp_root=self.get_temp_dir(),
extra_flags=['-resnet_version', '1', '-batch_size', '4',
'-resnet_size', '18', '--max_train_steps', '1']
)
def test_imagenet_end_to_end_synthetic_v2_tiny(self):
integration.run_synthetic(
main=imagenet_main.run_imagenet, tmp_root=self.get_temp_dir(),
extra_flags=['-resnet_version', '2', '-batch_size', '4',
'-resnet_size', '18', '--max_train_steps', '1']
)
def test_imagenet_end_to_end_synthetic_v1_huge(self):
integration.run_synthetic(
main=imagenet_main.run_imagenet, tmp_root=self.get_temp_dir(),
extra_flags=['-resnet_version', '1', '-batch_size', '4',
'-resnet_size', '200', '--max_train_steps', '1']
)
def test_imagenet_end_to_end_synthetic_v2_huge(self):
integration.run_synthetic(
main=imagenet_main.run_imagenet, tmp_root=self.get_temp_dir(),
extra_flags=['-resnet_version', '2', '-batch_size', '4',
'-resnet_size', '200', '--max_train_steps', '1']
)
if __name__ == '__main__':
tf.test.main()
@@ -0,0 +1,552 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains definitions for Residual Networks.
Residual networks ('v1' ResNets) were originally proposed in:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Deep Residual Learning for Image Recognition. arXiv:1512.03385
The full preactivation 'v2' ResNet variant was introduced by:
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Identity Mappings in Deep Residual Networks. arXiv: 1603.05027
The key difference of the full preactivation 'v2' variant compared to the
'v1' variant in [1] is the use of batch normalization before every weight layer
rather than after.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
_BATCH_NORM_DECAY = 0.997
_BATCH_NORM_EPSILON = 1e-5
DEFAULT_VERSION = 2
DEFAULT_DTYPE = tf.float32
CASTABLE_TYPES = (tf.float16,)
ALLOWED_TYPES = (DEFAULT_DTYPE,) + CASTABLE_TYPES
################################################################################
# Convenience functions for building the ResNet model.
################################################################################
def batch_norm(inputs, training, data_format):
"""Performs a batch normalization using a standard set of parameters."""
# We set fused=True for a significant performance boost. See
# https://www.tensorflow.org/performance/performance_guide#common_fused_ops
return tf.compat.v1.layers.batch_normalization(
inputs=inputs, axis=1 if data_format == 'channels_first' else 3,
momentum=_BATCH_NORM_DECAY, epsilon=_BATCH_NORM_EPSILON, center=True,
scale=True, training=training, fused=True)
def fixed_padding(inputs, kernel_size, data_format):
"""Pads the input along the spatial dimensions independently of input size.
Args:
inputs: A tensor of size [batch, channels, height_in, width_in] or
[batch, height_in, width_in, channels] depending on data_format.
kernel_size: The kernel to be used in the conv2d or max_pool2d operation.
Should be a positive integer.
data_format: The input format ('channels_last' or 'channels_first').
Returns:
A tensor with the same format as the input with the data either intact
(if kernel_size == 1) or padded (if kernel_size > 1).
"""
pad_total = kernel_size - 1
pad_beg = pad_total // 2
pad_end = pad_total - pad_beg
if data_format == 'channels_first':
padded_inputs = tf.pad(tensor=inputs,
paddings=[[0, 0], [0, 0], [pad_beg, pad_end],
[pad_beg, pad_end]])
else:
padded_inputs = tf.pad(tensor=inputs,
paddings=[[0, 0], [pad_beg, pad_end],
[pad_beg, pad_end], [0, 0]])
return padded_inputs
def conv2d_fixed_padding(inputs, filters, kernel_size, strides, data_format):
"""Strided 2-D convolution with explicit padding."""
# The padding is consistent and is based only on `kernel_size`, not on the
# dimensions of `inputs` (as opposed to using `tf.layers.conv2d` alone).
if strides > 1:
inputs = fixed_padding(inputs, kernel_size, data_format)
return tf.compat.v1.layers.conv2d(
inputs=inputs, filters=filters, kernel_size=kernel_size, strides=strides,
padding=('SAME' if strides == 1 else 'VALID'), use_bias=False,
kernel_initializer=tf.compat.v1.variance_scaling_initializer(),
data_format=data_format)
################################################################################
# ResNet block definitions.
################################################################################
def _building_block_v1(inputs, filters, training, projection_shortcut, strides,
data_format):
"""A single block for ResNet v1, without a bottleneck.
Convolution then batch normalization then ReLU as described by:
Deep Residual Learning for Image Recognition
https://arxiv.org/pdf/1512.03385.pdf
by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.
Args:
inputs: A tensor of size [batch, channels, height_in, width_in] or
[batch, height_in, width_in, channels] depending on data_format.
filters: The number of filters for the convolutions.
training: A Boolean for whether the model is in training or inference
mode. Needed for batch normalization.
projection_shortcut: The function to use for projection shortcuts
(typically a 1x1 convolution when downsampling the input).
strides: The block's stride. If greater than 1, this block will ultimately
downsample the input.
data_format: The input format ('channels_last' or 'channels_first').
Returns:
The output tensor of the block; shape should match inputs.
"""
shortcut = inputs
if projection_shortcut is not None:
shortcut = projection_shortcut(inputs)
shortcut = batch_norm(inputs=shortcut, training=training,
data_format=data_format)
inputs = conv2d_fixed_padding(
inputs=inputs, filters=filters, kernel_size=3, strides=strides,
data_format=data_format)
inputs = batch_norm(inputs, training, data_format)
inputs = tf.nn.relu(inputs)
inputs = conv2d_fixed_padding(
inputs=inputs, filters=filters, kernel_size=3, strides=1,
data_format=data_format)
inputs = batch_norm(inputs, training, data_format)
inputs += shortcut
inputs = tf.nn.relu(inputs)
return inputs
def _building_block_v2(inputs, filters, training, projection_shortcut, strides,
data_format):
"""A single block for ResNet v2, without a bottleneck.
Batch normalization then ReLu then convolution as described by:
Identity Mappings in Deep Residual Networks
https://arxiv.org/pdf/1603.05027.pdf
by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Jul 2016.
Args:
inputs: A tensor of size [batch, channels, height_in, width_in] or
[batch, height_in, width_in, channels] depending on data_format.
filters: The number of filters for the convolutions.
training: A Boolean for whether the model is in training or inference
mode. Needed for batch normalization.
projection_shortcut: The function to use for projection shortcuts
(typically a 1x1 convolution when downsampling the input).
strides: The block's stride. If greater than 1, this block will ultimately
downsample the input.
data_format: The input format ('channels_last' or 'channels_first').
Returns:
The output tensor of the block; shape should match inputs.
"""
shortcut = inputs
inputs = batch_norm(inputs, training, data_format)
inputs = tf.nn.relu(inputs)
# The projection shortcut should come after the first batch norm and ReLU
# since it performs a 1x1 convolution.
if projection_shortcut is not None:
shortcut = projection_shortcut(inputs)
inputs = conv2d_fixed_padding(
inputs=inputs, filters=filters, kernel_size=3, strides=strides,
data_format=data_format)
inputs = batch_norm(inputs, training, data_format)
inputs = tf.nn.relu(inputs)
inputs = conv2d_fixed_padding(
inputs=inputs, filters=filters, kernel_size=3, strides=1,
data_format=data_format)
return inputs + shortcut
def _bottleneck_block_v1(inputs, filters, training, projection_shortcut,
strides, data_format):
"""A single block for ResNet v1, with a bottleneck.
Similar to _building_block_v1(), except using the "bottleneck" blocks
described in:
Convolution then batch normalization then ReLU as described by:
Deep Residual Learning for Image Recognition
https://arxiv.org/pdf/1512.03385.pdf
by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.
Args:
inputs: A tensor of size [batch, channels, height_in, width_in] or
[batch, height_in, width_in, channels] depending on data_format.
filters: The number of filters for the convolutions.
training: A Boolean for whether the model is in training or inference
mode. Needed for batch normalization.
projection_shortcut: The function to use for projection shortcuts
(typically a 1x1 convolution when downsampling the input).
strides: The block's stride. If greater than 1, this block will ultimately
downsample the input.
data_format: The input format ('channels_last' or 'channels_first').
Returns:
The output tensor of the block; shape should match inputs.
"""
shortcut = inputs
if projection_shortcut is not None:
shortcut = projection_shortcut(inputs)
shortcut = batch_norm(inputs=shortcut, training=training,
data_format=data_format)
inputs = conv2d_fixed_padding(
inputs=inputs, filters=filters, kernel_size=1, strides=1,
data_format=data_format)
inputs = batch_norm(inputs, training, data_format)
inputs = tf.nn.relu(inputs)
inputs = conv2d_fixed_padding(
inputs=inputs, filters=filters, kernel_size=3, strides=strides,
data_format=data_format)
inputs = batch_norm(inputs, training, data_format)
inputs = tf.nn.relu(inputs)
inputs = conv2d_fixed_padding(
inputs=inputs, filters=4 * filters, kernel_size=1, strides=1,
data_format=data_format)
inputs = batch_norm(inputs, training, data_format)
inputs += shortcut
inputs = tf.nn.relu(inputs)
return inputs
def _bottleneck_block_v2(inputs, filters, training, projection_shortcut,
strides, data_format):
"""A single block for ResNet v2, with a bottleneck.
Similar to _building_block_v2(), except using the "bottleneck" blocks
described in:
Convolution then batch normalization then ReLU as described by:
Deep Residual Learning for Image Recognition
https://arxiv.org/pdf/1512.03385.pdf
by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.
Adapted to the ordering conventions of:
Batch normalization then ReLu then convolution as described by:
Identity Mappings in Deep Residual Networks
https://arxiv.org/pdf/1603.05027.pdf
by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Jul 2016.
Args:
inputs: A tensor of size [batch, channels, height_in, width_in] or
[batch, height_in, width_in, channels] depending on data_format.
filters: The number of filters for the convolutions.
training: A Boolean for whether the model is in training or inference
mode. Needed for batch normalization.
projection_shortcut: The function to use for projection shortcuts
(typically a 1x1 convolution when downsampling the input).
strides: The block's stride. If greater than 1, this block will ultimately
downsample the input.
data_format: The input format ('channels_last' or 'channels_first').
Returns:
The output tensor of the block; shape should match inputs.
"""
shortcut = inputs
inputs = batch_norm(inputs, training, data_format)
inputs = tf.nn.relu(inputs)
# The projection shortcut should come after the first batch norm and ReLU
# since it performs a 1x1 convolution.
if projection_shortcut is not None:
shortcut = projection_shortcut(inputs)
inputs = conv2d_fixed_padding(
inputs=inputs, filters=filters, kernel_size=1, strides=1,
data_format=data_format)
inputs = batch_norm(inputs, training, data_format)
inputs = tf.nn.relu(inputs)
inputs = conv2d_fixed_padding(
inputs=inputs, filters=filters, kernel_size=3, strides=strides,
data_format=data_format)
inputs = batch_norm(inputs, training, data_format)
inputs = tf.nn.relu(inputs)
inputs = conv2d_fixed_padding(
inputs=inputs, filters=4 * filters, kernel_size=1, strides=1,
data_format=data_format)
return inputs + shortcut
def block_layer(inputs, filters, bottleneck, block_fn, blocks, strides,
training, name, data_format):
"""Creates one layer of blocks for the ResNet model.
Args:
inputs: A tensor of size [batch, channels, height_in, width_in] or
[batch, height_in, width_in, channels] depending on data_format.
filters: The number of filters for the first convolution of the layer.
bottleneck: Is the block created a bottleneck block.
block_fn: The block to use within the model, either `building_block` or
`bottleneck_block`.
blocks: The number of blocks contained in the layer.
strides: The stride to use for the first convolution of the layer. If
greater than 1, this layer will ultimately downsample the input.
training: Either True or False, whether we are currently training the
model. Needed for batch norm.
name: A string name for the tensor output of the block layer.
data_format: The input format ('channels_last' or 'channels_first').
Returns:
The output tensor of the block layer.
"""
# Bottleneck blocks end with 4x the number of filters as they start with
filters_out = filters * 4 if bottleneck else filters
def projection_shortcut(inputs):
return conv2d_fixed_padding(
inputs=inputs, filters=filters_out, kernel_size=1, strides=strides,
data_format=data_format)
# Only the first block per block_layer uses projection_shortcut and strides
inputs = block_fn(inputs, filters, training, projection_shortcut, strides,
data_format)
for _ in range(1, blocks):
inputs = block_fn(inputs, filters, training, None, 1, data_format)
return tf.identity(inputs, name)
class Model(object):
"""Base class for building the Resnet Model."""
def __init__(self, resnet_size, bottleneck, num_classes, num_filters,
kernel_size,
conv_stride, first_pool_size, first_pool_stride,
block_sizes, block_strides,
resnet_version=DEFAULT_VERSION, data_format=None,
dtype=DEFAULT_DTYPE):
"""Creates a model for classifying an image.
Args:
resnet_size: A single integer for the size of the ResNet model.
bottleneck: Use regular blocks or bottleneck blocks.
num_classes: The number of classes used as labels.
num_filters: The number of filters to use for the first block layer
of the model. This number is then doubled for each subsequent block
layer.
kernel_size: The kernel size to use for convolution.
conv_stride: stride size for the initial convolutional layer
first_pool_size: Pool size to be used for the first pooling layer.
If none, the first pooling layer is skipped.
first_pool_stride: stride size for the first pooling layer. Not used
if first_pool_size is None.
block_sizes: A list containing n values, where n is the number of sets of
block layers desired. Each value should be the number of blocks in the
i-th set.
block_strides: List of integers representing the desired stride size for
each of the sets of block layers. Should be same length as block_sizes.
resnet_version: Integer representing which version of the ResNet network
to use. See README for details. Valid values: [1, 2]
data_format: Input format ('channels_last', 'channels_first', or None).
If set to None, the format is dependent on whether a GPU is available.
dtype: The TensorFlow dtype to use for calculations. If not specified
tf.float32 is used.
Raises:
ValueError: if invalid version is selected.
"""
self.resnet_size = resnet_size
if not data_format:
data_format = (
'channels_first' if tf.test.is_built_with_cuda() else 'channels_last')
self.resnet_version = resnet_version
if resnet_version not in (1, 2):
raise ValueError(
'Resnet version should be 1 or 2. See README for citations.')
self.bottleneck = bottleneck
if bottleneck:
if resnet_version == 1:
self.block_fn = _bottleneck_block_v1
else:
self.block_fn = _bottleneck_block_v2
else:
if resnet_version == 1:
self.block_fn = _building_block_v1
else:
self.block_fn = _building_block_v2
if dtype not in ALLOWED_TYPES:
raise ValueError('dtype must be one of: {}'.format(ALLOWED_TYPES))
self.data_format = data_format
self.num_classes = num_classes
self.num_filters = num_filters
self.kernel_size = kernel_size
self.conv_stride = conv_stride
self.first_pool_size = first_pool_size
self.first_pool_stride = first_pool_stride
self.block_sizes = block_sizes
self.block_strides = block_strides
self.dtype = dtype
self.pre_activation = resnet_version == 2
def _custom_dtype_getter(self, getter, name, shape=None, dtype=DEFAULT_DTYPE,
*args, **kwargs):
"""Creates variables in fp32, then casts to fp16 if necessary.
This function is a custom getter. A custom getter is a function with the
same signature as tf.get_variable, except it has an additional getter
parameter. Custom getters can be passed as the `custom_getter` parameter of
tf.variable_scope. Then, tf.get_variable will call the custom getter,
instead of directly getting a variable itself. This can be used to change
the types of variables that are retrieved with tf.get_variable.
The `getter` parameter is the underlying variable getter, that would have
been called if no custom getter was used. Custom getters typically get a
variable with `getter`, then modify it in some way.
This custom getter will create an fp32 variable. If a low precision
(e.g. float16) variable was requested it will then cast the variable to the
requested dtype. The reason we do not directly create variables in low
precision dtypes is that applying small gradients to such variables may
cause the variable not to change.
Args:
getter: The underlying variable getter, that has the same signature as
tf.get_variable and returns a variable.
name: The name of the variable to get.
shape: The shape of the variable to get.
dtype: The dtype of the variable to get. Note that if this is a low
precision dtype, the variable will be created as a tf.float32 variable,
then cast to the appropriate dtype
*args: Additional arguments to pass unmodified to getter.
**kwargs: Additional keyword arguments to pass unmodified to getter.
Returns:
A variable which is cast to fp16 if necessary.
"""
if dtype in CASTABLE_TYPES:
var = getter(name, shape, tf.float32, *args, **kwargs)
return tf.cast(var, dtype=dtype, name=name + '_cast')
else:
return getter(name, shape, dtype, *args, **kwargs)
def _model_variable_scope(self):
"""Returns a variable scope that the model should be created under.
If self.dtype is a castable type, model variable will be created in fp32
then cast to self.dtype before being used.
Returns:
A variable scope for the model.
"""
return tf.compat.v1.variable_scope('resnet_model',
custom_getter=self._custom_dtype_getter)
def __call__(self, inputs, training):
"""Add operations to classify a batch of input images.
Args:
inputs: A Tensor representing a batch of input images.
training: A boolean. Set to True to add operations required only when
training the classifier.
Returns:
A logits Tensor with shape [<batch_size>, self.num_classes].
"""
with self._model_variable_scope():
if self.data_format == 'channels_first':
# Convert the inputs from channels_last (NHWC) to channels_first (NCHW).
# This provides a large performance boost on GPU. See
# https://www.tensorflow.org/performance/performance_guide#data_formats
inputs = tf.transpose(a=inputs, perm=[0, 3, 1, 2])
inputs = conv2d_fixed_padding(
inputs=inputs, filters=self.num_filters, kernel_size=self.kernel_size,
strides=self.conv_stride, data_format=self.data_format)
inputs = tf.identity(inputs, 'initial_conv')
# We do not include batch normalization or activation functions in V2
# for the initial conv1 because the first ResNet unit will perform these
# for both the shortcut and non-shortcut paths as part of the first
# block's projection. Cf. Appendix of [2].
if self.resnet_version == 1:
inputs = batch_norm(inputs, training, self.data_format)
inputs = tf.nn.relu(inputs)
if self.first_pool_size:
############## npu modify begin #############
#max_pooling2d is replaced by max_pool_with_argmax for better performance
inputs,argmax = tf.compat.v1.nn.max_pool_with_argmax(
input=inputs, ksize=(1,self.first_pool_size,self.first_pool_size,1),
strides=(1,self.first_pool_stride,self.first_pool_stride,1), padding='SAME',
data_format='NCHW' if self.data_format == 'channels_first' else 'NHWC')
############## npu modify end ###############
inputs = tf.identity(inputs, 'initial_max_pool')
for i, num_blocks in enumerate(self.block_sizes):
num_filters = self.num_filters * (2**i)
inputs = block_layer(
inputs=inputs, filters=num_filters, bottleneck=self.bottleneck,
block_fn=self.block_fn, blocks=num_blocks,
strides=self.block_strides[i], training=training,
name='block_layer{}'.format(i + 1), data_format=self.data_format)
# Only apply the BN and ReLU for model that does pre_activation in each
# building/bottleneck block, eg resnet V2.
if self.pre_activation:
inputs = batch_norm(inputs, training, self.data_format)
inputs = tf.nn.relu(inputs)
# The current top layer has shape
# `batch_size x pool_size x pool_size x final_size`.
# ResNet does an Average Pooling layer over pool_size,
# but that is the same as doing a reduce_mean. We do a reduce_mean
# here because it performs better than AveragePooling2D.
axes = [2, 3] if self.data_format == 'channels_first' else [1, 2]
inputs = tf.reduce_mean(input_tensor=inputs, axis=axes, keepdims=True)
inputs = tf.identity(inputs, 'final_reduce_mean')
inputs = tf.squeeze(inputs, axes)
inputs = tf.compat.v1.layers.dense(inputs=inputs, units=self.num_classes)
inputs = tf.identity(inputs, 'final_dense')
return inputs
@@ -0,0 +1,979 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains utility and supporting functions for ResNet.
This module contains ResNet code which does not directly build layers. This
includes dataset management, hyperparameter and optimizer code, and argument
parsing. Code for defining the ResNet layers can be found in resnet_model.py.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import math
import multiprocessing
import os
import datetime
import time
from absl import flags
import tensorflow as tf
import logging
import sys
############## npu modify begin #############
from npu_bridge.estimator.npu.npu_config import NPURunConfig
from npu_bridge.estimator.npu.npu_estimator import NPUEstimator
from npu_bridge.estimator.npu.npu_optimizer import NPUDistributedOptimizer
from npu_bridge.estimator import npu_ops
from npu_bridge.hccl import hccl_ops
from hccl.manage.api import get_local_rank_id
from hccl.manage.api import get_rank_size
from hccl.manage.api import get_rank_id
from tensorflow.core.protobuf import rewriter_config_pb2
############## npu modify end ###############
from official.r1.resnet import imagenet_preprocessing
from official.r1.resnet import resnet_model
from official.r1.utils import export
from official.utils.flags import core as flags_core
from official.utils.logs import hooks_helper
from official.utils.logs import logger
from official.utils.misc import distribution_utils
from official.utils.misc import model_helpers
from benchmark_log import hwlog
################################################################################
# Functions for input processing.
################################################################################
def process_record_dataset(dataset,
is_training,
batch_size,
shuffle_buffer,
parse_record_fn,
num_epochs=1,
dtype=tf.float32,
datasets_num_private_threads=None,
drop_remainder=False,
tf_data_experimental_slack=False):
"""Given a Dataset with raw records, return an iterator over the records.
Args:
dataset: A Dataset representing raw records
is_training: A boolean denoting whether the input is for training.
batch_size: The number of samples per batch.
shuffle_buffer: The buffer size to use when shuffling records. A larger
value results in better randomness, but smaller values reduce startup
time and use less memory.
parse_record_fn: A function that takes a raw record and returns the
corresponding (image, label) pair.
num_epochs: The number of epochs to repeat the dataset.
dtype: Data type to use for images/features.
datasets_num_private_threads: Number of threads for a private
threadpool created for all datasets computation.
drop_remainder: A boolean indicates whether to drop the remainder of the
batches. If True, the batch dimension will be static.
tf_data_experimental_slack: Whether to enable tf.data's
`experimental_slack` option.
Returns:
Dataset of (image, label) pairs ready for iteration.
"""
# Defines a specific size thread pool for tf.data operations.
if datasets_num_private_threads:
options = tf.data.Options()
options.experimental_threading.private_threadpool_size = (
datasets_num_private_threads)
dataset = dataset.with_options(options)
tf.compat.v1.logging.info('datasets_num_private_threads: %s',
datasets_num_private_threads)
# Disable intra-op parallelism to optimize for throughput instead of latency.
options = tf.data.Options()
options.experimental_threading.max_intra_op_parallelism = 1
dataset = dataset.with_options(options)
# Prefetches a batch at a time to smooth out the time taken to load input
# files for shuffling and processing.
dataset = dataset.prefetch(buffer_size=batch_size)
if is_training:
# Shuffles records before repeating to respect epoch boundaries.
dataset = dataset.shuffle(buffer_size=shuffle_buffer)
# Repeats the dataset for the number of epochs to train.
#dataset = dataset.repeat(num_epochs)
dataset = dataset.repeat()
# Parses the raw records into images and labels.
dataset = dataset.map(
lambda value: parse_record_fn(value, is_training, dtype),
num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset = dataset.batch(batch_size, drop_remainder=drop_remainder)
# Operations between the final prefetch and the get_next call to the iterator
# will happen synchronously during run time. We prefetch here again to
# background all of the above processing work and keep it out of the
# critical training path. Setting buffer_size to tf.data.experimental.AUTOTUNE
# allows DistributionStrategies to adjust how many batches to fetch based
# on how many devices are present.
dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
if tf_data_experimental_slack:
options = tf.data.Options()
options.experimental_slack = True
dataset = dataset.with_options(options)
return dataset
def get_synth_input_fn(height, width, num_channels, num_classes,
dtype=tf.float32):
"""Returns an input function that returns a dataset with random data.
This input_fn returns a data set that iterates over a set of random data and
bypasses all preprocessing, e.g. jpeg decode and copy. The host to device
copy is still included. This used to find the upper throughput bound when
tunning the full input pipeline.
Args:
height: Integer height that will be used to create a fake image tensor.
width: Integer width that will be used to create a fake image tensor.
num_channels: Integer depth that will be used to create a fake image tensor.
num_classes: Number of classes that should be represented in the fake labels
tensor
dtype: Data type for features/images.
Returns:
An input_fn that can be used in place of a real one to return a dataset
that can be used for iteration.
"""
# pylint: disable=unused-argument
def input_fn(is_training, data_dir, batch_size, *args, **kwargs):
"""Returns dataset filled with random data."""
# Synthetic input should be within [0, 255].
inputs = tf.random.truncated_normal(
[batch_size] + [height, width, num_channels],
dtype=dtype,
mean=127,
stddev=60,
name='synthetic_inputs')
labels = tf.random.uniform(
[batch_size],
minval=0,
maxval=num_classes - 1,
dtype=tf.int32,
name='synthetic_labels')
data = tf.data.Dataset.from_tensors((inputs, labels)).repeat()
data = data.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
return data
return input_fn
def image_bytes_serving_input_fn(image_shape, dtype=tf.float32):
"""Serving input fn for raw jpeg images."""
def _preprocess_image(image_bytes):
"""Preprocess a single raw image."""
# Bounding box around the whole image.
bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=dtype, shape=[1, 1, 4])
height, width, num_channels = image_shape
image = imagenet_preprocessing.preprocess_image(
image_bytes, bbox, height, width, num_channels, is_training=False)
return image
image_bytes_list = tf.compat.v1.placeholder(
shape=[None], dtype=tf.string, name='input_tensor')
images = tf.map_fn(
_preprocess_image, image_bytes_list, back_prop=False, dtype=dtype)
return tf.estimator.export.TensorServingInputReceiver(
images, {'image_bytes': image_bytes_list})
def override_flags_and_set_envars_for_gpu_thread_pool(flags_obj):
"""Override flags and set env_vars for performance.
These settings exist to test the difference between using stock settings
and manual tuning. It also shows some of the ENV_VARS that can be tweaked to
squeeze a few extra examples per second. These settings are defaulted to the
current platform of interest, which changes over time.
On systems with small numbers of cpu cores, e.g. under 8 logical cores,
setting up a gpu thread pool with `tf_gpu_thread_mode=gpu_private` may perform
poorly.
Args:
flags_obj: Current flags, which will be adjusted possibly overriding
what has been set by the user on the command-line.
"""
cpu_count = multiprocessing.cpu_count()
tf.compat.v1.logging.info('Logical CPU cores: %s', cpu_count)
# Sets up thread pool for each GPU for op scheduling.
per_gpu_thread_count = 1
total_gpu_thread_count = per_gpu_thread_count * flags_obj.num_gpus
os.environ['TF_GPU_THREAD_MODE'] = flags_obj.tf_gpu_thread_mode
os.environ['TF_GPU_THREAD_COUNT'] = str(per_gpu_thread_count)
tf.compat.v1.logging.info('TF_GPU_THREAD_COUNT: %s',
os.environ['TF_GPU_THREAD_COUNT'])
tf.compat.v1.logging.info('TF_GPU_THREAD_MODE: %s',
os.environ['TF_GPU_THREAD_MODE'])
# Reduces general thread pool by number of threads used for GPU pool.
main_thread_count = cpu_count - total_gpu_thread_count
flags_obj.inter_op_parallelism_threads = main_thread_count
# Sets thread count for tf.data. Logical cores minus threads assign to the
# private GPU pool along with 2 thread per GPU for event monitoring and
# sending / receiving tensors.
num_monitoring_threads = 2 * flags_obj.num_gpus
flags_obj.datasets_num_private_threads = (cpu_count - total_gpu_thread_count
- num_monitoring_threads)
################################################################################
# Functions for running training/eval/validation loops for the model.
################################################################################
def learning_rate_with_decay(
batch_size, batch_denom, num_images, boundary_epochs, decay_rates,
base_lr=0.1, warmup=False):
"""Get a learning rate that decays step-wise as training progresses.
Args:
batch_size: the number of examples processed in each training batch.
batch_denom: this value will be used to scale the base learning rate.
`0.1 * batch size` is divided by this number, such that when
batch_denom == batch_size, the initial learning rate will be 0.1.
num_images: total number of images that will be used for training.
boundary_epochs: list of ints representing the epochs at which we
decay the learning rate.
decay_rates: list of floats representing the decay rates to be used
for scaling the learning rate. It should have one more element
than `boundary_epochs`, and all elements should have the same type.
base_lr: Initial learning rate scaled based on batch_denom.
warmup: Run a 5 epoch warmup to the initial lr.
Returns:
Returns a function that takes a single argument - the number of batches
trained so far (global_step)- and returns the learning rate to be used
for training the next batch.
"""
initial_learning_rate = base_lr * batch_size / batch_denom
batches_per_epoch = num_images / batch_size
# Reduce the learning rate at certain epochs.
# CIFAR-10: divide by 10 at epoch 100, 150, and 200
# ImageNet: divide by 10 at epoch 30, 60, 80, and 90
boundaries = [int(batches_per_epoch * epoch) for epoch in boundary_epochs]
vals = [initial_learning_rate * decay for decay in decay_rates]
def learning_rate_fn(global_step):
"""Builds scaled learning rate function with 5 epoch warm up."""
############## npu modify begin #############
#Using int32 for better computing performance
global_step=tf.cast(global_step,tf.int32)
############## npu modify end ###############
lr = tf.compat.v1.train.piecewise_constant(global_step, boundaries, vals)
if warmup:
warmup_steps = int(batches_per_epoch * 5)
warmup_lr = (
initial_learning_rate * tf.cast(global_step, tf.float32) / tf.cast(
warmup_steps, tf.float32))
return tf.cond(pred=global_step < warmup_steps,
true_fn=lambda: warmup_lr,
false_fn=lambda: lr)
return lr
def poly_rate_fn(global_step):
"""Handles linear scaling rule, gradual warmup, and LR decay.
The learning rate starts at 0, then it increases linearly per step. After
FLAGS.poly_warmup_epochs, we reach the base learning rate (scaled to account
for batch size). The learning rate is then decayed using a polynomial rate
decay schedule with power 2.0.
Args:
global_step: the current global_step
Returns:
returns the current learning rate
"""
# Learning rate schedule for LARS polynomial schedule
if flags.FLAGS.batch_size < 8192:
plr = 5.0
w_epochs = 5
elif flags.FLAGS.batch_size < 16384:
plr = 10.0
w_epochs = 5
elif flags.FLAGS.batch_size < 32768:
plr = 25.0
w_epochs = 5
else:
plr = 32.0
w_epochs = 14
w_steps = int(w_epochs * batches_per_epoch)
wrate = (plr * tf.cast(global_step, tf.float32) / tf.cast(
w_steps, tf.float32))
# TODO(pkanwar): use a flag to help calc num_epochs.
num_epochs = 90
train_steps = batches_per_epoch * num_epochs
min_step = tf.constant(1, dtype=tf.int64)
decay_steps = tf.maximum(min_step, tf.subtract(global_step, w_steps))
poly_rate = tf.train.polynomial_decay(
plr,
decay_steps,
train_steps - w_steps + 1,
power=2.0)
return tf.where(global_step <= w_steps, wrate, poly_rate)
# For LARS we have a new learning rate schedule
if flags.FLAGS.enable_lars:
return poly_rate_fn
return learning_rate_fn
def resnet_model_fn(features, labels, mode, model_class,
resnet_size, weight_decay, learning_rate_fn, momentum,
data_format, resnet_version, loss_scale,
loss_filter_fn=None, dtype=resnet_model.DEFAULT_DTYPE,
fine_tune=False, label_smoothing=0.0):
"""Shared functionality for different resnet model_fns.
Initializes the ResnetModel representing the model layers
and uses that model to build the necessary EstimatorSpecs for
the `mode` in question. For training, this means building losses,
the optimizer, and the train op that get passed into the EstimatorSpec.
For evaluation and prediction, the EstimatorSpec is returned without
a train op, but with the necessary parameters for the given mode.
Args:
features: tensor representing input images
labels: tensor representing class labels for all input images
mode: current estimator mode; should be one of
`tf.estimator.ModeKeys.TRAIN`, `EVALUATE`, `PREDICT`
model_class: a class representing a TensorFlow model that has a __call__
function. We assume here that this is a subclass of ResnetModel.
resnet_size: A single integer for the size of the ResNet model.
weight_decay: weight decay loss rate used to regularize learned variables.
learning_rate_fn: function that returns the current learning rate given
the current global_step
momentum: momentum term used for optimization
data_format: Input format ('channels_last', 'channels_first', or None).
If set to None, the format is dependent on whether a GPU is available.
resnet_version: Integer representing which version of the ResNet network to
use. See README for details. Valid values: [1, 2]
loss_scale: The factor to scale the loss for numerical stability. A detailed
summary is present in the arg parser help text.
loss_filter_fn: function that takes a string variable name and returns
True if the var should be included in loss calculation, and False
otherwise. If None, batch_normalization variables will be excluded
from the loss.
dtype: the TensorFlow dtype to use for calculations.
fine_tune: If True only train the dense layers(final layers).
label_smoothing: If greater than 0 then smooth the labels.
Returns:
EstimatorSpec parameterized according to the input params and the
current mode.
"""
# Generate a summary node for the images
tf.compat.v1.summary.image('images', features, max_outputs=6)
############## npu modify begin #############
# Checks that features/images have same data type being used for calculations.
if features.dtype != dtype:
features=tf.cast(features,dtype)
############## npu modify end ###############
model = model_class(resnet_size, data_format, resnet_version=resnet_version,
dtype=dtype)
logits = model(features, mode == tf.estimator.ModeKeys.TRAIN)
# This acts as a no-op if the logits are already in fp32 (provided logits are
# not a SparseTensor). If dtype is is low precision, logits must be cast to
# fp32 for numerical stability.
logits = tf.cast(logits, tf.float32)
predictions = {
'classes': tf.argmax(input=logits, axis=1),
'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
}
if mode == tf.estimator.ModeKeys.PREDICT:
# Return the predictions and the specification for serving a SavedModel
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
export_outputs={
'predict': tf.estimator.export.PredictOutput(predictions)
})
# Calculate loss, which includes softmax cross entropy and L2 regularization.
if label_smoothing != 0.0:
one_hot_labels = tf.one_hot(labels, 1001)
cross_entropy = tf.losses.softmax_cross_entropy(
logits=logits, onehot_labels=one_hot_labels,
label_smoothing=label_smoothing)
else:
cross_entropy = tf.compat.v1.losses.sparse_softmax_cross_entropy(
logits=logits, labels=labels)
# Create a tensor named cross_entropy for logging purposes.
tf.identity(cross_entropy, name='cross_entropy')
tf.compat.v1.summary.scalar('cross_entropy', cross_entropy)
# If no loss_filter_fn is passed, assume we want the default behavior,
# which is that batch_normalization variables are excluded from loss.
def exclude_batch_norm(name):
return 'batch_normalization' not in name
loss_filter_fn = loss_filter_fn or exclude_batch_norm
# Add weight decay to the loss.
l2_loss = weight_decay * tf.add_n(
# loss is computed using fp32 for numerical stability.
[
tf.nn.l2_loss(tf.cast(v, tf.float32))
for v in tf.compat.v1.trainable_variables()
if loss_filter_fn(v.name)
])
tf.compat.v1.summary.scalar('l2_loss', l2_loss)
loss = cross_entropy + l2_loss
if mode == tf.estimator.ModeKeys.TRAIN:
global_step = tf.compat.v1.train.get_or_create_global_step()
learning_rate = learning_rate_fn(global_step)
# Create a tensor named learning_rate for logging purposes
tf.identity(learning_rate, name='learning_rate')
tf.compat.v1.summary.scalar('learning_rate', learning_rate)
if flags.FLAGS.enable_lars:
from tensorflow.contrib import opt as contrib_opt # pylint: disable=g-import-not-at-top
optimizer = contrib_opt.LARSOptimizer(
learning_rate,
momentum=momentum,
weight_decay=weight_decay,
skip_list=['batch_normalization', 'bias'])
else:
optimizer = tf.compat.v1.train.MomentumOptimizer(
learning_rate=learning_rate,
momentum=momentum
)
############## npu modify begin #############
optimizer = NPUDistributedOptimizer(optimizer)
############## npu modify end ###############
fp16_implementation = getattr(flags.FLAGS, 'fp16_implementation', None)
if fp16_implementation == 'graph_rewrite':
optimizer = (
tf.compat.v1.train.experimental.enable_mixed_precision_graph_rewrite(
optimizer, loss_scale=loss_scale))
def _dense_grad_filter(gvs):
"""Only apply gradient updates to the final layer.
This function is used for fine tuning.
Args:
gvs: list of tuples with gradients and variable info
Returns:
filtered gradients so that only the dense layer remains
"""
return [(g, v) for g, v in gvs if 'dense' in v.name]
#loss_scale = 512
if loss_scale != 1 and fp16_implementation != 'graph_rewrite':
# When computing fp16 gradients, often intermediate tensor values are
# so small, they underflow to 0. To avoid this, we multiply the loss by
# loss_scale to make these tensor values loss_scale times bigger.
scaled_grad_vars = optimizer.compute_gradients(loss * loss_scale)
print(">>>>>>>>>>>>>>>>>>>")
print(loss_scale)
print("<<<<<<<<<<<<<<<<<<")
if fine_tune:
scaled_grad_vars = _dense_grad_filter(scaled_grad_vars)
# Once the gradient computation is complete we can scale the gradients
# back to the correct scale before passing them to the optimizer.
unscaled_grad_vars = [(grad / loss_scale, var)
for grad, var in scaled_grad_vars]
minimize_op = optimizer.apply_gradients(unscaled_grad_vars, global_step)
else:
grad_vars = optimizer.compute_gradients(loss)
if fine_tune:
grad_vars = _dense_grad_filter(grad_vars)
minimize_op = optimizer.apply_gradients(grad_vars, global_step)
update_ops = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.UPDATE_OPS)
train_op = tf.group(minimize_op, update_ops)
else:
train_op = None
############## npu modify begin #############
#Using float32 for better performance
accuracy = tf.compat.v1.metrics.accuracy(tf.cast(labels,tf.float32), predictions['classes'])
############## npu modify end ###############
accuracy_top_5 = tf.compat.v1.metrics.mean(
tf.nn.in_top_k(predictions=logits, targets=labels, k=5, name='top_5_op'))
############## npu modify begin #############
#Using for 8P
rank_size = int(os.getenv('RANK_SIZE'))
newaccuracy = (hccl_ops.allreduce(accuracy[0], "sum") / rank_size, accuracy[1])
newaccuracy_top_5 = (hccl_ops.allreduce(accuracy_top_5[0], "sum") / rank_size, accuracy_top_5[1])
############## npu modify begin #############
metrics = {'accuracy': newaccuracy,
'accuracy_top_5': newaccuracy_top_5}
# Create a tensor named train_accuracy for logging purposes
tf.identity(accuracy[1], name='train_accuracy')
tf.identity(accuracy_top_5[1], name='train_accuracy_top_5')
tf.compat.v1.summary.scalar('train_accuracy', accuracy[1])
tf.compat.v1.summary.scalar('train_accuracy_top_5', accuracy_top_5[1])
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
loss=loss,
train_op=train_op,
eval_metric_ops=metrics)
############## npu modify begin #############
def init_npu():
"""Initialize npu manually.
Returns:
`init_sess` npu init session config.
`npu_init` npu init ops.
"""
npu_init = npu_ops.initialize_system()
config = tf.ConfigProto()
#npu mix precision attribute set to true when using mix precision
config.graph_options.rewrite_options.remapping = rewriter_config_pb2.RewriterConfig.OFF
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
#custom_op.parameter_map["precision_mode"].b = True
custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")
custom_op.parameter_map["use_off_line"].b = True
init_sess = tf.Session(config=config)
print("this is init sess config ------------- ",config)
print("this is npu_init ------------- ", npu_init)
# i=1
# while(1):
# i+=1
return init_sess,npu_init
############## npu modify end ###############
def resnet_main(
flags_obj, model_function, input_function, dataset_name, num_images, shape=None):
"""Shared main loop for ResNet Models.
Args:
flags_obj: An object containing parsed flags. See define_resnet_flags()
for details.
model_function: the function that instantiates the Model and builds the
ops for train/eval. This will be passed directly into the estimator.
input_function: the function that processes the dataset and returns a
dataset that the estimator can train on. This will be wrapped with
all the relevant flags for running and passed to estimator.
dataset_name: the name of the dataset for training and evaluation. This is
used for logging purpose.
shape: list of ints representing the shape of the images used for training.
This is only used if flags_obj.export_dir is passed.
Returns:
Dict of results of the run. Contains the keys `eval_results` and
`train_hooks`. `eval_results` contains accuracy (top_1) and accuracy_top_5.
`train_hooks` is a list the instances of hooks used during training.
"""
# Set other logger configurations
# work_num="work " + str(os.environ.get("DEVICE_INDEX"))
# hwlog.config(
# default_namespace=work_num,
# default_stack_offset=1,
# default_clear_line=False,
# root_dir=os.path.normpath(
# os.path.join(os.path.dirname(os.path.realpath(__file__)), "..", "..")))
# global logger1
# logger1 = get_logger('rizhi', log_file1)
# print("work_num is ", work_num)
# exit()
model_helpers.apply_clean(flags.FLAGS)
# Ensures flag override logic is only executed if explicitly triggered.
if flags_obj.tf_gpu_thread_mode:
override_flags_and_set_envars_for_gpu_thread_pool(flags_obj)
# Configures cluster spec for distribution strategy.
num_workers = distribution_utils.configure_cluster(flags_obj.worker_hosts,
flags_obj.task_index)
# Creates session config. allow_soft_placement = True, is required for
# multi-GPU and is not harmful for other modes.
session_config = tf.compat.v1.ConfigProto(
inter_op_parallelism_threads=flags_obj.inter_op_parallelism_threads,
intra_op_parallelism_threads=flags_obj.intra_op_parallelism_threads,
allow_soft_placement=True)
distribution_strategy = distribution_utils.get_distribution_strategy(
distribution_strategy=flags_obj.distribution_strategy,
num_gpus=flags_core.get_num_gpus(flags_obj),
all_reduce_alg=flags_obj.all_reduce_alg,
num_packs=flags_obj.num_packs)
############## npu modify begin #############
# Creates a `NPURunConfig` that checkpoints every 115200 steps
run_config = NPURunConfig(
model_dir=flags_obj.model_dir,
session_config=session_config,
keep_checkpoint_max=5,
save_summary_steps=0,
#save_checkpoints_steps=115200,
save_checkpoints_steps=flags_obj.save_checkpoints_steps,
enable_data_pre_proc=True,
#iterations_per_loop=100,
iterations_per_loop=flags_obj.iterations_per_loop,
#enable_auto_mix_precision=True,
precision_mode='allow_mix_precision',
hcom_parallel=True
)
############## npu modify end ###############
# Initializes model with all but the dense layer from pretrained ResNet.
if flags_obj.pretrained_model_checkpoint_path is not None:
warm_start_settings = tf.estimator.WarmStartSettings(
flags_obj.pretrained_model_checkpoint_path,
vars_to_warm_start='^(?!.*dense)')
else:
warm_start_settings = None
############## npu modify begin #############
# Creates a `NPUEstimator` instead of using tf.estimator.Estimator
classifier = NPUEstimator(
model_fn=model_function, model_dir=flags_obj.model_dir, config=run_config,
params={
'resnet_size': int(flags_obj.resnet_size),
'data_format': flags_obj.data_format,
'batch_size': flags_obj.batch_size,
'resnet_version': int(flags_obj.resnet_version),
'loss_scale': flags_core.get_loss_scale(flags_obj,
default_for_fp16=128),
'dtype': flags_core.get_tf_dtype(flags_obj),
'fine_tune': flags_obj.fine_tune,
'num_workers': num_workers,
'num_gpus' : flags_core.get_num_gpus(flags_obj),
})
############## npu modify end ###############
run_params = {
'batch_size': flags_obj.batch_size,
'dtype': flags_core.get_tf_dtype(flags_obj),
'resnet_size': flags_obj.resnet_size,
'resnet_version': flags_obj.resnet_version,
'synthetic_data': flags_obj.use_synthetic_data,
'train_epochs': flags_obj.train_epochs,
'num_workers': num_workers,
}
if flags_obj.use_synthetic_data:
dataset_name = dataset_name + '-synthetic'
benchmark_logger = logger.get_benchmark_logger()
benchmark_logger.log_run_info('resnet', dataset_name, run_params,
test_id=flags_obj.benchmark_test_id)
train_hooks = hooks_helper.get_train_hooks(
flags_obj.hooks,
model_dir=flags_obj.model_dir,
batch_size=flags_obj.batch_size)
def input_fn_train(num_epochs, input_context=None):
############## npu modify begin #############
# Using dtype=tf.float16 for higher data transmission performance
# drop_remainder currently only support true
# batch_size means single card batch instead of global batch size
return input_function(
is_training=True,
data_dir=flags_obj.data_dir,
batch_size=flags_obj.batch_size,
num_epochs=num_epochs,
dtype=tf.float16,
input_context=input_context,
drop_remainder=True)
def input_fn_eval():
# batch_size means single card batch instead of global batch size
# Using dtype=tf.float16 for higher data transmission performance
# drop_remainder currently only support true
return input_function(
is_training=False,
data_dir=flags_obj.data_dir,
batch_size=flags_obj.batch_size,
num_epochs=1,
dtype=tf.float16,
input_context=True,
drop_remainder=True)
############## npu modify end ###############
train_epochs = (0 if flags_obj.eval_only or not flags_obj.train_epochs else
flags_obj.train_epochs)
use_train_and_evaluate = flags_obj.use_train_and_evaluate or num_workers > 1
############## npu_kai modify end ###############
# init_sess, npu_init = init_npu()
# npu_shutdown = npu_ops.shutdown_system()
############## npu_kai modify end ###############
if use_train_and_evaluate:
train_spec = tf.estimator.TrainSpec(
input_fn=lambda input_context=None: input_fn_train(
train_epochs, input_context=input_context),
hooks=train_hooks,
max_steps=flags_obj.max_train_steps)
eval_spec = tf.estimator.EvalSpec(input_fn=input_fn_eval)
tf.compat.v1.logging.info('Starting to train and evaluate.')
tf.estimator.train_and_evaluate(classifier, train_spec, eval_spec)
# tf.estimator.train_and_evalute doesn't return anything in multi-worker
# case.
eval_results = {}
else:
if train_epochs == 0:
# If --eval_only is set, perform a single loop with zero train epochs.
schedule, n_loops = [0], 1
else:
# Compute the number of times to loop while training. All but the last
# pass will train for `epochs_between_evals` epochs, while the last will
# train for the number needed to reach `training_epochs`. For instance if
# train_epochs = 25 and epochs_between_evals = 10
# schedule will be set to [10, 10, 5]. That is to say, the loop will:
# Train for 10 epochs and then evaluate.
# Train for another 10 epochs and then evaluate.
# Train for a final 5 epochs (to reach 25 epochs) and then evaluate.
n_loops = math.ceil(train_epochs / flags_obj.epochs_between_evals)
schedule = [flags_obj.epochs_between_evals for _ in range(int(n_loops))]
schedule[-1] = train_epochs - sum(schedule[:-1]) # over counting.
current_max_steps = 0
############## npu modify begin #############
#if flags_obj.max_train_steps is None:
# flags_obj.max_train_steps = (num_images['train']/flags_obj.batch_size)/flags_core.get_num_gpus(flags_obj)
# max_eval_steps = num_images['validation']/flags_obj.batch_size
# else:
# max_eval_steps = flags_obj.max_train_steps
# for cycle_index, num_train_epochs in enumerate(schedule):
# print(cycle_index)
# print(num_train_epochs)
############## npu modify end #############
for cycle_index, num_train_epochs in enumerate(schedule):
tf.compat.v1.logging.info('Starting cycle: %d/%d', cycle_index,
int(n_loops))
############## npu modify begin #############
if flags_obj.max_train_steps is None:
current_max_steps += (
num_images['train'] / flags_obj.batch_size) * num_train_epochs / flags_core.get_num_gpus(
flags_obj)
else:
current_max_steps += flags_obj.max_train_steps
############## npu modify end #############
# add zwx5326390训练开始
# hwlogger.event(key=hwlog.constants.GLOBAL_BATCH_SIZE, value=flags_obj.batch_size)
#work_num, root_dir, datatime, resnet_logger = hwlog.env(log_file1)
#date_time = hwlog.get_time()
#resnet_logger.info("namespace: %s,time_ts: %s, global_batch_size: %d, num_train_epochs: %d" %(\
#work_num, date_time, flags_obj.batch_size, num_train_epochs))
#remark_logger.info("ABK time_ts: %s, current_epoch: %d, batch_size: %d, file: %s, lineno: %s" % (date_time,
# num_train_epochs, flags_obj.batch_size,file_name, sys._getframe().f_lineno))
hwlog.remark_print(key=hwlog.CURRENT_EPOCH, value=num_train_epochs)
if num_train_epochs:
# Since we are calling classifier.train immediately in each loop, the
# value of num_train_epochs in the lambda function will not be changed
# before it is used. So it is safe to ignore the pylint error here
# pylint: disable=cell-var-from-loop
# hwlogger.start(key=hwlog.constants.EPOCH_START)
from hccl.split.api import set_split_strategy_by_idx
set_split_strategy_by_idx([86,160])
classifier.train(
input_fn=lambda input_context=True: input_fn_train(
num_train_epochs, input_context=input_context),
hooks=train_hooks,
max_steps=current_max_steps)
# hwlogger.end(key=hwlog.constants.EPOCH_STOP)
############## npu modify begin #############
# npu resorce will be destoryed When the training is over
# Reinitialize is needed if using hccl interface before next process
init_sess,npu_init=init_npu()
npu_shutdown = npu_ops.shutdown_system()
init_sess.run(npu_shutdown)
init_sess.run(npu_init)
############## npu modify end ###############
# flags_obj.max_train_steps is generally associated with testing and
# profiling. As a result it is frequently called with synthetic data,
# which will iterate forever. Passing steps=flags_obj.max_train_steps
# allows the eval (which is generally unimportant in those circumstances)
# to terminate. Note that eval will run for max_train_steps each loop,
# regardless of the global_step count.
tf.compat.v1.logging.info('Starting to evaluate.')
eval_results = classifier.evaluate(input_fn=input_fn_eval,
steps=num_images['validation']/flags_obj.batch_size)
benchmark_logger.log_evaluation_result(eval_results)
#date_time = hwlog.get_time()
#remark_logger.info("ABK time_ts: %s, accuracy: %f, accuracy_top_5: %f, file: %s, lineno: %s" % (date_time,
# float(eval_results.get("accuracy")),float(eval_results.get("accuracy_top_5")), file_name,sys._getframe().f_lineno))
hwlog.remark_print(key=hwlog.EVAL_ACCURACY_TOP1, value=float(eval_results.get("accuracy")))
hwlog.remark_print(key=hwlog.EVAL_ACCURACY_TOP5, value=float(eval_results.get("accuracy_top_5")))
if model_helpers.past_stop_threshold(
flags_obj.stop_threshold, eval_results['accuracy']):
break
############## npu modify begin #############
# npu resorce will be destoryed when evaluate finish
# Reinitialize is needed before using hccl interface
if cycle_index < n_loops-1:
init_sess,npu_init=init_npu()
npu_shutdown = npu_ops.shutdown_system()
init_sess.run(npu_shutdown)
#from hccl.split.api import set_split_strategy_by_idx
# set_split_strategy_by_idx([86,160])
init_sess.run(npu_init)
############## npu modify end ###############
if flags_obj.export_dir is not None:
# Exports a saved model for the given classifier.
export_dtype = flags_core.get_tf_dtype(flags_obj)
if flags_obj.image_bytes_as_serving_input:
input_receiver_fn = functools.partial(
image_bytes_serving_input_fn, shape, dtype=export_dtype)
else:
input_receiver_fn = export.build_tensor_serving_input_receiver_fn(
shape, batch_size=flags_obj.batch_size, dtype=export_dtype)
classifier.export_savedmodel(flags_obj.export_dir, input_receiver_fn,
strip_default_attrs=True)
############## npu modify begin #############
npu_shutdown = npu_ops.shutdown_system()
init_sess.run(npu_shutdown)
############## npu modify end ###############
stats = {}
stats['eval_results'] = eval_results
stats['train_hooks'] = train_hooks
return stats
def define_resnet_flags(resnet_size_choices=None, dynamic_loss_scale=False,
fp16_implementation=False):
"""Add flags and validators for ResNet."""
flags_core.define_base(clean=True, train_epochs=True,
epochs_between_evals=True, stop_threshold=True,
num_gpu=True, hooks=True, export_dir=True,
distribution_strategy=True)
flags_core.define_performance(num_parallel_calls=False,
inter_op=True,
intra_op=True,
synthetic_data=True,
dtype=True,
all_reduce_alg=True,
num_packs=True,
tf_gpu_thread_mode=True,
datasets_num_private_threads=True,
dynamic_loss_scale=dynamic_loss_scale,
fp16_implementation=fp16_implementation,
loss_scale=True,
tf_data_experimental_slack=True,
max_train_steps=True)
flags_core.define_image()
flags_core.define_benchmark()
flags_core.define_distribution()
flags.adopt_module_key_flags(flags_core)
flags.DEFINE_enum(
name='resnet_version', short_name='rv', default='1',
enum_values=['1', '2'],
help=flags_core.help_wrap(
'Version of ResNet. (1 or 2) See README.md for details.'))
flags.DEFINE_bool(
name='fine_tune', short_name='ft', default=False,
help=flags_core.help_wrap(
'If True do not train any parameters except for the final layer.'))
flags.DEFINE_string(
name='pretrained_model_checkpoint_path', short_name='pmcp', default=None,
help=flags_core.help_wrap(
'If not None initialize all the network except the final layer with '
'these values'))
flags.DEFINE_boolean(
name='eval_only', default=False,
help=flags_core.help_wrap('Skip training and only perform evaluation on '
'the latest checkpoint.'))
flags.DEFINE_boolean(
name='image_bytes_as_serving_input', default=False,
help=flags_core.help_wrap(
'If True exports savedmodel with serving signature that accepts '
'JPEG image bytes instead of a fixed size [HxWxC] tensor that '
'represents the image. The former is easier to use for serving at '
'the expense of image resize/cropping being done as part of model '
'inference. Note, this flag only applies to ImageNet and cannot '
'be used for CIFAR.'))
flags.DEFINE_boolean(
name='use_train_and_evaluate', default=False,
help=flags_core.help_wrap(
'If True, uses `tf.estimator.train_and_evaluate` for the training '
'and evaluation loop, instead of separate calls to `classifier.train '
'and `classifier.evaluate`, which is the default behavior.'))
flags.DEFINE_bool(
name='enable_lars', default=False,
help=flags_core.help_wrap(
'Enable LARS optimizer for large batch training.'))
flags.DEFINE_float(
name='label_smoothing', default=0.0,
help=flags_core.help_wrap(
'Label smoothing parameter used in the softmax_cross_entropy'))
flags.DEFINE_float(
name='weight_decay', default=1e-4,
help=flags_core.help_wrap(
'Weight decay coefficiant for l2 regularization.'))
choice_kwargs = dict(
name='resnet_size', short_name='rs', default='50',
help=flags_core.help_wrap('The size of the ResNet model to use.'))
if resnet_size_choices is None:
flags.DEFINE_string(**choice_kwargs)
else:
flags.DEFINE_enum(enum_values=resnet_size_choices, **choice_kwargs)
@@ -0,0 +1,901 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains utility and supporting functions for ResNet.
This module contains ResNet code which does not directly build layers. This
includes dataset management, hyperparameter and optimizer code, and argument
parsing. Code for defining the ResNet layers can be found in resnet_model.py.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import math
import multiprocessing
import os
from absl import flags
import tensorflow as tf
############## npu modify begin #############
from npu_bridge.estimator.npu.npu_config import NPURunConfig
from npu_bridge.estimator.npu.npu_estimator import NPUEstimator
from npu_bridge.estimator.npu.npu_optimizer import NPUDistributedOptimizer
from npu_bridge.estimator import npu_ops
from hccl.manage.api import get_local_rank_id
from hccl.manage.api import get_rank_size
from hccl.manage.api import get_rank_id
from tensorflow.core.protobuf import rewriter_config_pb2
############## npu modify end ###############
from official.r1.resnet import imagenet_preprocessing
from official.r1.resnet import resnet_model
from official.r1.utils import export
from official.utils.flags import core as flags_core
from official.utils.logs import hooks_helper
from official.utils.logs import logger
from official.utils.misc import distribution_utils
from official.utils.misc import model_helpers
################################################################################
# Functions for input processing.
################################################################################
def process_record_dataset(dataset,
is_training,
batch_size,
shuffle_buffer,
parse_record_fn,
num_epochs=1,
dtype=tf.float32,
datasets_num_private_threads=None,
drop_remainder=False,
tf_data_experimental_slack=False):
"""Given a Dataset with raw records, return an iterator over the records.
Args:
dataset: A Dataset representing raw records
is_training: A boolean denoting whether the input is for training.
batch_size: The number of samples per batch.
shuffle_buffer: The buffer size to use when shuffling records. A larger
value results in better randomness, but smaller values reduce startup
time and use less memory.
parse_record_fn: A function that takes a raw record and returns the
corresponding (image, label) pair.
num_epochs: The number of epochs to repeat the dataset.
dtype: Data type to use for images/features.
datasets_num_private_threads: Number of threads for a private
threadpool created for all datasets computation.
drop_remainder: A boolean indicates whether to drop the remainder of the
batches. If True, the batch dimension will be static.
tf_data_experimental_slack: Whether to enable tf.data's
`experimental_slack` option.
Returns:
Dataset of (image, label) pairs ready for iteration.
"""
# Defines a specific size thread pool for tf.data operations.
if datasets_num_private_threads:
options = tf.data.Options()
options.experimental_threading.private_threadpool_size = (
datasets_num_private_threads)
dataset = dataset.with_options(options)
tf.compat.v1.logging.info('datasets_num_private_threads: %s',
datasets_num_private_threads)
# Disable intra-op parallelism to optimize for throughput instead of latency.
options = tf.data.Options()
options.experimental_threading.max_intra_op_parallelism = 1
dataset = dataset.with_options(options)
# Prefetches a batch at a time to smooth out the time taken to load input
# files for shuffling and processing.
dataset = dataset.prefetch(buffer_size=batch_size)
if is_training:
# Shuffles records before repeating to respect epoch boundaries.
dataset = dataset.shuffle(buffer_size=shuffle_buffer)
# Repeats the dataset for the number of epochs to train.
#dataset = dataset.repeat(num_epochs)
dataset = dataset.repeat()
# Parses the raw records into images and labels.
dataset = dataset.map(
lambda value: parse_record_fn(value, is_training, dtype),
num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset = dataset.batch(batch_size, drop_remainder=drop_remainder)
# Operations between the final prefetch and the get_next call to the iterator
# will happen synchronously during run time. We prefetch here again to
# background all of the above processing work and keep it out of the
# critical training path. Setting buffer_size to tf.data.experimental.AUTOTUNE
# allows DistributionStrategies to adjust how many batches to fetch based
# on how many devices are present.
dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
if tf_data_experimental_slack:
options = tf.data.Options()
options.experimental_slack = True
dataset = dataset.with_options(options)
return dataset
def get_synth_input_fn(height, width, num_channels, num_classes,
dtype=tf.float32):
"""Returns an input function that returns a dataset with random data.
This input_fn returns a data set that iterates over a set of random data and
bypasses all preprocessing, e.g. jpeg decode and copy. The host to device
copy is still included. This used to find the upper throughput bound when
tunning the full input pipeline.
Args:
height: Integer height that will be used to create a fake image tensor.
width: Integer width that will be used to create a fake image tensor.
num_channels: Integer depth that will be used to create a fake image tensor.
num_classes: Number of classes that should be represented in the fake labels
tensor
dtype: Data type for features/images.
Returns:
An input_fn that can be used in place of a real one to return a dataset
that can be used for iteration.
"""
# pylint: disable=unused-argument
def input_fn(is_training, data_dir, batch_size, *args, **kwargs):
"""Returns dataset filled with random data."""
# Synthetic input should be within [0, 255].
inputs = tf.random.truncated_normal(
[batch_size] + [height, width, num_channels],
dtype=dtype,
mean=127,
stddev=60,
name='synthetic_inputs')
labels = tf.random.uniform(
[batch_size],
minval=0,
maxval=num_classes - 1,
dtype=tf.int32,
name='synthetic_labels')
data = tf.data.Dataset.from_tensors((inputs, labels)).repeat()
data = data.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
return data
return input_fn
def image_bytes_serving_input_fn(image_shape, dtype=tf.float32):
"""Serving input fn for raw jpeg images."""
def _preprocess_image(image_bytes):
"""Preprocess a single raw image."""
# Bounding box around the whole image.
bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=dtype, shape=[1, 1, 4])
height, width, num_channels = image_shape
image = imagenet_preprocessing.preprocess_image(
image_bytes, bbox, height, width, num_channels, is_training=False)
return image
image_bytes_list = tf.compat.v1.placeholder(
shape=[None], dtype=tf.string, name='input_tensor')
images = tf.map_fn(
_preprocess_image, image_bytes_list, back_prop=False, dtype=dtype)
return tf.estimator.export.TensorServingInputReceiver(
images, {'image_bytes': image_bytes_list})
def override_flags_and_set_envars_for_gpu_thread_pool(flags_obj):
"""Override flags and set env_vars for performance.
These settings exist to test the difference between using stock settings
and manual tuning. It also shows some of the ENV_VARS that can be tweaked to
squeeze a few extra examples per second. These settings are defaulted to the
current platform of interest, which changes over time.
On systems with small numbers of cpu cores, e.g. under 8 logical cores,
setting up a gpu thread pool with `tf_gpu_thread_mode=gpu_private` may perform
poorly.
Args:
flags_obj: Current flags, which will be adjusted possibly overriding
what has been set by the user on the command-line.
"""
cpu_count = multiprocessing.cpu_count()
tf.compat.v1.logging.info('Logical CPU cores: %s', cpu_count)
# Sets up thread pool for each GPU for op scheduling.
per_gpu_thread_count = 1
total_gpu_thread_count = per_gpu_thread_count * flags_obj.num_gpus
os.environ['TF_GPU_THREAD_MODE'] = flags_obj.tf_gpu_thread_mode
os.environ['TF_GPU_THREAD_COUNT'] = str(per_gpu_thread_count)
tf.compat.v1.logging.info('TF_GPU_THREAD_COUNT: %s',
os.environ['TF_GPU_THREAD_COUNT'])
tf.compat.v1.logging.info('TF_GPU_THREAD_MODE: %s',
os.environ['TF_GPU_THREAD_MODE'])
# Reduces general thread pool by number of threads used for GPU pool.
main_thread_count = cpu_count - total_gpu_thread_count
flags_obj.inter_op_parallelism_threads = main_thread_count
# Sets thread count for tf.data. Logical cores minus threads assign to the
# private GPU pool along with 2 thread per GPU for event monitoring and
# sending / receiving tensors.
num_monitoring_threads = 2 * flags_obj.num_gpus
flags_obj.datasets_num_private_threads = (cpu_count - total_gpu_thread_count
- num_monitoring_threads)
################################################################################
# Functions for running training/eval/validation loops for the model.
################################################################################
def learning_rate_with_decay(
batch_size, batch_denom, num_images, boundary_epochs, decay_rates,
base_lr=0.1, warmup=False):
"""Get a learning rate that decays step-wise as training progresses.
Args:
batch_size: the number of examples processed in each training batch.
batch_denom: this value will be used to scale the base learning rate.
`0.1 * batch size` is divided by this number, such that when
batch_denom == batch_size, the initial learning rate will be 0.1.
num_images: total number of images that will be used for training.
boundary_epochs: list of ints representing the epochs at which we
decay the learning rate.
decay_rates: list of floats representing the decay rates to be used
for scaling the learning rate. It should have one more element
than `boundary_epochs`, and all elements should have the same type.
base_lr: Initial learning rate scaled based on batch_denom.
warmup: Run a 5 epoch warmup to the initial lr.
Returns:
Returns a function that takes a single argument - the number of batches
trained so far (global_step)- and returns the learning rate to be used
for training the next batch.
"""
initial_learning_rate = base_lr * batch_size / batch_denom
batches_per_epoch = num_images / batch_size
# Reduce the learning rate at certain epochs.
# CIFAR-10: divide by 10 at epoch 100, 150, and 200
# ImageNet: divide by 10 at epoch 30, 60, 80, and 90
boundaries = [int(batches_per_epoch * epoch) for epoch in boundary_epochs]
vals = [initial_learning_rate * decay for decay in decay_rates]
def learning_rate_fn(global_step):
"""Builds scaled learning rate function with 5 epoch warm up."""
############## npu modify begin #############
#Using int32 for better computing performance
global_step=tf.cast(global_step,tf.int32)
############## npu modify end ###############
lr = tf.compat.v1.train.piecewise_constant(global_step, boundaries, vals)
if warmup:
warmup_steps = int(batches_per_epoch * 5)
warmup_lr = (
initial_learning_rate * tf.cast(global_step, tf.float32) / tf.cast(
warmup_steps, tf.float32))
return tf.cond(pred=global_step < warmup_steps,
true_fn=lambda: warmup_lr,
false_fn=lambda: lr)
return lr
def poly_rate_fn(global_step):
"""Handles linear scaling rule, gradual warmup, and LR decay.
The learning rate starts at 0, then it increases linearly per step. After
FLAGS.poly_warmup_epochs, we reach the base learning rate (scaled to account
for batch size). The learning rate is then decayed using a polynomial rate
decay schedule with power 2.0.
Args:
global_step: the current global_step
Returns:
returns the current learning rate
"""
# Learning rate schedule for LARS polynomial schedule
if flags.FLAGS.batch_size < 8192:
plr = 5.0
w_epochs = 5
elif flags.FLAGS.batch_size < 16384:
plr = 10.0
w_epochs = 5
elif flags.FLAGS.batch_size < 32768:
plr = 25.0
w_epochs = 5
else:
plr = 32.0
w_epochs = 14
w_steps = int(w_epochs * batches_per_epoch)
wrate = (plr * tf.cast(global_step, tf.float32) / tf.cast(
w_steps, tf.float32))
# TODO(pkanwar): use a flag to help calc num_epochs.
num_epochs = 90
train_steps = batches_per_epoch * num_epochs
min_step = tf.constant(1, dtype=tf.int64)
decay_steps = tf.maximum(min_step, tf.subtract(global_step, w_steps))
poly_rate = tf.train.polynomial_decay(
plr,
decay_steps,
train_steps - w_steps + 1,
power=2.0)
return tf.where(global_step <= w_steps, wrate, poly_rate)
# For LARS we have a new learning rate schedule
if flags.FLAGS.enable_lars:
return poly_rate_fn
return learning_rate_fn
def resnet_model_fn(features, labels, mode, model_class,
resnet_size, weight_decay, learning_rate_fn, momentum,
data_format, resnet_version, loss_scale,
loss_filter_fn=None, dtype=resnet_model.DEFAULT_DTYPE,
fine_tune=False, label_smoothing=0.0):
"""Shared functionality for different resnet model_fns.
Initializes the ResnetModel representing the model layers
and uses that model to build the necessary EstimatorSpecs for
the `mode` in question. For training, this means building losses,
the optimizer, and the train op that get passed into the EstimatorSpec.
For evaluation and prediction, the EstimatorSpec is returned without
a train op, but with the necessary parameters for the given mode.
Args:
features: tensor representing input images
labels: tensor representing class labels for all input images
mode: current estimator mode; should be one of
`tf.estimator.ModeKeys.TRAIN`, `EVALUATE`, `PREDICT`
model_class: a class representing a TensorFlow model that has a __call__
function. We assume here that this is a subclass of ResnetModel.
resnet_size: A single integer for the size of the ResNet model.
weight_decay: weight decay loss rate used to regularize learned variables.
learning_rate_fn: function that returns the current learning rate given
the current global_step
momentum: momentum term used for optimization
data_format: Input format ('channels_last', 'channels_first', or None).
If set to None, the format is dependent on whether a GPU is available.
resnet_version: Integer representing which version of the ResNet network to
use. See README for details. Valid values: [1, 2]
loss_scale: The factor to scale the loss for numerical stability. A detailed
summary is present in the arg parser help text.
loss_filter_fn: function that takes a string variable name and returns
True if the var should be included in loss calculation, and False
otherwise. If None, batch_normalization variables will be excluded
from the loss.
dtype: the TensorFlow dtype to use for calculations.
fine_tune: If True only train the dense layers(final layers).
label_smoothing: If greater than 0 then smooth the labels.
Returns:
EstimatorSpec parameterized according to the input params and the
current mode.
"""
# Generate a summary node for the images
tf.compat.v1.summary.image('images', features, max_outputs=6)
############## npu modify begin #############
# Checks that features/images have same data type being used for calculations.
if features.dtype != dtype:
features=tf.cast(features,dtype)
############## npu modify end ###############
model = model_class(resnet_size, data_format, resnet_version=resnet_version,
dtype=dtype)
logits = model(features, mode == tf.estimator.ModeKeys.TRAIN)
# This acts as a no-op if the logits are already in fp32 (provided logits are
# not a SparseTensor). If dtype is is low precision, logits must be cast to
# fp32 for numerical stability.
logits = tf.cast(logits, tf.float32)
predictions = {
'classes': tf.argmax(input=logits, axis=1),
'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
}
if mode == tf.estimator.ModeKeys.PREDICT:
# Return the predictions and the specification for serving a SavedModel
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
export_outputs={
'predict': tf.estimator.export.PredictOutput(predictions)
})
# Calculate loss, which includes softmax cross entropy and L2 regularization.
if label_smoothing != 0.0:
one_hot_labels = tf.one_hot(labels, 1001)
cross_entropy = tf.losses.softmax_cross_entropy(
logits=logits, onehot_labels=one_hot_labels,
label_smoothing=label_smoothing)
else:
cross_entropy = tf.compat.v1.losses.sparse_softmax_cross_entropy(
logits=logits, labels=labels)
# Create a tensor named cross_entropy for logging purposes.
tf.identity(cross_entropy, name='cross_entropy')
tf.compat.v1.summary.scalar('cross_entropy', cross_entropy)
# If no loss_filter_fn is passed, assume we want the default behavior,
# which is that batch_normalization variables are excluded from loss.
def exclude_batch_norm(name):
return 'batch_normalization' not in name
loss_filter_fn = loss_filter_fn or exclude_batch_norm
# Add weight decay to the loss.
l2_loss = weight_decay * tf.add_n(
# loss is computed using fp32 for numerical stability.
[
tf.nn.l2_loss(tf.cast(v, tf.float32))
for v in tf.compat.v1.trainable_variables()
if loss_filter_fn(v.name)
])
tf.compat.v1.summary.scalar('l2_loss', l2_loss)
loss = cross_entropy + l2_loss
if mode == tf.estimator.ModeKeys.TRAIN:
global_step = tf.compat.v1.train.get_or_create_global_step()
learning_rate = learning_rate_fn(global_step)
# Create a tensor named learning_rate for logging purposes
tf.identity(learning_rate, name='learning_rate')
tf.compat.v1.summary.scalar('learning_rate', learning_rate)
if flags.FLAGS.enable_lars:
from tensorflow.contrib import opt as contrib_opt # pylint: disable=g-import-not-at-top
optimizer = contrib_opt.LARSOptimizer(
learning_rate,
momentum=momentum,
weight_decay=weight_decay,
skip_list=['batch_normalization', 'bias'])
else:
optimizer = tf.compat.v1.train.MomentumOptimizer(
learning_rate=learning_rate,
momentum=momentum
)
############## npu modify begin #############
optimizer = NPUDistributedOptimizer(optimizer)
############## npu modify end ###############
fp16_implementation = getattr(flags.FLAGS, 'fp16_implementation', None)
if fp16_implementation == 'graph_rewrite':
optimizer = (
tf.compat.v1.train.experimental.enable_mixed_precision_graph_rewrite(
optimizer, loss_scale=loss_scale))
def _dense_grad_filter(gvs):
"""Only apply gradient updates to the final layer.
This function is used for fine tuning.
Args:
gvs: list of tuples with gradients and variable info
Returns:
filtered gradients so that only the dense layer remains
"""
return [(g, v) for g, v in gvs if 'dense' in v.name]
# if loss_scale != 1 and fp16_implementation != 'graph_rewrite':
# When computing fp16 gradients, often intermediate tensor values are
# so small, they underflow to 0. To avoid this, we multiply the loss by
# loss_scale to make these tensor values loss_scale times bigger.
loss_scale = 512
scaled_grad_vars = optimizer.compute_gradients(loss * loss_scale)
if fine_tune:
scaled_grad_vars = _dense_grad_filter(scaled_grad_vars)
# Once the gradient computation is complete we can scale the gradients
# back to the correct scale before passing them to the optimizer.
unscaled_grad_vars = [(grad / loss_scale, var)
for grad, var in scaled_grad_vars]
minimize_op = optimizer.apply_gradients(unscaled_grad_vars, global_step)
#else:
# grad_vars = optimizer.compute_gradients(loss)
# if fine_tune:
# grad_vars = _dense_grad_filter(grad_vars)
# minimize_op = optimizer.apply_gradients(grad_vars, global_step)
update_ops = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.UPDATE_OPS)
train_op = tf.group(minimize_op, update_ops)
else:
train_op = None
############## npu modify begin #############
#Using float32 for better performance
accuracy = tf.compat.v1.metrics.accuracy(tf.cast(labels,tf.float32), predictions['classes'])
############## npu modify end ###############
accuracy_top_5 = tf.compat.v1.metrics.mean(
tf.nn.in_top_k(predictions=logits, targets=labels, k=5, name='top_5_op'))
metrics = {'accuracy': accuracy,
'accuracy_top_5': accuracy_top_5}
# Create a tensor named train_accuracy for logging purposes
tf.identity(accuracy[1], name='train_accuracy')
tf.identity(accuracy_top_5[1], name='train_accuracy_top_5')
tf.compat.v1.summary.scalar('train_accuracy', accuracy[1])
tf.compat.v1.summary.scalar('train_accuracy_top_5', accuracy_top_5[1])
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
loss=loss,
train_op=train_op,
eval_metric_ops=metrics)
############## npu modify begin #############
def init_npu():
"""Initialize npu manually.
Returns:
`init_sess` npu init session config.
`npu_init` npu init ops.
"""
npu_init = npu_ops.initialize_system()
config = tf.ConfigProto()
#npu mix precision attribute set to true when using mix precision
config.graph_options.rewrite_options.remapping = rewriter_config_pb2.RewriterConfig.OFF
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
#custom_op.parameter_map["precision_mode"].b = True
custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")
custom_op.parameter_map["use_off_line"].b = True
init_sess = tf.Session(config=config)
return init_sess,npu_init
############## npu modify end ###############
def resnet_main(
flags_obj, model_function, input_function, dataset_name, num_images, shape=None):
"""Shared main loop for ResNet Models.
Args:
flags_obj: An object containing parsed flags. See define_resnet_flags()
for details.
model_function: the function that instantiates the Model and builds the
ops for train/eval. This will be passed directly into the estimator.
input_function: the function that processes the dataset and returns a
dataset that the estimator can train on. This will be wrapped with
all the relevant flags for running and passed to estimator.
dataset_name: the name of the dataset for training and evaluation. This is
used for logging purpose.
shape: list of ints representing the shape of the images used for training.
This is only used if flags_obj.export_dir is passed.
Returns:
Dict of results of the run. Contains the keys `eval_results` and
`train_hooks`. `eval_results` contains accuracy (top_1) and accuracy_top_5.
`train_hooks` is a list the instances of hooks used during training.
"""
model_helpers.apply_clean(flags.FLAGS)
# Ensures flag override logic is only executed if explicitly triggered.
if flags_obj.tf_gpu_thread_mode:
override_flags_and_set_envars_for_gpu_thread_pool(flags_obj)
# Configures cluster spec for distribution strategy.
num_workers = distribution_utils.configure_cluster(flags_obj.worker_hosts,
flags_obj.task_index)
# Creates session config. allow_soft_placement = True, is required for
# multi-GPU and is not harmful for other modes.
session_config = tf.compat.v1.ConfigProto(
inter_op_parallelism_threads=flags_obj.inter_op_parallelism_threads,
intra_op_parallelism_threads=flags_obj.intra_op_parallelism_threads,
allow_soft_placement=True)
distribution_strategy = distribution_utils.get_distribution_strategy(
distribution_strategy=flags_obj.distribution_strategy,
num_gpus=flags_core.get_num_gpus(flags_obj),
all_reduce_alg=flags_obj.all_reduce_alg,
num_packs=flags_obj.num_packs)
############## npu modify begin #############
# Creates a `NPURunConfig` that checkpoints every 115200 steps
run_config = NPURunConfig(
model_dir=flags_obj.model_dir,
session_config=session_config,
keep_checkpoint_max=5,
save_checkpoints_steps=115200,
enable_data_pre_proc=True,
iterations_per_loop=100,
#enable_auto_mix_precision=True,
precision_mode='allow_mix_precision',
hcom_parallel=True
)
############## npu modify end ###############
# Initializes model with all but the dense layer from pretrained ResNet.
if flags_obj.pretrained_model_checkpoint_path is not None:
warm_start_settings = tf.estimator.WarmStartSettings(
flags_obj.pretrained_model_checkpoint_path,
vars_to_warm_start='^(?!.*dense)')
else:
warm_start_settings = None
############## npu modify begin #############
# Creates a `NPUEstimator` instead of using tf.estimator.Estimator
classifier = NPUEstimator(
model_fn=model_function, model_dir=flags_obj.model_dir, config=run_config,
params={
'resnet_size': int(flags_obj.resnet_size),
'data_format': flags_obj.data_format,
'batch_size': flags_obj.batch_size,
'resnet_version': int(flags_obj.resnet_version),
'loss_scale': flags_core.get_loss_scale(flags_obj,
default_for_fp16=128),
'dtype': flags_core.get_tf_dtype(flags_obj),
'fine_tune': flags_obj.fine_tune,
'num_workers': num_workers,
'num_gpus' : flags_core.get_num_gpus(flags_obj),
})
############## npu modify end ###############
run_params = {
'batch_size': flags_obj.batch_size,
'dtype': flags_core.get_tf_dtype(flags_obj),
'resnet_size': flags_obj.resnet_size,
'resnet_version': flags_obj.resnet_version,
'synthetic_data': flags_obj.use_synthetic_data,
'train_epochs': flags_obj.train_epochs,
'num_workers': num_workers,
}
if flags_obj.use_synthetic_data:
dataset_name = dataset_name + '-synthetic'
benchmark_logger = logger.get_benchmark_logger()
benchmark_logger.log_run_info('resnet', dataset_name, run_params,
test_id=flags_obj.benchmark_test_id)
train_hooks = hooks_helper.get_train_hooks(
flags_obj.hooks,
model_dir=flags_obj.model_dir,
batch_size=flags_obj.batch_size)
def input_fn_train(num_epochs, input_context=None):
############## npu modify begin #############
# Using dtype=tf.float16 for higher data transmission performance
# drop_remainder currently only support true
# batch_size means single card batch instead of global batch size
return input_function(
is_training=True,
data_dir=flags_obj.data_dir,
batch_size=flags_obj.batch_size,
num_epochs=num_epochs,
dtype=tf.float16,
input_context=input_context,
drop_remainder=True)
def input_fn_eval():
# batch_size means single card batch instead of global batch size
# Using dtype=tf.float16 for higher data transmission performance
# drop_remainder currently only support true
return input_function(
is_training=False,
data_dir=flags_obj.data_dir,
batch_size=flags_obj.batch_size,
num_epochs=1,
dtype=tf.float16,
drop_remainder=True)
############## npu modify end ###############
train_epochs = (0 if flags_obj.eval_only or not flags_obj.train_epochs else
flags_obj.train_epochs)
use_train_and_evaluate = flags_obj.use_train_and_evaluate or num_workers > 1
if use_train_and_evaluate:
train_spec = tf.estimator.TrainSpec(
input_fn=lambda input_context=None: input_fn_train(
train_epochs, input_context=input_context),
hooks=train_hooks,
max_steps=flags_obj.max_train_steps)
eval_spec = tf.estimator.EvalSpec(input_fn=input_fn_eval)
tf.compat.v1.logging.info('Starting to train and evaluate.')
tf.estimator.train_and_evaluate(classifier, train_spec, eval_spec)
# tf.estimator.train_and_evalute doesn't return anything in multi-worker
# case.
eval_results = {}
else:
if train_epochs == 0:
# If --eval_only is set, perform a single loop with zero train epochs.
schedule, n_loops = [0], 1
else:
# Compute the number of times to loop while training. All but the last
# pass will train for `epochs_between_evals` epochs, while the last will
# train for the number needed to reach `training_epochs`. For instance if
# train_epochs = 25 and epochs_between_evals = 10
# schedule will be set to [10, 10, 5]. That is to say, the loop will:
# Train for 10 epochs and then evaluate.
# Train for another 10 epochs and then evaluate.
# Train for a final 5 epochs (to reach 25 epochs) and then evaluate.
n_loops = math.ceil(train_epochs / flags_obj.epochs_between_evals)
schedule = [flags_obj.epochs_between_evals for _ in range(int(n_loops))]
schedule[-1] = train_epochs - sum(schedule[:-1]) # over counting.
############## npu modify begin #############
if flags_obj.max_train_steps is None:
flags_obj.max_train_steps = (num_images['train']/flags_obj.batch_size)/flags_core.get_num_gpus(flags_obj)
max_eval_steps = num_images['validation']/flags_obj.batch_size
else:
max_eval_steps = flags_obj.max_train_steps
############## npu modify end #############
for cycle_index, num_train_epochs in enumerate(schedule):
tf.compat.v1.logging.info('Starting cycle: %d/%d', cycle_index,
int(n_loops))
if num_train_epochs:
# Since we are calling classifier.train immediately in each loop, the
# value of num_train_epochs in the lambda function will not be changed
# before it is used. So it is safe to ignore the pylint error here
# pylint: disable=cell-var-from-loop
classifier.train(
input_fn=lambda input_context=True: input_fn_train(
num_train_epochs, input_context=input_context),
hooks=train_hooks,
max_steps=flags_obj.max_train_steps*(cycle_index+1))
############## npu modify begin #############
# npu resorce will be destoryed When the training is over
# Reinitialize is needed if using hccl interface before next process
init_sess,npu_init=init_npu()
npu_shutdown = npu_ops.shutdown_system()
init_sess.run(npu_shutdown)
init_sess.run(npu_init)
############## npu modify end ###############
# flags_obj.max_train_steps is generally associated with testing and
# profiling. As a result it is frequently called with synthetic data,
# which will iterate forever. Passing steps=flags_obj.max_train_steps
# allows the eval (which is generally unimportant in those circumstances)
# to terminate. Note that eval will run for max_train_steps each loop,
# regardless of the global_step count.
tf.compat.v1.logging.info('Starting to evaluate.')
eval_results = classifier.evaluate(input_fn=input_fn_eval,
steps=max_eval_steps)
benchmark_logger.log_evaluation_result(eval_results)
if model_helpers.past_stop_threshold(
flags_obj.stop_threshold, eval_results['accuracy']):
break
############## npu modify begin #############
# npu resorce will be destoryed when evaluate finish
# Reinitialize is needed before using hccl interface
init_sess,npu_init=init_npu()
npu_shutdown = npu_ops.shutdown_system()
init_sess.run(npu_shutdown)
init_sess.run(npu_init)
############## npu modify end ###############
if flags_obj.export_dir is not None:
# Exports a saved model for the given classifier.
export_dtype = flags_core.get_tf_dtype(flags_obj)
if flags_obj.image_bytes_as_serving_input:
input_receiver_fn = functools.partial(
image_bytes_serving_input_fn, shape, dtype=export_dtype)
else:
input_receiver_fn = export.build_tensor_serving_input_receiver_fn(
shape, batch_size=flags_obj.batch_size, dtype=export_dtype)
classifier.export_savedmodel(flags_obj.export_dir, input_receiver_fn,
strip_default_attrs=True)
############## npu modify begin #############
npu_shutdown = npu_ops.shutdown_system()
init_sess.run(npu_shutdown)
############## npu modify end ###############
stats = {}
stats['eval_results'] = eval_results
stats['train_hooks'] = train_hooks
return stats
def define_resnet_flags(resnet_size_choices=None, dynamic_loss_scale=False,
fp16_implementation=False):
"""Add flags and validators for ResNet."""
flags_core.define_base(clean=True, train_epochs=True,
epochs_between_evals=True, stop_threshold=True,
num_gpu=True, hooks=True, export_dir=True,
distribution_strategy=True)
flags_core.define_performance(num_parallel_calls=False,
inter_op=True,
intra_op=True,
synthetic_data=True,
dtype=True,
all_reduce_alg=True,
num_packs=True,
tf_gpu_thread_mode=True,
datasets_num_private_threads=True,
dynamic_loss_scale=dynamic_loss_scale,
fp16_implementation=fp16_implementation,
loss_scale=True,
tf_data_experimental_slack=True,
max_train_steps=True)
flags_core.define_image()
flags_core.define_benchmark()
flags_core.define_distribution()
flags.adopt_module_key_flags(flags_core)
flags.DEFINE_enum(
name='resnet_version', short_name='rv', default='1',
enum_values=['1', '2'],
help=flags_core.help_wrap(
'Version of ResNet. (1 or 2) See README.md for details.'))
flags.DEFINE_bool(
name='fine_tune', short_name='ft', default=False,
help=flags_core.help_wrap(
'If True do not train any parameters except for the final layer.'))
flags.DEFINE_string(
name='pretrained_model_checkpoint_path', short_name='pmcp', default=None,
help=flags_core.help_wrap(
'If not None initialize all the network except the final layer with '
'these values'))
flags.DEFINE_boolean(
name='eval_only', default=False,
help=flags_core.help_wrap('Skip training and only perform evaluation on '
'the latest checkpoint.'))
flags.DEFINE_boolean(
name='image_bytes_as_serving_input', default=False,
help=flags_core.help_wrap(
'If True exports savedmodel with serving signature that accepts '
'JPEG image bytes instead of a fixed size [HxWxC] tensor that '
'represents the image. The former is easier to use for serving at '
'the expense of image resize/cropping being done as part of model '
'inference. Note, this flag only applies to ImageNet and cannot '
'be used for CIFAR.'))
flags.DEFINE_boolean(
name='use_train_and_evaluate', default=False,
help=flags_core.help_wrap(
'If True, uses `tf.estimator.train_and_evaluate` for the training '
'and evaluation loop, instead of separate calls to `classifier.train '
'and `classifier.evaluate`, which is the default behavior.'))
flags.DEFINE_bool(
name='enable_lars', default=False,
help=flags_core.help_wrap(
'Enable LARS optimizer for large batch training.'))
flags.DEFINE_float(
name='label_smoothing', default=0.0,
help=flags_core.help_wrap(
'Label smoothing parameter used in the softmax_cross_entropy'))
flags.DEFINE_float(
name='weight_decay', default=1e-4,
help=flags_core.help_wrap(
'Weight decay coefficiant for l2 regularization.'))
choice_kwargs = dict(
name='resnet_size', short_name='rs', default='50',
help=flags_core.help_wrap('The size of the ResNet model to use.'))
if resnet_size_choices is None:
flags.DEFINE_string(**choice_kwargs)
else:
flags.DEFINE_enum(enum_values=resnet_size_choices, **choice_kwargs)
@@ -0,0 +1,581 @@
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:20.962.831 [tdt/device/../common/src/log.cpp:158][TSDaemon] begin to send heartbeat to appmon,[tdt/device/src/tsd/tsdaemon.cpp:1580:SendHeartBeatToAppMon]8462 Msg: running ok
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:20.963.000 [tdt/device/../common/src/log.cpp:149][TsdEVENT] send heartbeat to appmon success,[tdt/device/src/tsd/tsdaemon.cpp:1587:SendHeartBeatToAppMon]8462
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.243.682 [tdt/device/../common/src/log.cpp:158][HdcSever] drv accept an session,[tdt/device/../common/src/hdc_server.cpp:330:AcceptHdcSession]8454 Msg: running ok
[INFO] HDC(8380,tsdaemon):2020-05-12-11:05:22.243.730 [hardware/dev_plat/../dev_plat/devhdc/hdc_core.c:1609][drvHdcSetSessionReference:1609] >>> session 55, pid 8380
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.243.752 [tdt/device/../common/src/log.cpp:158][HdcSever] drvHdcSetSessionReference success,[tdt/device/../common/src/hdc_server.cpp:342:AcceptHdcSession]8454 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.243.772 [tdt/device/../common/src/log.cpp:158][HdcSever] drv accept an session and drvHdcSetSessionReference success, sessionId=1,[tdt/device/../common/src/hdc_server.cpp:351:AcceptHdcSession]8454 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.243.788 [tdt/device/../common/src/log.cpp:158][HdcSever] accept an session sessionId=1, open recv thread,[tdt/device/../common/src/hdc_server.cpp:279:Accept]8454 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.243.823 [tdt/device/../common/src/log.cpp:158]HdcServer::AcceptConnection Start,[tdt/device/../common/src/hdc_server.cpp:310:AcceptHdcSession]8454 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.243.854 [tdt/device/../common/src/log.cpp:158]HdcServer::RecvData thread = 281470605762992,[tdt/device/../common/src/hdc_server.cpp:154:RecvData]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.243.945 [tdt/device/../common/src/log.cpp:158]tsdaemon get process sign successfully, procpid:40927 signSize:48,[tdt/device/src/tsd/tsdaemon.cpp:901:FmkToTsdMsg]30221 Msg: running ok
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:22.243.963 [tdt/device/../common/src/log.cpp:149][TsdEVENT]FmkToTsdMsg dev[0] msg[6] sessionId[1] realDev[0] fmkSignPid[40927] profilingMode[0] rankSize[1],[tdt/device/src/tsd/tsdaemon.cpp:905:FmkToTsdMsg]30221
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:22.243.982 [tdt/device/../common/src/log.cpp:149][TsdEVENT]From FMK Start >>>>>>>>>> TSD dev[0] sessionId[1] realDev[0] fmkPid[40927] rankSize[1],[tdt/device/src/tsd/tsdaemon.cpp:853:FmkToTsdMsgProc]30221
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.000 [tdt/device/../common/src/log.cpp:158][TSDaemon] isAllLastRcvThreadClean_ value:0,[tdt/device/src/tsd/tsdaemon.cpp:819:CleanAllLastRcvThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.013 [tdt/device/../common/src/log.cpp:158][TSDPPCSER] JoinAllPPCRcvThreads() enter [threadSize=1]!,[tdt/device/src/tsd/ppc_server.cpp:96:JoinAllPPCRcvThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.039 [tdt/device/../common/src/log.cpp:158][TSDPPCSER] JoinAllPPCRcvThreads() [ppc tid=281470588977584] [threadSize=1] [freeThreadSize=1].,[tdt/device/src/tsd/ppc_server.cpp:105:JoinAllPPCRcvThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.056 [tdt/device/../common/src/log.cpp:158][TSDPPCSER] JoinAllPPCRcvThreads() [free tid=281470588977584].,[tdt/device/src/tsd/ppc_server.cpp:111:JoinAllPPCRcvThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.071 [tdt/device/../common/src/log.cpp:158][TSDPPCSER] JoinAllPPCRcvThreads() Find free tid and joinable.,[tdt/device/src/tsd/ppc_server.cpp:114:JoinAllPPCRcvThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.086 [tdt/device/../common/src/log.cpp:158][TSDPPCSER] JoinAllPPCRcvThreads() exit [threadSize=0] [freeThreadSize=0].,[tdt/device/src/tsd/ppc_server.cpp:129:JoinAllPPCRcvThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.101 [tdt/device/../common/src/log.cpp:158][TSDaemon] CleanTsdRcvHdcThreads() enter [threadSize=2]!,[tdt/device/src/tsd/tsdaemon.cpp:333:CleanTsdRcvHdcThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.120 [tdt/device/../common/src/log.cpp:158][TSDaemon] CleanTsdRcvHdcThreads() [tid=281470597370288] [threadSize=2]!,[tdt/device/src/tsd/tsdaemon.cpp:341:CleanTsdRcvHdcThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.139 [tdt/device/../common/src/log.cpp:158][TSDaemon] CleanTsdRcvHdcThreads() [tid=281470572192176] [threadSize=2]!,[tdt/device/src/tsd/tsdaemon.cpp:341:CleanTsdRcvHdcThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.152 [tdt/device/../common/src/log.cpp:158][TSDaemon] CleanTsdRcvHdcThreads() exit [threadSize=0]!,[tdt/device/src/tsd/tsdaemon.cpp:346:CleanTsdRcvHdcThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.167 [tdt/device/../common/src/log.cpp:158][TSDaemon] CleanTsdRcvPPCThreads() enter [threadSize=2]!,[tdt/device/src/tsd/tsdaemon.cpp:356:CleanTsdRcvPPCThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.181 [tdt/device/../common/src/log.cpp:158][TSDaemon] CleanTsdRcvPPCThreads() [tid=281470580584880] [threadSize=2]!,[tdt/device/src/tsd/tsdaemon.cpp:364:CleanTsdRcvPPCThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.197 [tdt/device/../common/src/log.cpp:158][TSDaemon] CleanTsdRcvPPCThreads() [tid=281470563799472] [threadSize=2]!,[tdt/device/src/tsd/tsdaemon.cpp:364:CleanTsdRcvPPCThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.241 [tdt/device/../common/src/log.cpp:158][TSDaemon] CleanTsdRcvPPCThreads() exit [threadSize=0]!,[tdt/device/src/tsd/tsdaemon.cpp:369:CleanTsdRcvPPCThreads]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.259 [tdt/device/../common/src/log.cpp:158][TSDaemon] StartSubProcess deviceId: 0, fmkPid: 40927, sessionId: 1, state: 0,[tdt/device/src/tsd/tsdaemon.cpp:630:StartSubProcess]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.273 [tdt/device/../common/src/log.cpp:158][TSDaemon] StartSubProcess rankSize: 1,,[tdt/device/src/tsd/tsdaemon.cpp:635:StartSubProcess]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.285 [tdt/device/../common/src/log.cpp:158][TSDaemon] Process HCCP is abandoned to start, the rank size is 1,[tdt/device/src/tsd/tsdaemon.cpp:651:StartSubProcess]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.337 [tdt/device/../common/src/log.cpp:158][TSDaemon] start delete file, direct is /home/HwHiAiUser/hdcd/device0/,[tdt/device/src/tsd/tsdaemon.cpp:1878:DeleteFileByPath]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.367 [tdt/device/../common/src/log.cpp:158][TSDaemon] start scan file, ent name is .,[tdt/device/src/tsd/tsdaemon.cpp:1887:DeleteFileByPath]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.382 [tdt/device/../common/src/log.cpp:158][TSDaemon] start scan file, ent name is ..,[tdt/device/src/tsd/tsdaemon.cpp:1887:DeleteFileByPath]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.398 [tdt/device/../common/src/log.cpp:158][TSDaemon] start scan file, ent name is etc,[tdt/device/src/tsd/tsdaemon.cpp:1887:DeleteFileByPath]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.412 [tdt/device/../common/src/log.cpp:158][TSDaemon] start scan file, ent name is upgrade,[tdt/device/src/tsd/tsdaemon.cpp:1887:DeleteFileByPath]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.457 [tdt/device/../common/src/log.cpp:158][TSDaemon] ExecuteStart() [tid=281470563799472]!,[tdt/device/src/tsd/tsdaemon.cpp:477:ExecuteStart]30222 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.244.477 [tdt/device/../common/src/log.cpp:158][TSDaemon] check pathName[/var/aicpu_scheduler], pathLen[20] procName[aicpu_scheduler], len[15], MAX_LEN[256] ,[tdt/device/src/tsd/tsdaemon.cpp:1514:CheckProcessInputParam]30222 Msg: running ok
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:22.245.278 [tdt/device/../common/src/log.cpp:149][TsdEVENT]#### Start TSD->SubProcess[PROC] Start Msg Device[0] Proc[aicpu_scheduler] fmkPid[40927] fatherPid[8380] subPid[30223] #### profilingMode is[0],[tdt/device/src/tsd/tsdaemon.cpp:498:ExecuteStart]30222
[OPLOG] TDT(8380,tsdaemon):2020-05-12-11:05:22.245.328 [tdt/device/../common/src/log.cpp:151][tdt/device/src/tsd/tsdaemon.cpp:499:ExecuteStart]30222 alloc resource {devOS:[30223]} for {dev:0}
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.245.390 [tdt/device/../common/src/log.cpp:158][TSDaemon] SetTsdToFmkMsg:deviceId[0], sessionId[1], subProcPid[30223],[tdt/device/src/tsd/tsdaemon.cpp:774:SetTsdToFmkMsg]30222 Msg: running ok
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:22.279.681 [hardware/dev_plat/../dev_plat/devhdc/hdc_cfg_parse.c:190][CfgFileOpen:190] >>> /etc/hdcBasic.cfg not exist
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:22.279.712 [hardware/dev_plat/../dev_plat/devhdc/hdc_core.c:554][hdcInit:554] >>> HDC pcie init,use default segment(524288)
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:22.279.765 [hardware/dev_plat/../dev_plat/devhdc/hdc_core.c:539][hdcPcieInit:539] >>> after init hdc segment 524224, socket segment 0
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:22.279.780 [hardware/dev_plat/../dev_plat/devhdc/hdc_core.c:574][hdcInit:574] >>> HDC init success.
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.281.768 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.646658] [hdcdrv] [hdcdrv_config 2288] <aicpu_scheduler:30223> pid 30223 use segment 524224.
[WARNING] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.284.329 [tdt/device/../common/src/log.cpp:143]Register data type failed: hiaiSerializeFunc is existed,[tdt/common/common_inc/data_type_reg.h:192:Register]30223 Msg: func has already existed
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.284.723 [aicpu/aicpu_device/aicpu_schedule/compute_process/main.cc:185][AICPUFW] [main 185] Compute process(cloud) compile time is 02:13:48 Apr 26 2020
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.284.751 [aicpu/aicpu_device/aicpu_schedule/compute_process/main.cc:168][AICPUFW] [ParseArgs 168] Parse args success. deviceId=0, pid=40927, pidSign=e9b203cc443d80564c8c88a0d111cb95145fae36b00d1ec1, profilingMode=0.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.284.960 [hardware/dev_plat/../dev_core/devdrv/devdrv_container.c:193][devdrv] [devdrv_do_container 193] para.num(4).
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.284.983 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:83][AICPUFW] [Start 83] Aicpu_scheduler will start, hostpid=40927, deviceId=0, hostDeviceId = 0.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.284.999 [hardware/dev_plat/aicpufw/aicpufw_api.c:125][AICPUFW] [drvDevBindPid 125] drvDevBindPid enter: chip_id = 0, hostpid=40927, mode=0.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.285.015 [hardware/dev_plat/aicpufw/aicpufw_dev.c:348][AICPUFW] [aicpufw_dev_bind_pid 348] chip0 aicpu bind pid (40927).
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.285.805 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.651711] [devdrv] [devdrv_manager_container_init_devlist_ns 1139] <aicpu_scheduler:30223> num(0), dev(0)
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.285.825 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.651716] [devdrv] [devdrv_manager_container_init_devlist_ns 1139] <aicpu_scheduler:30223> num(1), dev(1)
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.285.836 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.651719] [devdrv] [devdrv_manager_container_init_devlist_ns 1139] <aicpu_scheduler:30223> num(2), dev(2)
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.285.846 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.651722] [devdrv] [devdrv_manager_container_init_devlist_ns 1139] <aicpu_scheduler:30223> num(3), dev(3)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.370.966 [hardware/dev_plat/aicpufw/aicpufw_dev.c:76][AICPUFW] [aicpufw_dev_open 76] chip_id:0 is opened success, fd=18.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.371.155 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:283][AICPUFW] [InitFd 283] InitFd begin, deviceId=0.
[INFO] DP(30223,aicpu_scheduler):2020-05-12-11:05:22.371.186 [datapreprocess/src/task_queue.cc:58][DP_PREPROCESS] [I] [datapreprocess/src/task_queue.cc:58] Begin create task queue eventfd.
[INFO] DP(30223,aicpu_scheduler):2020-05-12-11:05:22.371.209 [datapreprocess/src/task_queue.cc:70][DP_PREPROCESS] [I] [datapreprocess/src/task_queue.cc:70] End create task queue eventfd 19.
[INFO] DP(30223,aicpu_scheduler):2020-05-12-11:05:22.371.224 [datapreprocess/src/task_queue.cc:58][DP_PREPROCESS] [I] [datapreprocess/src/task_queue.cc:58] Begin create task queue eventfd.
[INFO] DP(30223,aicpu_scheduler):2020-05-12-11:05:22.371.241 [datapreprocess/src/task_queue.cc:70][DP_PREPROCESS] [I] [datapreprocess/src/task_queue.cc:70] End create task queue eventfd 20.
[TRACE] DP(30223,aicpu_scheduler):2020-05-12-11:05:22.371.256 [status:START] [datapreprocess/src/task_queue.cc:262]DP_PREPROCESS module has been initialized
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.371.275 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:315][AICPUFW] [InitFd 315] InitFd end, deviceId=0.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.371.288 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:100][AICPUFW] [Start 100] Begin start aicpu work task, deviceId=0, hostpid=40927.
[INFO] DEVMM(30223,aicpu_scheduler):2020-05-12-11:05:22.371.300 [hardware/dev_plat/../dev_plat/devmm/agentmm/agentmm_svm.c:137][drvMemInitSvmDevice 137] <curpid:30223,0xfe8dc010> init svm start pid=40927.
[INFO] DEVMM(30223,aicpu_scheduler):2020-05-12-11:05:22.372.570 [hardware/dev_plat/../dev_plat/devmm/agentmm/agentmm_svm.c:120][devmm_init_svm_device 120] <curpid:30223,0xfe8dc010> init svm (hpid:40927) succ.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.593 [hardware/dev_plat/aicpufw/aicpufw_api.c:89][AICPUFW] [drvCreateAicpuWorkTasks 89] drvCreateAicpuWorkTasks enter: chip_id = 0, pid=40927, mode=0.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.610 [hardware/dev_plat/aicpufw/aicpufw_api.c:103][AICPUFW] [drvCreateAicpuWorkTasks 103] chip[0] start load kernel serve, pid=40927.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.739 [hardware/dev_plat/aicpufw/aicpufw_dev.c:226][AICPUFW] [aicpufw_dev_register_pid 226] chip0 register pid (40927).
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.839 [hardware/dev_plat/../dev_core/devdrv/devdrv_manager.c:838][devdrv] [drvGetCpuInfo 838] Dev[1] cpu info:1 14 1 4 0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.875 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65531)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.891 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65515)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.902 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65530)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.914 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65514)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.923 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65529)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.935 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65513)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.943 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65528)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.955 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65512)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.964 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65527)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.976 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65511)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.985 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65526)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.372.998 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65510)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.007 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65525)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.019 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65509)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.029 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65524)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.041 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65508)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.050 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65523)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.061 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65507)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.070 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65522)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.081 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65506)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.090 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65521)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.101 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65505)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.110 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65520)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.122 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65504)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.130 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65519)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.142 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65503)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.150 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(65518)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.162 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(65502)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.170 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(0)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.182 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(0)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.191 [hardware/dev_plat/aicpufw/aicpufw_dev.c:209][AICPUFW] [aicpufw_dev_get_info 209] _ts_irq(0)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.202 [hardware/dev_plat/aicpufw/aicpufw_dev.c:210][AICPUFW] [aicpufw_dev_get_info 210] _cpu_irq(0)
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.215 [hardware/dev_plat/aicpufw/aicpufw_dev.c:142][AICPUFW] [aicpufw_dev_mmap 142] mmap opened fd=18.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.227 [hardware/dev_plat/aicpufw/aicpufw_dev.c:152][AICPUFW] [aicpufw_dev_mmap 152] start mmap, fd=18, offset: 4096, size: 258048.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.255 [hardware/dev_plat/aicpufw/aicpufw_dev.c:155][AICPUFW] [aicpufw_dev_mmap 155] finish mmap, fd=18, offset: 4096, size: 258048, addr: 0x0xfffefe877000.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.270 [hardware/dev_plat/aicpufw/aicpufw_thread.c:121][AICPUFW] [aicpufw_thread_data_init 121] chip[0] ts[0] finish mmap sram_offset[4096] sram_size[258048], ret[0]
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.287 [hardware/dev_plat/aicpufw/aicpufw_dev.c:142][AICPUFW] [aicpufw_dev_mmap 142] mmap opened fd=18.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.297 [hardware/dev_plat/aicpufw/aicpufw_dev.c:152][AICPUFW] [aicpufw_dev_mmap 152] start mmap, fd=18, offset: 262144, size: 1048576.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.323 [hardware/dev_plat/aicpufw/aicpufw_dev.c:155][AICPUFW] [aicpufw_dev_mmap 155] finish mmap, fd=18, offset: 262144, size: 1048576, addr: 0x0xfffefe777000.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.338 [hardware/dev_plat/aicpufw/aicpufw_thread.c:142][AICPUFW] [aicpufw_thread_data_init 142] chip[0] chip_info.chip_id is 0x6528.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.354 [hardware/dev_plat/aicpufw/aicpufw_dev.c:142][AICPUFW] [aicpufw_dev_mmap 142] mmap opened fd=18.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.368 [hardware/dev_plat/aicpufw/aicpufw_dev.c:152][AICPUFW] [aicpufw_dev_mmap 152] start mmap, fd=18, offset: 1310720, size: 4096.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.385 [hardware/dev_plat/aicpufw/aicpufw_dev.c:155][AICPUFW] [aicpufw_dev_mmap 155] finish mmap, fd=18, offset: 1310720, size: 4096, addr: 0x0xfffeffffa000.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.400 [hardware/dev_plat/aicpufw/aicpufw_thread.c:165][AICPUFW] [aicpufw_thread_data_init 165] finish mmap chip[0] ts[0] sram[0x0xfffefe877000]
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.420 [hardware/dev_plat/aicpufw/aicpufw_thread.c:710][AICPUFW] [aicpufw_thread_create 710] chip0 aicpu num: 14, first_aicpu: 2.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.441 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=0 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.454 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:0, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.852 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:0, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.872 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[0] id 281470656012688
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.373.874 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.737826] [aicpufw-drv] [aicpufw_init_dfx 467] <aicpu_scheduler:30223>there are 2 processes open,current tgid: 30223
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.881 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=0 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.892 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=1 begin
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.373.896 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.739661] [aicpufw-drv] [aicpufw_drv_add_match_info_check 1240] <aicpu_scheduler:30223>register pid(40927) ts_index(0) monitor_is_running(1).
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.900 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:1, ts_ind:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.373.910 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.739665] [aicpufw-drv] [aicpufw_drv_add_match_info 1282] <aicpu_scheduler:30223>register pid(40927) ts_index(0) monitor_is_running(1).
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.373.921 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.739680] [aicpufw-drv] [aicpufw_drv_get_moniter_info 1583] <aicpu_m_ioctl:8314>aicpufw event happened. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.373.930 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.739733] [devdrv] [devdrv_manager_get_cpu_info 1927] <aicpu_scheduler:30223> aicpu_num=14, ccpu_num=1
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.373.941 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.740140] [aicpufw-drv] [aicpufw_drv_mmap 1123] <aicpu_scheduler:30223>mmap_sram,ts_index=0, offset=4096, ts_size=4096.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.940 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:1, ret:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.373.950 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.740143] [aicpufw-drv] [aicpufw_drv_mmap_sram 974] <aicpu_scheduler:30223>sram status mem: virt_addr = 0xfffefe877000, tgid = 30223, size = 258048,offset = 4096, numa node = 0, ts = 0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.952 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[1] id 281470647619984
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.962 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=1 end ret=0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.373.964 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.740150] [aicpufw-drv] [aicpufw_drv_mmap_sram 995] <aicpu_scheduler:30223>finish sram status mem: virt_addr = 0xfffefe877000, tgid = 30223, size = 258048,offset = 4096, numa node = 0, ts = 0, ret = 0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.973 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=2 begin
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.373.976 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.740210] [aicpufw-drv] [aicpufw_drv_mmap 1127] <aicpu_scheduler:30223>mmap_gicd,ts_index=0, offset=262144, ts_size=4096, sram_size=258048.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.373.983 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:2, ts_ind:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.373.988 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.740212] [aicpufw-drv] [aicpufw_drv_mmap_gicd 926] <aicpu_scheduler:30223>gicd status mem: virt_addr = 0xfffefe777000, tgid = 30223, size = 1048576,offset = 262144, numa node = 0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.373.999 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.740278] [aicpufw-drv] [aicpufw_drv_mmap 1132] <aicpu_scheduler:30223>mmap_gicr,ts_index=0, offset=1310720, ts_size=4096, sram_size=258048, gicd_size=1048576.
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.374.008 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.740280] [aicpufw-drv] [aicpufw_drv_mmap_gicr 892] <aicpu_scheduler:30223>ts gicr mem: virt_addr = 0xfffeffffa000, tgid = 30223, size = 4096,offset = 1310720, numa node = 0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.011 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:2, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.023 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[2] id 281470639227280
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.033 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=2 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.042 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=3 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.051 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:3, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.079 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:3, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.090 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[3] id 281470630834576
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.099 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=3 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.109 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=4 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.118 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:4, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.154 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:4, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.168 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[4] id 281470622441872
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.178 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=4 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.190 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=5 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.199 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:5, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.230 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:5, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.243 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[5] id 281470614049168
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.253 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=5 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.265 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=6 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.275 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:6, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.307 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:6, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.321 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[6] id 281470605656464
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.330 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=6 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.342 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=7 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.351 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:7, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.381 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:7, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.395 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[7] id 281470597263760
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.404 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=7 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.415 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=8 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.425 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:8, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.455 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:8, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.469 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[8] id 281470588871056
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.478 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=8 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.490 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=9 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.499 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:9, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.528 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:9, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.541 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[9] id 281470580478352
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.550 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=9 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.561 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=10 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.571 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:10, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.599 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:10, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.612 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[10] id 281470572085648
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.621 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=10 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.633 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=11 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.642 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:11, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.676 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:11, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.690 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[11] id 281470563692944
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.699 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=11 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.711 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=12 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.720 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:12, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.748 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:12, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.762 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[12] id 281470555300240
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.771 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=12 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.783 [hardware/dev_plat/aicpufw/aicpufw_thread.c:720][AICPUFW] [aicpufw_thread_create 720] pthread_create for aicpu_index=13 begin
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.792 [hardware/dev_plat/aicpufw/aicpufw_thread.c:656][AICPUFW] [aicpufw_thread_create_one 656] thread_create_one chip_id:0, aicpu_index:13, ts_ind:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.820 [hardware/dev_plat/aicpufw/aicpufw_thread.c:682][AICPUFW] [aicpufw_thread_create_one 682] thread_create_one aicpu_index:13, ret:0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.833 [hardware/dev_plat/aicpufw/aicpufw_thread.c:689][AICPUFW] [aicpufw_thread_create_one 689] thread[13] id 281470546907536
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.842 [hardware/dev_plat/aicpufw/aicpufw_thread.c:722][AICPUFW] [aicpufw_thread_create 722] pthread_create for aicpu_index=13 end ret=0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.854 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.864 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.876 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index1
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.885 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index1
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.897 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index2
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.907 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index2
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.918 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index3
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.928 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index3
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.940 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index4
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.949 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index4
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.961 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index5
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.970 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index5
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.982 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index6
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.374.992 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index6
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.003 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index7
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.012 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index7
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.024 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index8
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.033 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index8
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.045 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index9
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.054 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index9
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.065 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index10
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.075 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index10
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.086 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index11
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.095 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index11
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.107 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index12
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.117 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index12
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.129 [hardware/dev_plat/aicpufw/aicpufw_thread.c:731][AICPUFW] [aicpufw_thread_create 731] sem_post start chip_id=0, ts_ind=0, aicpu_index13
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.138 [hardware/dev_plat/aicpufw/aicpufw_thread.c:733][AICPUFW] [aicpufw_thread_create 733] sem_post end chip_id=0, ts_ind=0, aicpu_index13
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.150 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.217 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[ 5] begin, tid: 30230.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.329 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[ 6] begin, tid: 30231.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.366 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 7, chip 0, ts 0, aicpu_index 5, thread id 30230
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.411 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 5
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.443 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[ 3] begin, tid: 30228.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.503 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[ 8] begin, tid: 30233.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.550 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[ 9] begin, tid: 30234.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.602 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[ 4] begin, tid: 30229.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.634 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 8, chip 0, ts 0, aicpu_index 6, thread id 30231
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.673 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 6
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.690 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 6
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.706 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 6
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.721 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 6
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.741 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 5
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.762 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 5
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.776 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 5
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.799 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 11, chip 0, ts 0, aicpu_index 9, thread id 30234
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.824 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 9
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.841 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 9
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.855 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 9
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.870 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 9
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.900 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[ 0] begin, tid: 30225.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.375.963 [hardware/dev_plat/../dev_core/devdrv/devdrv_aicpu.c:449][devdrv] [devdrv_load_kernel_serve_thread 449] thread for load kernel start.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.027 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[11] begin, tid: 30236.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.081 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[12] begin, tid: 30237.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.127 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[ 7] begin, tid: 30232.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.179 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[13] begin, tid: 30238.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.204 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 5, chip 0, ts 0, aicpu_index 3, thread id 30228
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.231 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 3
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.247 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 3
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.259 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 3
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.273 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 3
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.291 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 6, chip 0, ts 0, aicpu_index 4, thread id 30229
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.316 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 4
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.332 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 4
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.346 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 4
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.362 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 4
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.391 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 15, chip 0, ts 0, aicpu_index 13, thread id 30238
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.431 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 13
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.447 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 13
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.462 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 13
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.477 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 13
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.500 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 2, chip 0, ts 0, aicpu_index 0, thread id 30225
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.528 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.545 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.559 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.578 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.604 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[10] begin, tid: 30235.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.657 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[ 2] begin, tid: 30227.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.706 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:181][AICPUFW] [EventThreadTask 181] Aicpu device[0]:thread[ 1] begin, tid: 30226.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.731 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 13, chip 0, ts 0, aicpu_index 11, thread id 30236
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.760 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 3, chip 0, ts 0, aicpu_index 1, thread id 30226
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.785 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 1
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.801 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 1
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.815 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 1
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.830 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 1
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.851 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 4, chip 0, ts 0, aicpu_index 2, thread id 30227
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.878 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 2
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.894 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 2
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.908 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 2
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.925 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 2
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.945 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index0
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.963 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index1
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.978 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index1
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.376.993 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index2
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.008 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index2
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.023 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index3
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.039 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index3
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.054 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index4
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.070 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index4
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.085 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index5
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.099 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index5
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.114 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index6
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.130 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index6
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.145 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index7
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.168 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 9, chip 0, ts 0, aicpu_index 7, thread id 30232
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.195 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 7
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.211 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 7
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.225 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 7
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.242 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 7
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.261 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 10, chip 0, ts 0, aicpu_index 8, thread id 30233
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.284 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 8
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.300 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 8
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.315 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 8
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.330 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 8
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.351 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 12, chip 0, ts 0, aicpu_index 10, thread id 30235
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.377 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 10
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.393 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 10
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.407 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 10
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.422 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 10
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.441 [hardware/dev_plat/aicpufw/aicpufw_thread.c:475][AICPUFW] [aicpufw_thread_set_affinity 475] thread_set_affinity, bind_cpu_index 14, chip 0, ts 0, aicpu_index 12, thread id 30237
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.465 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 12
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.481 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 12
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.496 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 12
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.510 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 12
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.527 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index7
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.543 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index8
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.558 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index8
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.572 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index9
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.587 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index9
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.602 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index10
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.618 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index10
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.633 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index11
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.652 [hardware/dev_plat/aicpufw/aicpufw_thread.c:536][AICPUFW] [aicpufw_thread_callback 536] sem_wait start, chip_id: 0, ts_ind: 0, aicpu index: 11
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.668 [hardware/dev_plat/aicpufw/aicpufw_thread.c:538][AICPUFW] [aicpufw_thread_callback 538] sem_wait end, chip_id: 0, ts_ind: 0, aicpu index: 11
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.692 [hardware/dev_plat/aicpufw/aicpufw_thread.c:546][AICPUFW] [aicpufw_thread_callback 546] sem_post start, chip_id: 0, ts_ind: 0, aicpu index: 11
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.703 [hardware/dev_plat/aicpufw/aicpufw_thread.c:548][AICPUFW] [aicpufw_thread_callback 548] sem_post end, chip_id: 0, ts_ind: 0, aicpu index: 11
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.713 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index11
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.725 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index12
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.734 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index12
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.742 [hardware/dev_plat/aicpufw/aicpufw_thread.c:737][AICPUFW] [aicpufw_thread_create 737] sem_wait start chip_id=0, ts_ind=0, aicpu_index13
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.749 [hardware/dev_plat/aicpufw/aicpufw_thread.c:739][AICPUFW] [aicpufw_thread_create 739] sem_wait end chip_id=0, ts_ind=0, aicpu_index13
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.761 [hardware/dev_plat/aicpufw/aicpufw_timer.c:133][AICPUFW] [aicpufw_timer_init 133] aicpufw_timer_init end, AICPU_TASK_TIMEOUT=30.
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.377.775 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.743120] [devdrv] [devdrv_manager_get_kernel_lib_process 198] <devdrv_load_ser:30223> begin to get kernel lib, device pid: 30223.
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.377.790 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.743123] [devdrv] [devdrv_manager_get_kernel_lib_process 211] <devdrv_load_ser:30223> host_pid: 40927.
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.377.799 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.743131] [devdrv] [devdrv_manager_get_kernel_lib_process 247] <devdrv_load_ser:30223> begin to wait, host pid: 40927, device pid: 30223.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.833 [hardware/dev_plat/aicpufw/aicpufw_supervisor_thread.c:173][AICPUFW] [aicpufw_sup_thread_proc 173] supervisor thread begin, current tid:30239 thread
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.854 [hardware/dev_plat/aicpufw/aicpufw_thread.c:797][AICPUFW] [aicpufw_thread_init 797] chip_id 0 thread init end.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.866 [hardware/dev_plat/aicpufw/aicpufw_api.c:77][AICPUFW] [aicpufw_create_work_tasks 77] drvCreateAicpuWorkTasks exit: chip_id = 0, pid=40927, mode=0.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.377.892 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:233][AICPUFW] [StartTdtServer 233] Start tdt server, deviceId=0.
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.377.932 [tdt/device/../common/src/log.cpp:158]BindCpu init and bindCoreList size is 1,[tdt/device/src/hdc/bind_cpu.cpp:38:Init]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.034 [tdt/device/../common/src/log.cpp:158]BindCpu cpu core num = 64,[tdt/device/src/hdc/bind_cpu.cpp:40:Init]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.051 [tdt/device/../common/src/log.cpp:158]BindCoreList member is 1,[tdt/device/src/hdc/bind_cpu.cpp:42:Init]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.075 [tdt/device/../common/src/log.cpp:158]Begin to Init tdtserver [devicID_=0] and [newInputDeviceId=0],[tdt/device/src/hdc/tdt_server_impl.cpp:664:Init]30223 Msg: running ok
[WARNING] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.110 [tdt/device/../common/src/log.cpp:143]{"Start":"TDT_RECV"},[tdt/device/../common/src/statistic.cpp:113:PeriodStatisticManager]30223 Msg: warnging
[WARNING] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.128 [tdt/device/../common/src/log.cpp:143]{"Start":"DP_ENQUEUE"},[tdt/device/../common/src/statistic.cpp:114:PeriodStatisticManager]30223 Msg: warnging
[WARNING] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.147 [tdt/device/../common/src/log.cpp:143]{"Start":"RECV_ENLARGE"},[tdt/device/../common/src/statistic.cpp:115:PeriodStatisticManager]30223 Msg: warnging
[WARNING] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.167 [tdt/device/../common/src/log.cpp:143]{"Start":"RECV_REDUCE"},[tdt/device/../common/src/statistic.cpp:116:PeriodStatisticManager]30223 Msg: warnging
[WARNING] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.185 [tdt/device/../common/src/log.cpp:143]{"Start":"RELEASE_DATA"},[tdt/device/../common/src/statistic.cpp:117:PeriodStatisticManager]30223 Msg: warnging
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.285 [tdt/device/../common/src/log.cpp:158]HdcServer::Init Start,[tdt/device/../common/src/hdc_server.cpp:104:Init]30223 Msg: running ok
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:22.378.335 [hardware/dev_plat/../dev_plat/devhdc/hdc_server.c:287][drvHdcServerCreate:287] >>> create server (listen device: 0) success
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.356 [tdt/device/../common/src/log.cpp:158]HdcCommon::InitMsgSize Start,[tdt/device/../common/src/hdc_common.cpp:28:InitMsgSize]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.374 [tdt/device/../common/src/log.cpp:158]msgMaxSize_ = 524224, msgShortHeadDataMaxSize_ = 524212 msgLongHeadDataMaxSize_ = 524200,[tdt/device/../common/src/hdc_common.cpp:42:InitMsgSize]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.410 [tdt/device/../common/src/log.cpp:158]hdcserver in tdtserver is already initialed,[tdt/device/src/hdc/tdt_server_impl.cpp:652:InitDirectly]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.428 [tdt/device/../common/src/log.cpp:158]Begin to init device's hdc channel of tdt.,[tdt/device/src/hdc/tdt_server.cpp:32:TDTServerInit]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.448 [tdt/device/../common/src/log.cpp:158]begin tdt device init, [deviceId=0],[tdt/device/src/hdc/tdt_device_impl.cpp:65:Init]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.466 [tdt/device/../common/src/log.cpp:158]TuningDataTransfer is initialized with deviceID:0,[tdt/device/src/hdc/tuning_data_transfer.cpp:71:Init]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.640 [tdt/device/../common/src/log.cpp:158]HdcCommon::InitMsgSize Start,[tdt/device/../common/src/hdc_common.cpp:28:InitMsgSize]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.657 [tdt/device/../common/src/log.cpp:158]msgMaxSize_ = 524224, msgShortHeadDataMaxSize_ = 524212 msgLongHeadDataMaxSize_ = 524200,[tdt/device/../common/src/hdc_common.cpp:42:InitMsgSize]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.673 [tdt/device/../common/src/log.cpp:158]capacity.maxSegment: 524224,[tdt/device/../common/src/hdc_client.cpp:152:Init]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.690 [tdt/device/../common/src/log.cpp:158]HdcClient::CreateHdcSession Start,[tdt/device/../common/src/hdc_client.cpp:284:CreateHdcSession]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.809 [tdt/device/../common/src/log.cpp:158]HdcServer::Accept thread = 281470521594256,[tdt/device/../common/src/hdc_server.cpp:259:Accept]30242 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.878 [tdt/device/../common/src/log.cpp:158][HdcClient] deviceId: 0 connect session,[tdt/device/../common/src/hdc_client.cpp:306:CreateHdcSession]30223 Msg: running ok
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:22.378.908 [hardware/dev_plat/../dev_plat/devhdc/hdc_core.c:1609][drvHdcSetSessionReference:1609] >>> session 56, pid 30223
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.926 [tdt/device/../common/src/log.cpp:158][HdcClient] drvHdcSetSessionReference success,[tdt/device/../common/src/hdc_client.cpp:317:CreateHdcSession]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.378.945 [tdt/device/../common/src/log.cpp:158][HdcClient] deviceId: 0 connect session and drvHdcSetSessionReference success, sessionId=1,[tdt/device/../common/src/hdc_client.cpp:327:CreateHdcSession]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.006 [tdt/device/../common/src/log.cpp:158][HdcClient] SendPidMsg sessionId=1, tdt_main_pid=30223,[tdt/device/../common/src/hdc_client.cpp:346:sendPidMsg]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.073 [tdt/device/../common/src/log.cpp:158]TuningDataTransfer tdt client wait for send data begin,[tdt/device/src/hdc/tuning_data_transfer.cpp:173:CreateSingleSession]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.101 [tdt/device/../common/src/log.cpp:158]BindCPUCore Start,[tdt/device/src/hdc/bind_cpu.cpp:58:BindCPUCore]30242 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.213 [tdt/device/../common/src/log.cpp:158]BindCpu thread=281470521594256 CPU_ISSET successfully on processor[1],[tdt/device/src/hdc/bind_cpu.cpp:103:BindCPUCore]30242 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.239 [tdt/device/../common/src/log.cpp:158]Thread[281470521594256] bindCpu setaffinity successfully,[tdt/device/src/hdc/bind_cpu.cpp:109:BindCPUCore]30242 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.282 [tdt/device/../common/src/log.cpp:158]Free Tdt thread ID = 281470529986960,[tdt/device/src/hdc/tdt_server_impl.cpp:555:FreeTdtMemoryThread]30241 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.304 [tdt/device/../common/src/log.cpp:158]BindCPUCore Start,[tdt/device/src/hdc/bind_cpu.cpp:58:BindCPUCore]30241 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.359 [tdt/device/../common/src/log.cpp:158]TimerFunc thread = 281470538379664,[tdt/device/../common/src/statistic.cpp:140:TimerFunc]30240 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.377 [tdt/device/../common/src/log.cpp:158]HdcServer::AcceptConnection Start,[tdt/device/../common/src/hdc_server.cpp:310:AcceptHdcSession]30242 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.403 [tdt/device/../common/src/log.cpp:158]BindCpu thread=281470529986960 CPU_ISSET successfully on processor[1],[tdt/device/src/hdc/bind_cpu.cpp:103:BindCPUCore]30241 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.421 [tdt/device/../common/src/log.cpp:158]BindCPUCore Start,[tdt/device/src/hdc/bind_cpu.cpp:58:BindCPUCore]30240 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.476 [tdt/device/../common/src/log.cpp:158]BindCPUCore Start,[tdt/device/src/hdc/bind_cpu.cpp:58:BindCPUCore]30243 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.496 [tdt/device/../common/src/log.cpp:158]Thread[281470529986960] bindCpu setaffinity successfully,[tdt/device/src/hdc/bind_cpu.cpp:109:BindCPUCore]30241 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.516 [tdt/device/../common/src/log.cpp:158][deviceId=0] Free Tdt thread is success to bind cpu core,[tdt/device/src/hdc/tdt_server_impl.cpp:559:FreeTdtMemoryThread]30241 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.534 [tdt/device/../common/src/log.cpp:158]BindCpu thread=281470538379664 CPU_ISSET successfully on processor[1],[tdt/device/src/hdc/bind_cpu.cpp:103:BindCPUCore]30240 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.550 [tdt/device/../common/src/log.cpp:158]Thread[281470538379664] bindCpu setaffinity successfully,[tdt/device/src/hdc/bind_cpu.cpp:109:BindCPUCore]30240 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.576 [tdt/device/../common/src/log.cpp:158]BindCpu thread=281470513033616 CPU_ISSET successfully on processor[1],[tdt/device/src/hdc/bind_cpu.cpp:103:BindCPUCore]30243 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.591 [tdt/device/../common/src/log.cpp:158]Thread[281470513033616] bindCpu setaffinity successfully,[tdt/device/src/hdc/bind_cpu.cpp:109:BindCPUCore]30243 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.637 [tdt/device/../common/src/log.cpp:158]TuningDataTransfer find channel OK, sessionID: 1,[tdt/device/src/hdc/tuning_data_transfer.cpp:126:SetSendFlagBySessionID]30243 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.664 [tdt/device/../common/src/log.cpp:158]TuningDataTransfer tdt client wait for send data end,[tdt/device/src/hdc/tuning_data_transfer.cpp:178:CreateSingleSession]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.688 [tdt/device/../common/src/log.cpp:158]Begin to start send thread.,[tdt/device/src/hdc/tdt_device_impl.cpp:179:StartSendThread]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.732 [tdt/device/../common/src/log.cpp:158]Success start send thread.,[tdt/device/src/hdc/tdt_device_impl.cpp:181:StartSendThread]30223 Msg: running ok
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.379.746 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:265][AICPUFW] [StartTdtServer 265] TDT server init success.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.379.758 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:135][AICPUFW] [Start 135] Aicpu_scheduler has started, deviceId=0, hostpid=40927.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:22.379.771 [aicpu/aicpu_device/aicpu_schedule/compute_process/main.cc:206][AICPUFW] [main 206] Compute process start success.
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.793 [tdt/device/../common/src/log.cpp:158][TSDPPCCLI] TsdWaitForShutdownProc() deviceId: 0 waitType:1(0:HCCP,1:COMPUTE) is running,[tdt/device/src/tsd/ppc_client.cpp:220:TsdWaitForShutdownProc]30223 Msg: running ok
[EVENT] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.810 [tdt/device/../common/src/log.cpp:149][PPCCLIEVENT] TsdWaitForShutdown!,[tdt/device/src/tsd/ppc_client.cpp:225:TsdWaitForShutdownProc]30223
[EVENT] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.826 [tdt/device/../common/src/log.cpp:149][PPCCLIEVENT] TsdWaitForShutdown Start Rsp device[0] procType:1(0:HCCP,1:COMPUTE), subProcPid[30223] ,[tdt/device/src/tsd/ppc_client.cpp:244:TsdWaitForShutdownProc]30223
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.846 [tdt/device/../common/src/log.cpp:158][TSDPPCCLI] GetPpcSession() will Create session use serverId: 19280,[tdt/device/src/tsd/ppc_client.cpp:110:GetPpcSession]30223 Msg: running ok
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:22.379.899 [hardware/dev_plat/../dev_plat/devhdc/hdc_ppc.c:129][drvPpcSessionConnect:129] >>> Ppc connect session 26, pid 30223
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.917 [tdt/device/../common/src/log.cpp:158][TSDPPCCLI] [serverId=19280] GetPpcSession() receive thread has been started,[tdt/device/src/tsd/ppc_client.cpp:118:GetPpcSession]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.379.935 [tdt/device/../common/src/log.cpp:158]PpcInterface::SendMsg,size=10, subpid=30223,[tdt/device/src/tsd/ppc_interface.cpp:39:SendMsg]30223 Msg: running ok
[INFO] HDC(8380,tsdaemon):2020-05-12-11:05:22.379.941 [hardware/dev_plat/../dev_plat/devhdc/hdc_ppc.c:268][drvPpcSessionAccept:268] >>> Ppc Accept Session 16, Server fd 12, pid 8380
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.380.006 [tdt/device/../common/src/log.cpp:158]tdt device begin to send to host.,[tdt/device/src/hdc/tdt_device_impl.cpp:275:TdtDeviceSendImpl]30244 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:22.380.025 [tdt/device/../common/src/log.cpp:158]begin to check queueDataSize_.,[tdt/device/src/hdc/tdt_device_impl.cpp:286:TdtDeviceSendImpl]30244 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.091 [tdt/device/../common/src/log.cpp:158]PpcInterface::RecvMsg, size=10, subpid=30223,[tdt/device/src/tsd/ppc_interface.cpp:122:RecvMsg]30245 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.114 [tdt/device/../common/src/log.cpp:158][PpcServer] SetPpcSession, deviceId:0, subProcPid:30223,[tdt/device/src/tsd/ppc_server.cpp:338:SetPpcSession]30245 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.139 [tdt/device/../common/src/log.cpp:158][TSDaemon]PPCSerToTsdMsg deviceId[0], subProcPid[30223]msgType[0](0:START RSP,2:SHUTDOWN,1:SHUTDOWN RSP,3:SOCKET CLOSE), procType[1](0:HCCP,1:COMPUTE) state[6],[tdt/device/src/tsd/tsdaemon.cpp:1441:PPCSerToTsdMsg]30245 Msg: running ok
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.160 [tdt/device/../common/src/log.cpp:149][TsdEVENT] #### PPCSer->TSD RspMsg device[0] Start Rsp procType[1] ####,[tdt/device/src/tsd/tsdaemon.cpp:1406:PPCSerToTsdProc]30245
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.205 [tdt/device/../common/src/log.cpp:158][TSDPPCSER] ##### [threadName:ppc_srv_recv_0] RecvData [ret=16846848] Save [notifyDeviceId:0] [notifyProcType:1]####,[tdt/device/src/tsd/ppc_server.cpp:242:RecvData]30245 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.256 [tdt/device/../common/src/log.cpp:158][TSDaemon] StartRspProc() [dev=0][subProcPid=30223][procType=1(0:HCCP,1:COMPUTE)][tid=281470572192176]!,[tdt/device/src/tsd/tsdaemon.cpp:1190:StartRspProc]30246 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.276 [tdt/device/../common/src/log.cpp:158][TSDaemon] StartRspProc curState is: 6,[tdt/device/src/tsd/tsdaemon.cpp:1193:StartRspProc]30246 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.295 [tdt/device/../common/src/log.cpp:158][TSDaemon] GetTsdToFmkMsg deviceId[0] subProcPid[30223],[tdt/device/src/tsd/tsdaemon.cpp:959:GetTsdToFmkMsg]30246 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.314 [tdt/device/../common/src/log.cpp:158][TSDaemon] tsdToFmkSessionIdMap size = 1,[tdt/device/src/tsd/tsdaemon.cpp:964:GetTsdToFmkMsg]30246 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.331 [tdt/device/../common/src/log.cpp:158][TSDaemon] tsdToFmkSessionIdMap [deviceId] size = 1,[tdt/device/src/tsd/tsdaemon.cpp:967:GetTsdToFmkMsg]30246 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.349 [tdt/device/../common/src/log.cpp:158][TSDaemon] SendRspMsgToFmk deviceId: 0, subProcPid: 30223, sessionID: 1,[tdt/device/src/tsd/tsdaemon.cpp:1016:SendRspMsgToFmk]30246 Msg: running ok
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.364 [tdt/device/../common/src/log.cpp:149][TsdEVENT] #### Start Rsp TSD->FMK device[0] sessionID[1] realDeviceId[0]####,[tdt/device/src/tsd/tsdaemon.cpp:1018:SendRspMsgToFmk]30246
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:22.380.402 [tdt/device/../common/src/log.cpp:158][TSDaemon] StartRspProc subRunState is: 3,[tdt/device/src/tsd/tsdaemon.cpp:1173:TsdWaitRspProcForStar]30246 Msg: running ok
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:22.381.752 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48430.745228] [hdcdrv] [hdcdrv_server_create 2330] <aicpu_scheduler:30223> dev_id 0 service_type service_TDT server create
[EVENT] TDT(30223,aicpu_scheduler):2020-05-12-11:05:23.379.610 [tdt/device/../common/src/log.cpp:149],[tdt/common/common_inc/queue_manager.h:683:ShowSizeForEveryChannel]30240
[EVENT] TDT(30223,aicpu_scheduler):2020-05-12-11:05:23.379.656 [tdt/device/../common/src/log.cpp:149]"DeviceSendPool: " "DeviceRecvPool: " "HostRecvPool: " "DeviceCtrlPool: {SendPool: 0, FreePool: 0}, {RecvPool: 0, FreePool: 0}",[tdt/device/../common/src/memory_pool.cpp:707:GetDevicePoolStatus]30240
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:23.379.673 [tdt/device/../common/src/log.cpp:158]DeviceRecNormalData: Device receive normal message number:0,[tdt/device/../common/src/memory_pool.cpp:709:GetDevicePoolStatus]30240 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.322 [tdt/device/../common/src/log.cpp:158]tsdaemon get process sign successfully, procpid:0 signSize:0,[tdt/device/src/tsd/tsdaemon.cpp:901:FmkToTsdMsg]30221 Msg: running ok
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.360 [tdt/device/../common/src/log.cpp:149][TsdEVENT]FmkToTsdMsg dev[0] msg[7] sessionId[1] realDev[0] fmkSignPid[0] profilingMode[0] rankSize[1],[tdt/device/src/tsd/tsdaemon.cpp:905:FmkToTsdMsg]30221
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.375 [tdt/device/../common/src/log.cpp:149][TsdEVENT]From FMK Close <<<<<<<<<<< TSDdev[0] sessionId[1] realDev[0] fmkPid[0],[tdt/device/src/tsd/tsdaemon.cpp:861:FmkToTsdMsgProc]30221
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.397 [tdt/device/../common/src/log.cpp:158][TSDaemon] Begin closeSubProcess deviceId:0, sessionId:1, cpProcPid:30223, hccpProcPid:0, subProcState:3,[tdt/device/src/tsd/tsdaemon.cpp:729:CloseSubProcess]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.414 [tdt/device/../common/src/log.cpp:158][TSDaemon] SetTsdToFmkMsg:deviceId[0], sessionId[1], subProcPid[30223],[tdt/device/src/tsd/tsdaemon.cpp:774:SetTsdToFmkMsg]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.429 [tdt/device/../common/src/log.cpp:158][TSDaemon] Process HCCP is abandoned to close, the rank size is 1,[tdt/device/src/tsd/tsdaemon.cpp:747:CloseSubProcess]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.473 [tdt/device/../common/src/log.cpp:158][TSDaemon] start delete file, direct is /home/HwHiAiUser/hdcd/device0/,[tdt/device/src/tsd/tsdaemon.cpp:1878:DeleteFileByPath]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.509 [tdt/device/../common/src/log.cpp:158][TSDaemon] start scan file, ent name is .,[tdt/device/src/tsd/tsdaemon.cpp:1887:DeleteFileByPath]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.524 [tdt/device/../common/src/log.cpp:158][TSDaemon] start scan file, ent name is ..,[tdt/device/src/tsd/tsdaemon.cpp:1887:DeleteFileByPath]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.541 [tdt/device/../common/src/log.cpp:158][TSDaemon] start scan file, ent name is etc,[tdt/device/src/tsd/tsdaemon.cpp:1887:DeleteFileByPath]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.560 [tdt/device/../common/src/log.cpp:158][TSDaemon] start scan file, ent name is upgrade,[tdt/device/src/tsd/tsdaemon.cpp:1887:DeleteFileByPath]30221 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.624 [tdt/device/../common/src/log.cpp:158][PpcServer] GetPpcSession, deviceId:0, subProcPid:30223,[tdt/device/src/tsd/ppc_server.cpp:324:GetPpcSession]30255 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.648 [tdt/device/../common/src/log.cpp:158]PpcInterface::SendMsg,size=12, subpid=30223,[tdt/device/src/tsd/ppc_interface.cpp:39:SendMsg]30255 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.686 [tdt/device/../common/src/log.cpp:158][TSDaemon] Begin ExecuteClose deviceId:0, subProcPid:30223,[tdt/device/src/tsd/tsdaemon.cpp:703:ExecuteClose]30255 Msg: running ok
[OPLOG] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.704 [tdt/device/../common/src/log.cpp:151][tdt/device/src/tsd/tsdaemon.cpp:705:ExecuteClose]30255 free resource {devOS:[30223]} for {dev:0}
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.731 [tdt/device/../common/src/log.cpp:149][TsdEVENT] #### Send TSD->SubProcess[PPCSer] Close Msg Device[0] proType[1] [tid=281470597370288]####,[tdt/device/src/tsd/tsdaemon.cpp:709:ExecuteClose]30255
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.063.728 [tdt/device/../common/src/log.cpp:158]PpcInterface::RecvMsg, size=12, subpid=30223,[tdt/device/src/tsd/ppc_interface.cpp:122:RecvMsg]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.063.755 [tdt/device/../common/src/log.cpp:158]PpcClient::RecvMsgProc, size=12, subpid=30223,[tdt/device/src/tsd/ppc_client.cpp:150:RecvMsgProc]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.063.773 [tdt/device/../common/src/log.cpp:158]PpcClient::RecvMsgProc, subpid1=30223, subpid2=30223,[tdt/device/src/tsd/ppc_client.cpp:160:RecvMsgProc]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.063.789 [tdt/device/../common/src/log.cpp:158]PpcInterface::SendMsg,size=12, subpid=30223,[tdt/device/src/tsd/ppc_interface.cpp:39:SendMsg]30223 Msg: running ok
[EVENT] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.063.816 [tdt/device/../common/src/log.cpp:149][PPCCLIEVENT] #### PPCClient->TSD Close Rsp send OK device[0] procType:1 ####,[tdt/device/src/tsd/ppc_client.cpp:166:RecvMsgProc]30223
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.063.832 [tdt/device/../common/src/log.cpp:158][PPCClient]Process Exit DevId:0 procType:1(0:HCCP,1:COMPUTE),[tdt/device/src/tsd/ppc_client.cpp:200:RecvData]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.063.848 [tdt/device/../common/src/log.cpp:158]TsdWaitForShutdown exit,[tdt/device/src/tsd/ppc_client.cpp:274:TsdWaitForShutdown]30223 Msg: running ok
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:24.063.861 [aicpu/aicpu_device/aicpu_schedule/compute_process/main.cc:214][AICPUFW] [main 214] Tsd wait for shut down success.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:24.063.872 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:272][AICPUFW] [StopTdtServer 272] Stop tdt server, deviceId=0.
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.063.987 [tdt/device/../common/src/log.cpp:158]PpcInterface::RecvMsg, size=12, subpid=30223,[tdt/device/src/tsd/ppc_interface.cpp:122:RecvMsg]30245 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.064.025 [tdt/device/../common/src/log.cpp:158][PpcServer] SetPpcSession, deviceId:0, subProcPid:30223,[tdt/device/src/tsd/ppc_server.cpp:338:SetPpcSession]30245 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.064.057 [tdt/device/../common/src/log.cpp:158][TSDaemon]PPCSerToTsdMsg deviceId[0], subProcPid[30223]msgType[1](0:START RSP,2:SHUTDOWN,1:SHUTDOWN RSP,3:SOCKET CLOSE), procType[1](0:HCCP,1:COMPUTE) state[6],[tdt/device/src/tsd/tsdaemon.cpp:1441:PPCSerToTsdMsg]30245 Msg: running ok
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:24.064.078 [tdt/device/../common/src/log.cpp:149][TsdEVENT] #### PPCSer->TSD RspMsg device[0] Close Rsp procType[1] ####,[tdt/device/src/tsd/tsdaemon.cpp:1415:PPCSerToTsdProc]30245
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.064.195 [tdt/device/../common/src/log.cpp:158][TSDPPCSER] ##### [threadName:ppc_srv_recv_0] RecvData [ret=16846848] Save [notifyDeviceId:0] [notifyProcType:1]####,[tdt/device/src/tsd/ppc_server.cpp:242:RecvData]30245 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.064.227 [tdt/device/../common/src/log.cpp:158]tdtserver is destroying, [devicID_=0],[tdt/device/src/hdc/tdt_server_impl.cpp:709:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.064.248 [tdt/device/../common/src/log.cpp:158]Stop QueueManager success,[tdt/device/src/hdc/tdt_server_impl.cpp:716:Destroy]30223 Msg: running ok
[INFO] DP(30223,aicpu_scheduler):2020-05-12-11:05:24.064.263 [datapreprocess/src/dp_interface.cc:288][DP_PREPROCESS] [I] [datapreprocess/src/dp_interface.cc:288] Release blocked TDT threads.
[INFO] DP(30223,aicpu_scheduler):2020-05-12-11:05:24.064.280 [datapreprocess/src/dp_interface.cc:406][DP_PREPROCESS] [I] [datapreprocess/src/dp_interface.cc:406] Begin write queue blocking eventfd of source().
[INFO] DP(30223,aicpu_scheduler):2020-05-12-11:05:24.064.294 [datapreprocess/src/dp_interface.cc:409][DP_PREPROCESS] [I] [datapreprocess/src/dp_interface.cc:409] Got empty source name, start write all blocking eventfd.
[INFO] DP(30223,aicpu_scheduler):2020-05-12-11:05:24.064.305 [datapreprocess/src/dp_interface.cc:291][DP_PREPROCESS] [I] [datapreprocess/src/dp_interface.cc:291] All TDT threads mark info memory have released.
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.064.320 [tdt/device/../common/src/log.cpp:158]DataPreprocess exited,[tdt/device/src/hdc/tdt_server_impl.cpp:720:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.064.338 [tdt/device/../common/src/log.cpp:158]Enqueue Thread exited,[tdt/device/src/hdc/tdt_server_impl.cpp:722:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.064.368 [tdt/device/../common/src/log.cpp:158][deviceId=0] Free Tdt thread is exist,total free number = 0,enqueue number is = 0,[tdt/device/src/hdc/tdt_server_impl.cpp:570:FreeTdtMemoryThread]30241 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.064.412 [tdt/device/../common/src/log.cpp:158]Free Thread exited,[tdt/device/src/hdc/tdt_server_impl.cpp:725:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.064.428 [tdt/device/../common/src/log.cpp:158]enter HdcServer::Destroy() function,[tdt/device/../common/src/hdc_server.cpp:582:Destroy]30223 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.064.462 [tdt/device/../common/src/log.cpp:158][TSDaemon] CloseRspProc() [dev=0][subPid=30223][procType=1(0:HCCP,1:COMPUTE)][tid=281470588977584]!,[tdt/device/src/tsd/tsdaemon.cpp:1267:CloseRspProc]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.064.485 [tdt/device/../common/src/log.cpp:158][TSDaemon] CloseRspProc curState is: 6,[tdt/device/src/tsd/tsdaemon.cpp:1271:CloseRspProc]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.064.508 [tdt/device/../common/src/log.cpp:158][TSDaemon] GetTsdToFmkMsg deviceId[0] subProcPid[30223],[tdt/device/src/tsd/tsdaemon.cpp:959:GetTsdToFmkMsg]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.064.527 [tdt/device/../common/src/log.cpp:158][TSDaemon] tsdToFmkSessionIdMap size = 1,[tdt/device/src/tsd/tsdaemon.cpp:964:GetTsdToFmkMsg]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.064.544 [tdt/device/../common/src/log.cpp:158][TSDaemon] tsdToFmkSessionIdMap [deviceId] size = 1,[tdt/device/src/tsd/tsdaemon.cpp:967:GetTsdToFmkMsg]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.064.568 [tdt/device/../common/src/log.cpp:158][TSDaemon] subProcessPid is 30223 on device[0], procType[1](0:HCCP PROC,1:COMPUTE PROC),[tdt/device/src/tsd/tsdaemon.cpp:1085:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.064.588 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:24.074.535 [hardware/dev_plat/../dev_plat/devhdc/hdc_server.c:367][drvHdcServerDestroy:367] >>> destroy server success, deviceId 0, serviceType 10
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:24.074.556 [hardware/dev_plat/../dev_plat/devhdc/hdc_server.c:236][drvHdcPcieSessionAccept:236] >>> device:0 server 10 is destroyed, ret:18
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.575 [tdt/device/../common/src/log.cpp:158]drv accept exception, because acceptSwitch has been set false, ret=18,[tdt/device/../common/src/hdc_server.cpp:324:AcceptHdcSession]30242 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.592 [tdt/device/../common/src/log.cpp:158]acceptSwitch has been set false,[tdt/device/../common/src/hdc_server.cpp:269:Accept]30242 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.634 [tdt/device/../common/src/log.cpp:158][HdcServer] SessionIdPidMsg enter into ClearSessionIdPid,[tdt/device/../common/src/hdc_server.cpp:495:ClearSessionIdPid]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.658 [tdt/device/../common/src/log.cpp:158]Begin StopMemoryPool,[tdt/device/../common/src/memory_pool.cpp:3389:StopMemoryPool]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.685 [tdt/device/../common/src/log.cpp:158]DestroyMemoryPool, memoryEnd = 2,[tdt/device/../common/src/memory_pool.cpp:1323:DestroyMemoryPool]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.703 [tdt/device/../common/src/log.cpp:158]FreeMemoryByMap, memoryEnd = 2, memoryType = 1, devId_ = 0,[tdt/device/../common/src/memory_pool.cpp:1131:FreeMemoryByMap]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.721 [tdt/device/../common/src/log.cpp:158]FreeMemoryByMap, memoryEnd = 2, memoryType = 1, devId_ = 0,[tdt/device/../common/src/memory_pool.cpp:1131:FreeMemoryByMap]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.743 [tdt/device/../common/src/log.cpp:158]hdcServer_ destroyed,[tdt/device/src/hdc/tdt_server_impl.cpp:735:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.759 [tdt/device/../common/src/log.cpp:158]Begin to destroy device's hdc channel of tdt.,[tdt/device/src/hdc/tdt_server.cpp:48:TDTServerStop]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.774 [tdt/device/../common/src/log.cpp:158]begin to destroy.,[tdt/device/src/hdc/tdt_device_impl.cpp:367:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.793 [tdt/device/../common/src/log.cpp:158]Stop QueueManager success,[tdt/device/src/hdc/tdt_device_impl.cpp:381:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.808 [tdt/device/../common/src/log.cpp:158]Begin to stop send thread.,[tdt/device/src/hdc/tdt_device_impl.cpp:192:StopSendThread]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.866 [tdt/device/../common/src/log.cpp:158]Success stop send thread.,[tdt/device/src/hdc/tdt_device_impl.cpp:196:StopSendThread]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.885 [tdt/device/../common/src/log.cpp:158]TuningDataTransfer destory hdc client,[tdt/device/src/hdc/tuning_data_transfer.cpp:456:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.900 [tdt/device/../common/src/log.cpp:158]enter HdcClient::Destroy() function,[tdt/device/../common/src/hdc_client.cpp:468:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.917 [tdt/device/../common/src/log.cpp:158]begin drvHdcSessionClose,[tdt/device/../common/src/hdc_client.cpp:415:ClearAllSession]30223 Msg: running ok
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:24.074.930 [hardware/dev_plat/../dev_plat/devhdc/hdc_client.c:413][drvHdcClientSessionClose:413] >>> destroy client session(sock: 56)
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.960 [tdt/device/../common/src/log.cpp:158]end drvHdcSessionClose,[tdt/device/../common/src/hdc_client.cpp:420:ClearAllSession]30223 Msg: running ok
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:24.074.976 [hardware/dev_plat/../dev_plat/devhdc/hdc_core.c:1300][drvHdcRecvMsgLen:1300] >>> the session 56 local or remote was closed
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.074.996 [tdt/device/../common/src/log.cpp:158]begin HdcClient::JoinAllRecvThread,[tdt/device/../common/src/hdc_client.cpp:383:JoinAllRecvThread]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.016 [tdt/device/../common/src/log.cpp:158]drvHdcRecv() return 25,[tdt/device/../common/src/hdc_common.cpp:494:RecvHdcDefaultMsg]30243 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.075.018 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.038 [tdt/device/../common/src/log.cpp:158]Receive() return 17903651, which means : hdc service or client socket closed,[tdt/device/../common/src/hdc_common.cpp:438:RecvMsg]30243 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.057 [tdt/device/../common/src/log.cpp:158][deviceId=0][sessionId=1] recv runswitch has been set false,[tdt/device/../common/src/hdc_client.cpp:175:RecvData]30243 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.072 [tdt/device/../common/src/log.cpp:158][deviceId=0] the recv data pthread exit,[tdt/device/../common/src/hdc_client.cpp:190:RecvData]30243 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.138 [tdt/device/../common/src/log.cpp:158]end HdcClient::JoinAllRecvThread,[tdt/device/../common/src/hdc_client.cpp:392:JoinAllRecvThread]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.156 [tdt/device/../common/src/log.cpp:158]begin drvHdcClientDestroy,[tdt/device/../common/src/hdc_client.cpp:450:DestroyClient]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.236 [tdt/device/../common/src/log.cpp:158]end drvHdcClientDestroy,[tdt/device/../common/src/hdc_client.cpp:455:DestroyClient]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.252 [tdt/device/../common/src/log.cpp:158]begin HdcClient::ClearClientPtr,[tdt/device/../common/src/hdc_client.cpp:432:ClearClientPtr]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.269 [tdt/device/../common/src/log.cpp:158]end HdcClient::ClearClientPtr,[tdt/device/../common/src/hdc_client.cpp:440:ClearClientPtr]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.284 [tdt/device/../common/src/log.cpp:158]begin HdcClient::ClearAll,[tdt/device/../common/src/hdc_client.cpp:401:ClearAll]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.299 [tdt/device/../common/src/log.cpp:158]end HdcClient::ClearAll,[tdt/device/../common/src/hdc_client.cpp:404:ClearAll]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.314 [tdt/device/../common/src/log.cpp:158]end HdcClient::Destroy() function,[tdt/device/../common/src/hdc_client.cpp:476:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.330 [tdt/device/../common/src/log.cpp:158]success destroy.,[tdt/device/src/hdc/tdt_device_impl.cpp:388:Destroy]30223 Msg: running ok
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:24.075.343 [aicpu/aicpu_device/aicpu_schedule/compute_process/compute_process.cc:277][AICPUFW] [StopTdtServer 277] TDT server stop success.
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:24.075.355 [hardware/dev_plat/aicpufw/aicpufw_dev.c:94][AICPUFW] [aicpufw_dev_close 94] chip_id:0 will be closed, fd=18.
[TRACE] DP(30223,aicpu_scheduler):2020-05-12-11:05:24.075.373 [status:STOP] [datapreprocess/src/task_queue.cc:292]DP_PREPROCESS module has been closed
[INFO] DRV(30223,aicpu_scheduler):2020-05-12-11:05:24.075.389 [aicpu/aicpu_device/aicpu_schedule/compute_process/main.cc:219][AICPUFW] [main 219] Compute process stopped.
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.408 [tdt/device/../common/src/log.cpp:158]PpcClient::~PpcClient() destructor function called,[tdt/device/src/tsd/ppc_client.cpp:23:~PpcClient]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.422 [tdt/device/../common/src/log.cpp:158][TSDPPCCLI] Destroy() enter,[tdt/device/src/tsd/ppc_client.cpp:44:Destroy]30223 Msg: running ok
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:24.075.437 [hardware/dev_plat/../dev_plat/devhdc/hdc_ppc.c:151][drvPpcSessionDestroy:151] >>> Ppc Destroy session 26, pid 30223
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.473 [tdt/device/../common/src/log.cpp:158][TSDPPCCLI] Destroy() call drvPpcSessionDestroy func return [drvRet:0],[tdt/device/src/tsd/ppc_client.cpp:54:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.490 [tdt/device/../common/src/log.cpp:158][TSDPPCCLI] Destroy() exit,[tdt/device/src/tsd/ppc_client.cpp:73:Destroy]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.508 [tdt/device/../common/src/log.cpp:158]TdtDeviceImpl::~TdtDeviceImpl() destructor function called.,[tdt/device/src/hdc/tdt_device_impl.cpp:54:~TdtDeviceImpl]30223 Msg: running ok
[INFO] TDT(30223,aicpu_scheduler):2020-05-12-11:05:24.075.532 [tdt/device/../common/src/log.cpp:158]TdtServerImpl::~TdtServerImpl() destructor function called,[tdt/device/src/hdc/tdt_server_impl.cpp:71:~TdtServerImpl]30223 Msg: running ok
[INFO] HDC(30223,aicpu_scheduler):2020-05-12-11:05:24.076.267 [hardware/dev_plat/../dev_plat/devhdc/hdc_core.c:2002][drv_hdc_exit:2002] >>> HDC uninit success
[INFO] HDC(8380,tsdaemon):2020-05-12-11:05:24.077.353 [hardware/dev_plat/../dev_plat/devhdc/hdc_core.c:626][hdcSocketRecvPeek:626] >>> client connection closed: Success(errno: 0)(sock: 16)
[WARNING] TDT(8380,tsdaemon):2020-05-12-11:05:24.077.393 [tdt/device/../common/src/log.cpp:143][TSDPPCIF] drvPpcRecv fail 25,[tdt/device/src/tsd/ppc_interface.cpp:92:RecvMsg]30245 Msg: warnging
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:24.077.412 [tdt/device/../common/src/log.cpp:149][TSDPPCSER] 0 SOCKET_CLOSED notify dev[0] procType[1] ret=[17379389]?[17379389]TSD to clean,[tdt/device/src/tsd/ppc_server.cpp:199:RecvData]30245
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:24.077.429 [tdt/device/../common/src/log.cpp:149][TSDPPCSER] 1 SOCKET_CLOSED notify dev[0] procType[1] ret[17379389] TSD to clean,[tdt/device/src/tsd/ppc_server.cpp:204:RecvData]30245
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.077.804 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.441417] [hdcdrv] [hdcdrv_server_destroy 2415] <aicpu_scheduler:30223> dev_id 0 service_type service_TDT server destroy
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.077.829 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.441427] [hdcdrv] [hdcdrv_accept 2470] <Accept:30223> dev_id 0 service_type service_TDT accept wait dev 0 quit, dev status 1, listen status 0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.077.841 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.441858] [hdcdrv] [hdcdrv_recv_peek 2899] <RecvData:30223> dev 0 session 56 local or remote close, local_close_state closed_by_user, remote_close_state closed_by_user,local_session_fd 56, remote_session_fd 227.
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.077.850 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.444353] [devdrv] [devdrv_manager_get_kernel_lib_process 251] <devdrv_load_ser:30223> wait_event_interruptible return: -512.
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.077.860 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.444495] [devmm] [devmm_notifier_release_private 1225] <aicpufw_sup:30223,30239> device wait ts exit hostpid(40927) exit(0).
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.085.102 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.095.180 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.105.257 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.115.334 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.125.408 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.133.771 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.496591] [devmm] [devmm_notifier_release_private 1231] <aicpufw_sup:30223,30239> ts exited,device hostpid(40927) begin recover resource.
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.137.750 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.147.847 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.157.927 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.168.004 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.178.083 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.188.161 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.198.238 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.208.316 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.218.391 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.228.469 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.238.544 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.248.619 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.258.697 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.268.776 [tdt/device/../common/src/log.cpp:158]The current kill result is [0],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] DRV(8314,aicpufw_monitor):2020-05-12-11:05:24.275.875 [hardware/dev_plat/aicpufw/aicpufw_thread.c:931][AICPUFW] [aicpufw_monitor_recycle_so 931] recycle so pid_dir=/home/HwHiAiUser/tmp/30223/.
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.856 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642610] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>0 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.878 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642614] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>1 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.889 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642616] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>2 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.899 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642618] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>3 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.915 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642620] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>4 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.924 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642622] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>5 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.933 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642624] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>6 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.942 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642627] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>7 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.950 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642628] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>8 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.959 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642630] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>9 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.967 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642632] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>10 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.975 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642634] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>11 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.984 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642636] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>12 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.277.994 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642638] [aicpufw-drv] [aicpufw_drv_release_debug 650] <aicpufw_sup:30223>13 :rev_int_cnt 0 rev_int_ok 0, rev_int_invalid 0,send_to_int_cnt 0 wait_satisfied 0 wait_event_time 0 rev_int_pending 0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.278.003 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642641] [aicpufw-drv] [aicpufw_drv_delete_context 568] <aicpufw_sup:30223>delete match-pid(40927). dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.278.011 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642642] [aicpufw-drv] [aicpufw_drv_release 690] <aicpufw_sup:30223>processes(1),process pid(30223) released.current tgid: 30223 numa node:0. dev_id:0
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.278.019 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.642657] [aicpufw-drv] [aicpufw_drv_get_moniter_info 1583] <aicpu_m_ioctl:8314>aicpufw event happened. dev_id:0
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.278.860 [tdt/device/../common/src/log.cpp:158]The current kill result is [-1],[tdt/device/src/tsd/tsdaemon.cpp:1090:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.278.880 [tdt/device/../common/src/log.cpp:158][TSDaemon] Computer Process stop success.,[tdt/device/src/tsd/tsdaemon.cpp:1096:CheckSubProcessExitByType]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.278.892 [tdt/device/../common/src/log.cpp:158][TSDaemon] SubProcesses exit success on device[0], tryTimes is 21,[tdt/device/src/tsd/tsdaemon.cpp:1098:CheckSubProcessExitByType]30256 Msg: running ok
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:24.278.908 [tdt/device/../common/src/log.cpp:149][TsdEVENT] #### Close Rsp TSD->FMK device[0] sessionID[1] realDeviceId[0]####,[tdt/device/src/tsd/tsdaemon.cpp:1023:SendRspMsgToFmk]30256
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.278.950 [tdt/device/../common/src/log.cpp:158][TSDaemon] CloseRspProc subRunState is: 0,[tdt/device/src/tsd/tsdaemon.cpp:1248:TsdWaitRspProcForClose]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.278.966 [tdt/device/../common/src/log.cpp:158][TSDaemon] EraseTsdToFmkMsg:deviceId[0], subProcPid[30223],[tdt/device/src/tsd/tsdaemon.cpp:789:EraseTsdToFmkMsg]30256 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.279.027 [tdt/device/../common/src/log.cpp:158][TSDaemon]###### PPCSerToTsdAbnormalMsg deviceId[0] msgType[3](0:START RSP,1:SHUTDOWN,2:SHUTDOWN RSP,3:SOCKET CLOSE), procType[1](0:HCCP,1:COMPUTE), state[0],[tdt/device/src/tsd/tsdaemon.cpp:1471:PPCSerToTsdAbnormalMsg]30245 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.279.057 [tdt/device/../common/src/log.cpp:158][TSDaemon] Begin to ClearPPCThreadCleanFlag,[tdt/device/src/tsd/tsdaemon.cpp:1473:PPCSerToTsdAbnormalMsg]30245 Msg: running ok
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:24.279.081 [tdt/device/../common/src/log.cpp:149][TSDPPCSER] 2 SOCKET_CLOSED notify dev[0] procType[1] ret[17379389] TSD to clean,[tdt/device/src/tsd/ppc_server.cpp:211:RecvData]30245
[INFO] HDC(8380,tsdaemon):2020-05-12-11:05:24.279.117 [hardware/dev_plat/../dev_plat/devhdc/hdc_ppc.c:304][drvPpcSessionClose:304] >>> Ppc Close session fd 16, pid 8380 session 0xfffee0000d30
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.279.139 [tdt/device/../common/src/log.cpp:158][TSDPPCSER] CloseSomeSession enter [sessionSize:1],[tdt/device/src/tsd/ppc_server.cpp:80:RemoveFromPpcSessionList]30245 Msg: running ok
[INFO] TDT(8380,tsdaemon):2020-05-12-11:05:24.279.157 [tdt/device/../common/src/log.cpp:158][TSDPPCSER] CloseSomeSession exit [sessionSize:0],[tdt/device/src/tsd/ppc_server.cpp:85:RemoveFromPpcSessionList]30245 Msg: running ok
[EVENT] TDT(8380,tsdaemon):2020-05-12-11:05:24.279.190 [tdt/device/../common/src/log.cpp:149][TSDPPCSER] [threadName:ppc_srv_recv_0] RecvData [ret=17379389] [tid=281470580584880]thread exit,[tdt/device/src/tsd/ppc_server.cpp:218:RecvData]30245
[INFO] KERNEL(8189,sklogd):2020-05-12-11:05:24.609.779 [toolchain/log/slog/sklog/device/../src/klogd.c:249][48432.973818] [devmm] [devmm_chan_close_device_h2d 1124] <kworker/0:0:55678,55678> device process exited, hostpid=40927, devpid=30223, devid=0.
@@ -0,0 +1,207 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Convenience functions for managing dataset file buffers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import atexit
import multiprocessing
import multiprocessing.dummy
import os
import tempfile
import uuid
import numpy as np
import six
import tensorflow as tf
class _GarbageCollector(object):
"""Deletes temporary buffer files at exit.
Certain tasks (such as NCF Recommendation) require writing buffers to
temporary files. (Which may be local or distributed.) It is not generally safe
to delete these files during operation, but they should be cleaned up. This
class keeps track of temporary files created, and deletes them at exit.
"""
def __init__(self):
self.temp_buffers = []
def register(self, filepath):
self.temp_buffers.append(filepath)
def purge(self):
try:
for i in self.temp_buffers:
if tf.io.gfile.exists(i):
tf.io.gfile.remove(i)
tf.compat.v1.logging.info("Buffer file {} removed".format(i))
except Exception as e:
tf.compat.v1.logging.error("Failed to cleanup buffer files: {}".format(e))
_GARBAGE_COLLECTOR = _GarbageCollector()
atexit.register(_GARBAGE_COLLECTOR.purge)
_ROWS_PER_CORE = 50000
def write_to_temp_buffer(dataframe, buffer_folder, columns):
if buffer_folder is None:
_, buffer_path = tempfile.mkstemp()
else:
tf.io.gfile.makedirs(buffer_folder)
buffer_path = os.path.join(buffer_folder, str(uuid.uuid4()))
_GARBAGE_COLLECTOR.register(buffer_path)
return write_to_buffer(dataframe, buffer_path, columns)
def iter_shard_dataframe(df, rows_per_core=1000):
"""Two way shard of a dataframe.
This function evenly shards a dataframe so that it can be mapped efficiently.
It yields a list of dataframes with length equal to the number of CPU cores,
with each dataframe having rows_per_core rows. (Except for the last batch
which may have fewer rows in the dataframes.) Passing vectorized inputs to
a pool is more effecient than iterating through a dataframe in serial and
passing a list of inputs to the pool.
Args:
df: Pandas dataframe to be sharded.
rows_per_core: Number of rows in each shard.
Returns:
A list of dataframe shards.
"""
n = len(df)
num_cores = min([multiprocessing.cpu_count(), n])
num_blocks = int(np.ceil(n / num_cores / rows_per_core))
max_batch_size = num_cores * rows_per_core
for i in range(num_blocks):
min_index = i * max_batch_size
max_index = min([(i + 1) * max_batch_size, n])
df_shard = df[min_index:max_index]
n_shard = len(df_shard)
boundaries = np.linspace(0, n_shard, num_cores + 1, dtype=np.int64)
yield [df_shard[boundaries[j]:boundaries[j+1]] for j in range(num_cores)]
def _shard_dict_to_examples(shard_dict):
"""Converts a dict of arrays into a list of example bytes."""
n = [i for i in shard_dict.values()][0].shape[0]
feature_list = [{} for _ in range(n)]
for column, values in shard_dict.items():
if len(values.shape) == 1:
values = np.reshape(values, values.shape + (1,))
if values.dtype.kind == "i":
feature_map = lambda x: tf.train.Feature(
int64_list=tf.train.Int64List(value=x))
elif values.dtype.kind == "f":
feature_map = lambda x: tf.train.Feature(
float_list=tf.train.FloatList(value=x))
else:
raise ValueError("Invalid dtype")
for i in range(n):
feature_list[i][column] = feature_map(values[i])
examples = [
tf.train.Example(features=tf.train.Features(feature=example_features))
for example_features in feature_list
]
return [e.SerializeToString() for e in examples]
def _serialize_shards(df_shards, columns, pool, writer):
"""Map sharded dataframes to bytes, and write them to a buffer.
Args:
df_shards: A list of pandas dataframes. (Should be of similar size)
columns: The dataframe columns to be serialized.
pool: A pool to serialize in parallel.
writer: A TFRecordWriter to write the serialized shards.
"""
# Pandas does not store columns of arrays as nd arrays. stack remedies this.
map_inputs = [{c: np.stack(shard[c].values, axis=0) for c in columns}
for shard in df_shards]
# Failure within pools is very irksome. Thus, it is better to thoroughly check
# inputs in the main process.
for inp in map_inputs:
# Check that all fields have the same number of rows.
assert len(set([v.shape[0] for v in inp.values()])) == 1
for val in inp.values():
assert hasattr(val, "dtype")
assert hasattr(val.dtype, "kind")
assert val.dtype.kind in ("i", "f")
assert len(val.shape) in (1, 2)
shard_bytes = pool.map(_shard_dict_to_examples, map_inputs)
for s in shard_bytes:
for example in s:
writer.write(example)
def write_to_buffer(dataframe, buffer_path, columns, expected_size=None):
"""Write a dataframe to a binary file for a dataset to consume.
Args:
dataframe: The pandas dataframe to be serialized.
buffer_path: The path where the serialized results will be written.
columns: The dataframe columns to be serialized.
expected_size: The size in bytes of the serialized results. This is used to
lazily construct the buffer.
Returns:
The path of the buffer.
"""
if (tf.io.gfile.exists(buffer_path) and
tf.io.gfile.stat(buffer_path).length > 0):
actual_size = tf.io.gfile.stat(buffer_path).length
if expected_size == actual_size:
return buffer_path
tf.compat.v1.logging.warning(
"Existing buffer {} has size {}. Expected size {}. Deleting and "
"rebuilding buffer.".format(buffer_path, actual_size, expected_size))
tf.io.gfile.remove(buffer_path)
if dataframe is None:
raise ValueError(
"dataframe was None but a valid existing buffer was not found.")
tf.io.gfile.makedirs(os.path.split(buffer_path)[0])
tf.compat.v1.logging.info("Constructing TFRecordDataset buffer: {}"
.format(buffer_path))
count = 0
pool = multiprocessing.dummy.Pool(multiprocessing.cpu_count())
try:
with tf.io.TFRecordWriter(buffer_path) as writer:
for df_shards in iter_shard_dataframe(df=dataframe,
rows_per_core=_ROWS_PER_CORE):
_serialize_shards(df_shards, columns, pool, writer)
count += sum([len(s) for s in df_shards])
tf.compat.v1.logging.info("{}/{} examples written."
.format(str(count).ljust(8), len(dataframe)))
finally:
pool.terminate()
tf.compat.v1.logging.info("Buffer write complete.")
return buffer_path
@@ -0,0 +1,199 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for binary data file utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import multiprocessing
# pylint: disable=wrong-import-order
import numpy as np
import pandas as pd
import tensorflow as tf
# pylint: enable=wrong-import-order
from official.r1.utils.data import file_io
from official.utils.misc import keras_utils
_RAW_ROW = "raw_row"
_DUMMY_COL = "column_0"
_DUMMY_VEC_COL = "column_1"
_DUMMY_VEC_LEN = 4
_ROWS_PER_CORE = 4
_TEST_CASES = [
# One batch of one
dict(row_count=1, cpu_count=1, expected=[
[[0]]
]),
dict(row_count=10, cpu_count=1, expected=[
[[0, 1, 2, 3]], [[4, 5, 6, 7]], [[8, 9]]
]),
dict(row_count=21, cpu_count=1, expected=[
[[0, 1, 2, 3]], [[4, 5, 6, 7]], [[8, 9, 10, 11]],
[[12, 13, 14, 15]], [[16, 17, 18, 19]], [[20]]
]),
dict(row_count=1, cpu_count=4, expected=[
[[0]]
]),
dict(row_count=10, cpu_count=4, expected=[
[[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]
]),
dict(row_count=21, cpu_count=4, expected=[
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
[[16], [17], [18], [19, 20]]
]),
dict(row_count=10, cpu_count=8, expected=[
[[0], [1], [2], [3, 4], [5], [6], [7], [8, 9]]
]),
dict(row_count=40, cpu_count=8, expected=[
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15],
[16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27],
[28, 29, 30, 31]],
[[32], [33], [34], [35], [36], [37], [38], [39]]
]),
]
_FEATURE_MAP = {
_RAW_ROW: tf.io.FixedLenFeature([1], dtype=tf.int64),
_DUMMY_COL: tf.io.FixedLenFeature([1], dtype=tf.int64),
_DUMMY_VEC_COL: tf.io.FixedLenFeature([_DUMMY_VEC_LEN], dtype=tf.float32)
}
@contextlib.contextmanager
def fixed_core_count(cpu_count):
"""Override CPU count.
file_io.py uses the cpu_count function to scale to the size of the instance.
However, this is not desirable for testing because it can make the test flaky.
Instead, this context manager fixes the count for more robust testing.
Args:
cpu_count: How many cores multiprocessing claims to have.
Yields:
Nothing. (for context manager only)
"""
old_count_fn = multiprocessing.cpu_count
multiprocessing.cpu_count = lambda: cpu_count
yield
multiprocessing.cpu_count = old_count_fn
class BaseTest(tf.test.TestCase):
def setUp(self):
super(BaseTest, self).setUp()
if keras_utils.is_v2_0:
tf.compat.v1.disable_eager_execution()
def _test_sharding(self, row_count, cpu_count, expected):
df = pd.DataFrame({_DUMMY_COL: list(range(row_count))})
with fixed_core_count(cpu_count):
shards = list(file_io.iter_shard_dataframe(df, _ROWS_PER_CORE))
result = [[j[_DUMMY_COL].tolist() for j in i] for i in shards]
self.assertAllEqual(expected, result)
def test_tiny_rows_low_core(self):
self._test_sharding(**_TEST_CASES[0])
def test_small_rows_low_core(self):
self._test_sharding(**_TEST_CASES[1])
def test_large_rows_low_core(self):
self._test_sharding(**_TEST_CASES[2])
def test_tiny_rows_medium_core(self):
self._test_sharding(**_TEST_CASES[3])
def test_small_rows_medium_core(self):
self._test_sharding(**_TEST_CASES[4])
def test_large_rows_medium_core(self):
self._test_sharding(**_TEST_CASES[5])
def test_small_rows_large_core(self):
self._test_sharding(**_TEST_CASES[6])
def test_large_rows_large_core(self):
self._test_sharding(**_TEST_CASES[7])
def _serialize_deserialize(self, num_cores=1, num_rows=20):
np.random.seed(1)
df = pd.DataFrame({
# Serialization order is only deterministic for num_cores=1. raw_row is
# used in validation after the deserialization.
_RAW_ROW: np.array(range(num_rows), dtype=np.int64),
_DUMMY_COL: np.random.randint(0, 35, size=(num_rows,)),
_DUMMY_VEC_COL: [
np.array([np.random.random() for _ in range(_DUMMY_VEC_LEN)])
for i in range(num_rows) # pylint: disable=unused-variable
]
})
with fixed_core_count(num_cores):
buffer_path = file_io.write_to_temp_buffer(
df, self.get_temp_dir(), [_RAW_ROW, _DUMMY_COL, _DUMMY_VEC_COL])
with self.session(graph=tf.Graph()) as sess:
dataset = tf.data.TFRecordDataset(buffer_path)
dataset = dataset.batch(1).map(
lambda x: tf.io.parse_example(serialized=x, features=_FEATURE_MAP))
data_iter = tf.compat.v1.data.make_one_shot_iterator(dataset)
seen_rows = set()
for i in range(num_rows+5):
row = data_iter.get_next()
try:
row_id, val_0, val_1 = sess.run(
[row[_RAW_ROW], row[_DUMMY_COL], row[_DUMMY_VEC_COL]])
row_id, val_0, val_1 = row_id[0][0], val_0[0][0], val_1[0]
assert row_id not in seen_rows
seen_rows.add(row_id)
self.assertEqual(val_0, df[_DUMMY_COL][row_id])
self.assertAllClose(val_1, df[_DUMMY_VEC_COL][row_id])
self.assertLess(i, num_rows, msg="Too many rows.")
except tf.errors.OutOfRangeError:
self.assertGreaterEqual(i, num_rows, msg="Too few rows.")
file_io._GARBAGE_COLLECTOR.purge()
assert not tf.io.gfile.exists(buffer_path)
def test_serialize_deserialize_0(self):
self._serialize_deserialize(num_cores=1)
def test_serialize_deserialize_1(self):
self._serialize_deserialize(num_cores=2)
def test_serialize_deserialize_2(self):
self._serialize_deserialize(num_cores=8)
if __name__ == "__main__":
tf.test.main()
@@ -0,0 +1,49 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Convenience functions for exporting models as SavedModels or other types."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
def build_tensor_serving_input_receiver_fn(shape, dtype=tf.float32,
batch_size=1):
"""Returns a input_receiver_fn that can be used during serving.
This expects examples to come through as float tensors, and simply
wraps them as TensorServingInputReceivers.
Arguably, this should live in tf.estimator.export. Testing here first.
Args:
shape: list representing target size of a single example.
dtype: the expected datatype for the input example
batch_size: number of input tensors that will be passed for prediction
Returns:
A function that itself returns a TensorServingInputReceiver.
"""
def serving_input_receiver_fn():
# Prep a placeholder where the input example will be fed in
features = tf.compat.v1.placeholder(
dtype=dtype, shape=[batch_size] + shape, name='input_tensor')
return tf.estimator.export.TensorServingInputReceiver(
features=features, receiver_tensors=features)
return serving_input_receiver_fn
@@ -0,0 +1,63 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for exporting utils."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf # pylint: disable=g-bad-import-order
from official.r1.utils import export
class ExportUtilsTest(tf.test.TestCase):
"""Tests for the ExportUtils."""
def test_build_tensor_serving_input_receiver_fn(self):
receiver_fn = export.build_tensor_serving_input_receiver_fn(shape=[4, 5])
with tf.Graph().as_default():
receiver = receiver_fn()
self.assertIsInstance(
receiver, tf.estimator.export.TensorServingInputReceiver)
self.assertIsInstance(receiver.features, tf.Tensor)
self.assertEqual(receiver.features.shape, tf.TensorShape([1, 4, 5]))
self.assertEqual(receiver.features.dtype, tf.float32)
self.assertIsInstance(receiver.receiver_tensors, dict)
# Note that Python 3 can no longer index .values() directly; cast to list.
self.assertEqual(list(receiver.receiver_tensors.values())[0].shape,
tf.TensorShape([1, 4, 5]))
def test_build_tensor_serving_input_receiver_fn_batch_dtype(self):
receiver_fn = export.build_tensor_serving_input_receiver_fn(
shape=[4, 5], dtype=tf.int8, batch_size=10)
with tf.Graph().as_default():
receiver = receiver_fn()
self.assertIsInstance(
receiver, tf.estimator.export.TensorServingInputReceiver)
self.assertIsInstance(receiver.features, tf.Tensor)
self.assertEqual(receiver.features.shape, tf.TensorShape([10, 4, 5]))
self.assertEqual(receiver.features.dtype, tf.int8)
self.assertIsInstance(receiver.receiver_tensors, dict)
# Note that Python 3 can no longer index .values() directly; cast to list.
self.assertEqual(list(receiver.receiver_tensors.values())[0].shape,
tf.TensorShape([10, 4, 5]))
if __name__ == "__main__":
tf.test.main()
@@ -0,0 +1,116 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Functions specific to running TensorFlow on TPUs."""
import tensorflow as tf
# "local" is a magic word in the TPU cluster resolver; it informs the resolver
# to use the local CPU as the compute device. This is useful for testing and
# debugging; the code flow is ostensibly identical, but without the need to
# actually have a TPU on the other end.
LOCAL = "local"
def construct_scalar_host_call(metric_dict, model_dir, prefix=""):
"""Construct a host call to log scalars when training on TPU.
Args:
metric_dict: A dict of the tensors to be logged.
model_dir: The location to write the summary.
prefix: The prefix (if any) to prepend to the metric names.
Returns:
A tuple of (function, args_to_be_passed_to_said_function)
"""
# type: (dict, str) -> (function, list)
metric_names = list(metric_dict.keys())
def host_call_fn(global_step, *args):
"""Training host call. Creates scalar summaries for training metrics.
This function is executed on the CPU and should not directly reference
any Tensors in the rest of the `model_fn`. To pass Tensors from the
model to the `metric_fn`, provide as part of the `host_call`. See
https://www.tensorflow.org/api_docs/python/tf/contrib/tpu/TPUEstimatorSpec
for more information.
Arguments should match the list of `Tensor` objects passed as the second
element in the tuple passed to `host_call`.
Args:
global_step: `Tensor with shape `[batch]` for the global_step
*args: Remaining tensors to log.
Returns:
List of summary ops to run on the CPU host.
"""
step = global_step[0]
with tf.compat.v1.summary.create_file_writer(
logdir=model_dir, filename_suffix=".host_call").as_default():
with tf.compat.v1.summary.always_record_summaries():
for i, name in enumerate(metric_names):
tf.compat.v1.summary.scalar(prefix + name, args[i][0], step=step)
return tf.compat.v1.summary.all_summary_ops()
# To log the current learning rate, and gradient norm for Tensorboard, the
# summary op needs to be run on the host CPU via host_call. host_call
# expects [batch_size, ...] Tensors, thus reshape to introduce a batch
# dimension. These Tensors are implicitly concatenated to
# [params['batch_size']].
global_step_tensor = tf.reshape(
tf.compat.v1.train.get_or_create_global_step(), [1])
other_tensors = [tf.reshape(metric_dict[key], [1]) for key in metric_names]
return host_call_fn, [global_step_tensor] + other_tensors
def embedding_matmul(embedding_table, values, mask, name="embedding_matmul"):
"""Performs embedding lookup via a matmul.
The matrix to be multiplied by the embedding table Tensor is constructed
via an implementation of scatter based on broadcasting embedding indices
and performing an equality comparison against a broadcasted
range(num_embedding_table_rows). All masked positions will produce an
embedding vector of zeros.
Args:
embedding_table: Tensor of embedding table.
Rank 2 (table_size x embedding dim)
values: Tensor of embedding indices. Rank 2 (batch x n_indices)
mask: Tensor of mask / weights. Rank 2 (batch x n_indices)
name: Optional name scope for created ops
Returns:
Rank 3 tensor of embedding vectors.
"""
with tf.name_scope(name):
n_embeddings = embedding_table.get_shape().as_list()[0]
batch_size, padded_size = values.shape.as_list()
emb_idcs = tf.tile(
tf.reshape(values, (batch_size, padded_size, 1)), (1, 1, n_embeddings))
emb_weights = tf.tile(
tf.reshape(mask, (batch_size, padded_size, 1)), (1, 1, n_embeddings))
col_idcs = tf.tile(
tf.reshape(tf.range(n_embeddings), (1, 1, n_embeddings)),
(batch_size, padded_size, 1))
one_hot = tf.where(
tf.equal(emb_idcs, col_idcs), emb_weights,
tf.zeros((batch_size, padded_size, n_embeddings)))
return tf.tensordot(one_hot, embedding_table, 1)
@@ -0,0 +1,108 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test TPU optimized matmul embedding."""
import numpy as np
import tensorflow as tf
from official.r1.utils import tpu as tpu_utils
TEST_CASES = [
dict(embedding_dim=256, vocab_size=1000, sequence_length=64,
batch_size=32, seed=54131),
dict(embedding_dim=8, vocab_size=15, sequence_length=12,
batch_size=256, seed=536413),
dict(embedding_dim=2048, vocab_size=512, sequence_length=50,
batch_size=8, seed=35124)
]
class TPUBaseTester(tf.test.TestCase):
def construct_embedding_and_values(self, embedding_dim, vocab_size,
sequence_length, batch_size, seed):
np.random.seed(seed)
embeddings = np.random.random(size=(vocab_size, embedding_dim))
embedding_table = tf.convert_to_tensor(value=embeddings, dtype=tf.float32)
tokens = np.random.randint(low=1, high=vocab_size-1,
size=(batch_size, sequence_length))
for i in range(batch_size):
tokens[i, np.random.randint(low=0, high=sequence_length-1):] = 0
values = tf.convert_to_tensor(value=tokens, dtype=tf.int32)
mask = tf.cast(tf.not_equal(values, 0), dtype=tf.float32)
return embedding_table, values, mask
def _test_embedding(self, embedding_dim, vocab_size,
sequence_length, batch_size, seed):
"""Test that matmul embedding matches embedding lookup (gather)."""
with self.test_session():
embedding_table, values, mask = self.construct_embedding_and_values(
embedding_dim=embedding_dim,
vocab_size=vocab_size,
sequence_length=sequence_length,
batch_size=batch_size,
seed=seed
)
embedding = (tf.nn.embedding_lookup(params=embedding_table, ids=values) *
tf.expand_dims(mask, -1))
matmul_embedding = tpu_utils.embedding_matmul(
embedding_table=embedding_table, values=values, mask=mask)
self.assertAllClose(embedding, matmul_embedding)
def _test_masking(self, embedding_dim, vocab_size,
sequence_length, batch_size, seed):
"""Test that matmul embedding properly zeros masked positions."""
with self.test_session():
embedding_table, values, mask = self.construct_embedding_and_values(
embedding_dim=embedding_dim,
vocab_size=vocab_size,
sequence_length=sequence_length,
batch_size=batch_size,
seed=seed
)
matmul_embedding = tpu_utils.embedding_matmul(
embedding_table=embedding_table, values=values, mask=mask)
self.assertAllClose(matmul_embedding,
matmul_embedding * tf.expand_dims(mask, -1))
def test_embedding_0(self):
self._test_embedding(**TEST_CASES[0])
def test_embedding_1(self):
self._test_embedding(**TEST_CASES[1])
def test_embedding_2(self):
self._test_embedding(**TEST_CASES[2])
def test_masking_0(self):
self._test_masking(**TEST_CASES[0])
def test_masking_1(self):
self._test_masking(**TEST_CASES[1])
def test_masking_2(self):
self._test_masking(**TEST_CASES[2])
if __name__ == "__main__":
tf.test.main()
@@ -0,0 +1,24 @@
six
google-api-python-client>=1.6.7
google-cloud-bigquery>=0.31.0
kaggle>=1.3.9
mlperf_compliance==0.0.10
numpy>=1.15.4
oauth2client>=4.1.2
pandas>=0.22.0
psutil>=5.4.3
py-cpuinfo>=3.3.0
scipy>=0.19.1
tensorflow-hub>=0.6.0
tensorflow-model-optimization>=0.2.1
tensorflow_datasets
dataclasses
gin-config
typing
sentencepiece
Cython
matplotlib
opencv-python-headless
pyyaml
Pillow
-e git+https://github.com/cocodataset/cocoapi#egg=pycocotools&subdirectory=PythonAPI
@@ -0,0 +1,97 @@
# Adding Abseil (absl) flags quickstart
## Defining a flag
absl flag definitions are similar to argparse, although they are defined on a global namespace.
For instance defining a string flag looks like:
```$xslt
from absl import flags
flags.DEFINE_string(
name="my_flag",
default="a_sensible_default",
help="Here is what this flag does."
)
```
All three arguments are required, but default may be `None`. A common optional argument is
short_name for defining abreviations. Certain `DEFINE_*` methods will have other required arguments.
For instance `DEFINE_enum` requires the `enum_values` argument to be specified.
## Key Flags
absl has the concept of a key flag. Any flag defined in `__main__` is considered a key flag by
default. Key flags are displayed in `--help`, others only appear in `--helpfull`. In order to
handle key flags that are defined outside the module in question, absl provides the
`flags.adopt_module_key_flags()` method. This adds the key flags of a different module to one's own
key flags. For example:
```$xslt
File: flag_source.py
---------------------------------------
from absl import flags
flags.DEFINE_string(name="my_flag", default="abc", help="a flag.")
```
```$xslt
File: my_module.py
---------------------------------------
from absl import app as absl_app
from absl import flags
import flag_source
flags.adopt_module_key_flags(flag_source)
def main(_):
pass
absl_app.run(main, [__file__, "-h"]
```
when `my_module.py` is run it will show the help text for `my_flag`. Because not all flags defined
in a file are equally important, `official/utils/flags/core.py` (generally imported as flags_core)
provides an abstraction for handling key flag declaration in an easy way through the
`register_key_flags_in_core()` function, which allows a module to make a single
`adopt_key_flags(flags_core)` call when using the util flag declaration functions.
## Validators
Often the constraints on a flag are complicated. absl provides the validator decorator to allow
one to mark a function as a flag validation function. Suppose we want users to provide a flag
which is a palindrome.
```$xslt
from absl import flags
flags.DEFINE_string(name="pal_flag", short_name="pf", default="", help="Give me a palindrome")
@flags.validator("pal_flag")
def _check_pal(provided_pal_flag):
return provided_pal_flag == provided_pal_flag[::-1]
```
Validators take the form that returning True (truthy) passes, and all others
(False, None, exception) fail.
## Testing
To test using absl, simply declare flags in the setupClass method of TensorFlow's TestCase.
```$xslt
from absl import flags
import tensorflow as tf
def define_flags():
flags.DEFINE_string(name="test_flag", default="abc", help="an example flag")
class BaseTester(unittest.TestCase):
@classmethod
def setUpClass(cls):
super(BaseTester, cls).setUpClass()
define_flags()
def test_trivial(self):
flags_core.parse_flags([__file__, "test_flag", "def"])
self.AssertEqual(flags.FLAGS.test_flag, "def")
```
@@ -0,0 +1,163 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Flags which will be nearly universal across models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import flags
import tensorflow as tf
from official.utils.flags._conventions import help_wrap
from official.utils.logs import hooks_helper
############## npu modify begin #############
from hccl.manage.api import get_rank_size
from hccl.manage.api import get_rank_id
############## npu modify end ###############
def define_base(data_dir=True, model_dir=True, clean=False, train_epochs=False,
epochs_between_evals=False, stop_threshold=False,
batch_size=True, num_gpu=False, hooks=False, export_dir=False,
distribution_strategy=False, run_eagerly=False):
"""Register base flags.
Args:
data_dir: Create a flag for specifying the input data directory.
model_dir: Create a flag for specifying the model file directory.
clean: Create a flag for removing the model_dir.
train_epochs: Create a flag to specify the number of training epochs.
epochs_between_evals: Create a flag to specify the frequency of testing.
stop_threshold: Create a flag to specify a threshold accuracy or other
eval metric which should trigger the end of training.
batch_size: Create a flag to specify the batch size.
num_gpu: Create a flag to specify the number of GPUs used.
hooks: Create a flag to specify hooks for logging.
export_dir: Create a flag to specify where a SavedModel should be exported.
distribution_strategy: Create a flag to specify which Distribution Strategy
to use.
run_eagerly: Create a flag to specify to run eagerly op by op.
Returns:
A list of flags for core.py to marks as key flags.
"""
key_flags = []
if data_dir:
flags.DEFINE_string(
name="data_dir", short_name="dd", default="/tmp",
help=help_wrap("The location of the input data."))
key_flags.append("data_dir")
if model_dir:
flags.DEFINE_string(
name="model_dir", short_name="md", default="/tmp",
help=help_wrap("The location of the model checkpoint files."))
key_flags.append("model_dir")
if clean:
flags.DEFINE_boolean(
name="clean", default=False,
help=help_wrap("If set, model_dir will be removed if it exists."))
key_flags.append("clean")
if train_epochs:
flags.DEFINE_integer(
name="train_epochs", short_name="te", default=1,
help=help_wrap("The number of epochs used to train."))
key_flags.append("train_epochs")
if epochs_between_evals:
flags.DEFINE_integer(
name="epochs_between_evals", short_name="ebe", default=1,
help=help_wrap("The number of training epochs to run between "
"evaluations."))
key_flags.append("epochs_between_evals")
if stop_threshold:
flags.DEFINE_float(
name="stop_threshold", short_name="st",
default=None,
help=help_wrap("If passed, training will stop at the earlier of "
"train_epochs and when the evaluation metric is "
"greater than or equal to stop_threshold."))
if batch_size:
flags.DEFINE_integer(
name="batch_size", short_name="bs", default=32,
help=help_wrap("Batch size for training and evaluation. When using "
"multiple gpus, this is the global batch size for "
"all devices. For example, if the batch size is 32 "
"and there are 4 GPUs, each GPU will get 8 examples on "
"each step."))
key_flags.append("batch_size")
if num_gpu:
flags.DEFINE_integer(
name="num_gpus", short_name="ng",
default=1,
help=help_wrap(
"How many GPUs to use at each worker with the "
"DistributionStrategies API. The default is 1."))
if run_eagerly:
flags.DEFINE_boolean(
name="run_eagerly", default=False,
help="Run the model op by op without building a model function.")
if hooks:
# Construct a pretty summary of hooks.
hook_list_str = (
u"\ufeff Hook:\n" + u"\n".join([u"\ufeff {}".format(key) for key
in hooks_helper.HOOKS]))
flags.DEFINE_list(
name="hooks", short_name="hk", default="LoggingTensorHook",
help=help_wrap(
u"A list of (case insensitive) strings to specify the names of "
u"training hooks.\n{}\n\ufeff Example: `--hooks ProfilerHook,"
u"ExamplesPerSecondHook`\n See official.utils.logs.hooks_helper "
u"for details.".format(hook_list_str))
)
key_flags.append("hooks")
if export_dir:
flags.DEFINE_string(
name="export_dir", short_name="ed", default=None,
help=help_wrap("If set, a SavedModel serialization of the model will "
"be exported to this directory at the end of training. "
"See the README for more details and relevant links.")
)
key_flags.append("export_dir")
if distribution_strategy:
flags.DEFINE_string(
name="distribution_strategy", short_name="ds", default="mirrored",
help=help_wrap("The Distribution Strategy to use for training. "
"Accepted values are 'off', 'one_device', "
"'mirrored', 'parameter_server', 'collective', "
"case insensitive. 'off' means not to use "
"Distribution Strategy; 'default' means to choose "
"from `MirroredStrategy` or `OneDeviceStrategy` "
"according to the number of GPUs.")
)
return key_flags
def get_num_gpus(flags_obj):
"""get the num npus using hccl api"""
############## npu modify begin #############
return get_rank_size()
############## npu modify end ###############
@@ -0,0 +1,109 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Flags for benchmarking models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import flags
from official.utils.flags._conventions import help_wrap
def define_log_steps():
flags.DEFINE_integer(
name="log_steps", default=100,
help="Frequency with which to log timing information with TimeHistory.")
return []
def define_benchmark(benchmark_log_dir=True, bigquery_uploader=True):
"""Register benchmarking flags.
Args:
benchmark_log_dir: Create a flag to specify location for benchmark logging.
bigquery_uploader: Create flags for uploading results to BigQuery.
Returns:
A list of flags for core.py to marks as key flags.
"""
key_flags = []
flags.DEFINE_enum(
name="benchmark_logger_type", default="BaseBenchmarkLogger",
enum_values=["BaseBenchmarkLogger", "BenchmarkFileLogger",
"BenchmarkBigQueryLogger"],
help=help_wrap("The type of benchmark logger to use. Defaults to using "
"BaseBenchmarkLogger which logs to STDOUT. Different "
"loggers will require other flags to be able to work."))
flags.DEFINE_string(
name="benchmark_test_id", short_name="bti", default=None,
help=help_wrap("The unique test ID of the benchmark run. It could be the "
"combination of key parameters. It is hardware "
"independent and could be used compare the performance "
"between different test runs. This flag is designed for "
"human consumption, and does not have any impact within "
"the system."))
define_log_steps()
if benchmark_log_dir:
flags.DEFINE_string(
name="benchmark_log_dir", short_name="bld", default=None,
help=help_wrap("The location of the benchmark logging.")
)
if bigquery_uploader:
flags.DEFINE_string(
name="gcp_project", short_name="gp", default=None,
help=help_wrap(
"The GCP project name where the benchmark will be uploaded."))
flags.DEFINE_string(
name="bigquery_data_set", short_name="bds", default="test_benchmark",
help=help_wrap(
"The Bigquery dataset name where the benchmark will be uploaded."))
flags.DEFINE_string(
name="bigquery_run_table", short_name="brt", default="benchmark_run",
help=help_wrap("The Bigquery table name where the benchmark run "
"information will be uploaded."))
flags.DEFINE_string(
name="bigquery_run_status_table", short_name="brst",
default="benchmark_run_status",
help=help_wrap("The Bigquery table name where the benchmark run "
"status information will be uploaded."))
flags.DEFINE_string(
name="bigquery_metric_table", short_name="bmt",
default="benchmark_metric",
help=help_wrap("The Bigquery table name where the benchmark metric "
"information will be uploaded."))
@flags.multi_flags_validator(
["benchmark_logger_type", "benchmark_log_dir"],
message="--benchmark_logger_type=BenchmarkFileLogger will require "
"--benchmark_log_dir being set")
def _check_benchmark_log_dir(flags_dict):
benchmark_logger_type = flags_dict["benchmark_logger_type"]
if benchmark_logger_type == "BenchmarkFileLogger":
return flags_dict["benchmark_log_dir"]
return True
return key_flags

Some files were not shown because too many files have changed in this diff Show More