[add]上传训练benchmark by z00560161
This commit is contained in:
@@ -0,0 +1,47 @@
|
||||
# MobileNet_tensorflow训练说明
|
||||
|
||||
### 1. 模型训练参数配置
|
||||
|
||||
在train/yaml/MobileNet.yaml中修改相应配置, 配置项含义:
|
||||
|
||||
```
|
||||
tensorflow_config:
|
||||
# 基本参数
|
||||
max_steps: 1000
|
||||
data_url: 数据集路径
|
||||
epoches: 跑多少个epoch
|
||||
|
||||
# 训练(train) 或 评测(evaluate)
|
||||
mode: train
|
||||
batch_size: 256
|
||||
#仅在 mode 为 evaluate 时用到
|
||||
ckpt_path: /opt/0908/benchmark-benchmark_Alpha/train/result/tf_mobilenet/trainingJob_20200905171017/0/results/model.ckpt-123125
|
||||
|
||||
# 仅多机执行需要配置: ip1:卡数量1,ip2:卡数量2
|
||||
mpirun_ip: 90.90.176.152:8,90.90.176.154:8
|
||||
|
||||
# docker 镜像名称:版本号
|
||||
docker_image: c73:b021
|
||||
|
||||
# 指定 device id, 多个 id 使用空格分隔, 数量需与 rank_size 相同
|
||||
device_group_1p: 0
|
||||
device_group_2p: 0 1
|
||||
device_group_4p: 0 1 2 3
|
||||
|
||||
profiling_mode: false
|
||||
profiling_options: training_trace
|
||||
fp_point: L2Loss
|
||||
bp_point: gradients/AddN_30
|
||||
aicpu_profiling_mode: false
|
||||
|
||||
```
|
||||
|
||||
------
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
+211
@@ -0,0 +1,211 @@
|
||||
# MobileNetv2 for Tensorflow
|
||||
|
||||
This repository provides a script and recipe to train the MobileNetv2 model to achieve state-of-the-art accuracy.
|
||||
|
||||
## Table Of Contents
|
||||
|
||||
* [Model overview](#model-overview)
|
||||
* [Model Architecture](#model-architecture)
|
||||
* [Default configuration](#default-configuration)
|
||||
* [Data augmentation](#data-augmentation)
|
||||
* [Setup](#setup)
|
||||
* [Requirements](#requirements)
|
||||
* [Quick start guide](#quick-start-guide)
|
||||
* [Advanced](#advanced)
|
||||
* [Command line arguments](#command-line-arguments)
|
||||
* [Training process](#training-process)
|
||||
* [Performance](#performance)
|
||||
* [Results](#results)
|
||||
* [Training accuracy results](#training-accuracy-results)
|
||||
* [Training performance results](#training-performance-results)
|
||||
|
||||
|
||||
|
||||
|
||||
## Model overview
|
||||
|
||||
In this repository, we implement MobileNetv2 from paper [Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." CVPR 2018.](https://arxiv.org/abs/1801.04381)
|
||||
|
||||
MobileNetv2 is a mobile architecture. It is mainly constructed based on depthwise separable convolutions, linear bottlenecks and inverted residuals.
|
||||
|
||||
### Model architecture
|
||||
|
||||
The model architecture can be found from the reference paper.
|
||||
|
||||
### Default configuration
|
||||
|
||||
The following sections introduce the default configurations and hyperparameters for MobileNetv2 model.
|
||||
|
||||
#### Optimizer
|
||||
|
||||
This model uses Momentum optimizer from Tensorflow with the following hyperparameters:
|
||||
|
||||
- Momentum : 0.9
|
||||
- Learning rate (LR) : 0.8
|
||||
- LR schedule: cosine_annealing
|
||||
- Warmup epoch: 5
|
||||
- Batch size : 256*8
|
||||
- Weight decay : 0.00004
|
||||
- Moving average decay: 0.9999
|
||||
- Label smoothing = 0.1
|
||||
- We train for:
|
||||
- 300 epochs for a standard training process using ImageNet2012
|
||||
|
||||
#### Data augmentation
|
||||
|
||||
This model uses the data augmentation from InceptionV2:
|
||||
|
||||
- For training:
|
||||
- Convert DataType and RandomResizeCrop
|
||||
- RandomHorizontalFlip, prob=0.5
|
||||
- Subtract with 0.5 and multiply with 2.0
|
||||
- For inference:
|
||||
- Convert DataType
|
||||
- CenterCrop 87.5% of the original image and resize to (224, 224)
|
||||
- Subtract with 0.5 and multiply with 2.0
|
||||
|
||||
For more details, we refer readers to read the corresponding source code in Slim.
|
||||
|
||||
## Setup
|
||||
The following section lists the requirements to start training the MobileNetv2 model.
|
||||
### Requirements
|
||||
|
||||
Tensorflow 1.15.0
|
||||
|
||||
## Quick Start Guide
|
||||
|
||||
### 1. Clone the respository
|
||||
|
||||
```shell
|
||||
git clone xxx
|
||||
cd ModelZoo_MobileNetv2_TF
|
||||
```
|
||||
|
||||
### 2. Download and preprocess the dataset
|
||||
|
||||
1. Download the ImageNet2012 dataset
|
||||
2. Generate tfrecord files following [Tensorflow-Slim](https://github.com/tensorflow/models/tree/master/research/slim).
|
||||
3. The train and validation tfrecord files are under the path/data directories.
|
||||
|
||||
### 3. Train
|
||||
- train on a single NPU
|
||||
- **edit** *train_1p.sh* (see example below)
|
||||
- bash run_1p.sh
|
||||
- train on 8 NPUs
|
||||
- **edit** *train_8p.sh* (see example below)
|
||||
- bash run_8p.sh
|
||||
|
||||
Examples:
|
||||
- Case for single NPU
|
||||
- In *train_1p.sh*, python scripts part should look like as follows. For more detailed command lines arguments, please refer to [Command line arguments](#command-line-arguments)
|
||||
```shell
|
||||
python3.7 ${currentDir}/train.py \
|
||||
--dataset_dir=/opt/npu/slimImagenet \
|
||||
--max_train_steps=500 \
|
||||
--iterations_per_loop=50 \
|
||||
--model_name="mobilenet_v2" \
|
||||
--moving_average_decay=0.9999 \
|
||||
--label_smoothing=0.1 \
|
||||
--preprocessing_name="inception_v2" \
|
||||
--weight_decay='0.00004' \
|
||||
--batch_size=256 \
|
||||
--learning_rate_decay_type='cosine_annealing' \
|
||||
--learning_rate=0.4 \
|
||||
--optimizer='momentum' \
|
||||
--momentum='0.9' \
|
||||
--warmup_epochs=5
|
||||
```
|
||||
- Run the program
|
||||
```
|
||||
bash run_1p.sh
|
||||
```
|
||||
- Case for 8 NPUs
|
||||
- In *train_8p.sh*, python scripts part should look like as follows.
|
||||
```shell
|
||||
python3.7 ${currentDir}/train.py \
|
||||
--dataset_dir=/opt/npu/slimImagenet \
|
||||
--max_epoch=300 \
|
||||
--model_name="mobilenet_v2" \
|
||||
--moving_average_decay=0.9999 \
|
||||
--label_smoothing=0.1 \
|
||||
--preprocessing_name="inception_v2" \
|
||||
--weight_decay='0.00004' \
|
||||
--batch_size=256 \
|
||||
--learning_rate_decay_type='cosine_annealing' \
|
||||
--learning_rate=0.8 \
|
||||
--optimizer='momentum' \
|
||||
--momentum='0.9' \
|
||||
--warmup_epochs=5
|
||||
```
|
||||
- Run the program
|
||||
```
|
||||
bash run_8p.sh
|
||||
```
|
||||
|
||||
### 4. Test
|
||||
- We evaluate results by using following commands:
|
||||
```shell
|
||||
python3.7 eval_image_classifier_mobilenet.py --dataset_dir=/opt/npu/slimImagenet \
|
||||
--checkpoint_path=result/8p/0/results/model.ckpt-187500
|
||||
```
|
||||
Remember to modify the dataset path and checkpoint path, then run the command.
|
||||
|
||||
|
||||
## Advanced
|
||||
### Commmand-line options
|
||||
|
||||
We list those important parameters to train this network here. For more details of all the parameters, please read *train.py* and other related files.
|
||||
|
||||
```
|
||||
--dataset_dir directory of dataset (default: /opt/npu/models/slimImagenet)
|
||||
--max_epoch number of epochs to train the model (default: 200)
|
||||
--max_train_steps max number of training steps (default: 500)
|
||||
--iterations_per_loop number of steps to run in devices each iteration (default: None)
|
||||
--model_name name of the model to train (default: mobilenet_v2_140)
|
||||
--moving_average_decay the decay to use for the moving average (default: None)
|
||||
--label_smoothing use label smooth in cross entropy (default: 0.1)
|
||||
--preprocessing_name preprocessing method for training (default: inception_v2)
|
||||
--weight_decay weight decay for regularization loss (default: 0)
|
||||
--batch_size batch size per npu (default: 96)
|
||||
--learning_rate_decay_type learning rate decay type (default: fixed)
|
||||
--learning_rate initial learning rate (default: 0.1)
|
||||
--optimizer the name of optimizer (default: sgd)
|
||||
--momentum momentum value used in optimizer (default: 0.9)
|
||||
--warmup_epochs warmup epochs for learning rate (default: 5)
|
||||
```
|
||||
|
||||
### Training process
|
||||
|
||||
All the results of the training will be stored in the directory `result`.
|
||||
|
||||
## Performance
|
||||
|
||||
### Result
|
||||
|
||||
Our result were obtained by running the applicable training script. To achieve the same results, follow the steps in the Quick Start Guide.
|
||||
|
||||
#### Training accuracy results
|
||||
|
||||
| **epochs** | Top1 |
|
||||
| :--------: | :------------: |
|
||||
| 300 | 72.47% |
|
||||
|
||||
#### Training performance results
|
||||
| **NPUs** | train performance |
|
||||
| :------: | :---------------: |
|
||||
| 1 | 1400 img/s |
|
||||
|
||||
| **NPUs** | train performance |
|
||||
| :------: | :---------------: |
|
||||
| 8 | 11000 img/s |
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
+240
@@ -0,0 +1,240 @@
|
||||
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
|
||||
"""Functions to read, decode and pre-process input data for the Model.
|
||||
"""
|
||||
import collections
|
||||
import sys
|
||||
import tensorflow as tf
|
||||
|
||||
from tensorflow.python.data.experimental.ops import threadpool
|
||||
|
||||
# from tensorflow.contrib import slim
|
||||
|
||||
InputEndpoints = collections.namedtuple(
|
||||
'InputEndpoints', ['images', 'images_orig', 'labels', 'labels_one_hot'])
|
||||
ShuffleBatchConfig = collections.namedtuple('ShuffleBatchConfig', [
|
||||
'num_batching_threads', 'queue_capacity', 'min_after_dequeue'
|
||||
])
|
||||
|
||||
DEFAULT_SHUFFLE_CONFIG = ShuffleBatchConfig(
|
||||
num_batching_threads=8, queue_capacity=3000, min_after_dequeue=1000)
|
||||
|
||||
|
||||
def get_data_files(data_sources):
|
||||
from tensorflow.python.platform import gfile
|
||||
if isinstance(data_sources, (list, tuple)):
|
||||
data_files = []
|
||||
for source in data_sources:
|
||||
data_files += get_data_files(source)
|
||||
else:
|
||||
if '*' in data_sources or '?' in data_sources or '[' in data_sources:
|
||||
data_files = gfile.Glob(data_sources)
|
||||
else:
|
||||
data_files = [data_sources]
|
||||
if not data_files:
|
||||
raise ValueError('No data files found in %s' % (data_sources,))
|
||||
return data_files
|
||||
|
||||
|
||||
def preprocess_image(image, location, label_one_hot, height=224, width=224):
|
||||
"""Prepare one image for evaluation.
|
||||
If height and width are specified it would output an image with that size by
|
||||
applying resize_bilinear.
|
||||
If central_fraction is specified it would cropt the central fraction of the
|
||||
input image.
|
||||
Args:
|
||||
image: 3-D Tensor of image. If dtype is tf.float32 then the range should be
|
||||
[0, 1], otherwise it would converted to tf.float32 assuming that the range
|
||||
is [0, MAX], where MAX is largest positive representable number for
|
||||
int(8/16/32) data type (see `tf.image.convert_image_dtype` for details)
|
||||
height: integer
|
||||
width: integer
|
||||
central_fraction: Optional Float, fraction of the image to crop.
|
||||
scope: Optional scope for name_scope.
|
||||
Returns:
|
||||
3-D float Tensor of prepared image.
|
||||
"""
|
||||
|
||||
# if image.dtype != tf.float32:
|
||||
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
|
||||
# Crop the central region of the image with an area containing 87.5% of
|
||||
# the original image.
|
||||
# if central_fraction:
|
||||
# image = tf.image.central_crop(image, central_fraction=central_fraction)
|
||||
|
||||
# if height and width:
|
||||
# Resize the image to the specified height and width.
|
||||
image = tf.expand_dims(image, 0)
|
||||
image = tf.image.resize_bilinear(image, [height, width], align_corners=False)
|
||||
image = tf.squeeze(image, [0])
|
||||
|
||||
# image = tf.cast(image, tf.float32)
|
||||
# image = tf.multiply(image, 1/255.)
|
||||
image = tf.subtract(image, 0.5)
|
||||
image = tf.multiply(image, 2.0)
|
||||
|
||||
return image, location, label_one_hot
|
||||
|
||||
|
||||
def _int64_feature(value):
|
||||
"""Wrapper for inserting int64 features into Example proto."""
|
||||
if not isinstance(value, list):
|
||||
value = [value]
|
||||
return tf.train.Feature(int64_list=tf.train.Int64List(value=value))
|
||||
|
||||
|
||||
def parse_example_proto(example_serialized, num_classes, labels_offset, image_preprocessing_fn):
|
||||
feature_map = {
|
||||
'image/encoded': tf.FixedLenFeature([], tf.string, ''),
|
||||
'image/class/label': tf.FixedLenFeature([1], tf.int64, -1),
|
||||
'image/class/text': tf.FixedLenFeature([], tf.string, ''),
|
||||
'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32)
|
||||
}
|
||||
with tf.compat.v1.name_scope('deserialize_image_record'):
|
||||
obj = tf.io.parse_single_example(serialized=example_serialized, features=feature_map)
|
||||
image = tf.image.decode_jpeg(obj['image/encoded'], channels=3, fancy_upscaling=False,
|
||||
dct_method='INTEGER_FAST')
|
||||
if image_preprocessing_fn:
|
||||
image = image_preprocessing_fn(image, 224, 224)
|
||||
else:
|
||||
image = tf.image.resize(image, [224, 224])
|
||||
|
||||
label = tf.cast(obj['image/class/label'], tf.int32)
|
||||
label = tf.squeeze(label)
|
||||
label -= labels_offset
|
||||
label = tf.one_hot(label, num_classes - labels_offset)
|
||||
return image, label
|
||||
|
||||
|
||||
def parse_example_decode(example_serialized):
|
||||
feature_map = {
|
||||
'image/encoded': tf.FixedLenFeature([], tf.string, ''),
|
||||
'image/class/label': tf.FixedLenFeature([1], tf.int64, -1),
|
||||
'image/class/text': tf.FixedLenFeature([], tf.string, ''),
|
||||
'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32)
|
||||
}
|
||||
with tf.compat.v1.name_scope('deserialize_image_record'):
|
||||
obj = tf.io.parse_single_example(serialized=example_serialized, features=feature_map)
|
||||
image = tf.image.decode_jpeg(obj['image/encoded'], channels=3, fancy_upscaling=False,
|
||||
dct_method='INTEGER_FAST')
|
||||
|
||||
return image, obj['image/class/label']
|
||||
|
||||
|
||||
def parse_example(image, label, num_classes, labels_offset, image_preprocessing_fn):
|
||||
with tf.compat.v1.name_scope('deserialize_image_record'):
|
||||
if image_preprocessing_fn:
|
||||
image = image_preprocessing_fn(image, 224, 224)
|
||||
else:
|
||||
image = tf.image.resize(image, [224, 224])
|
||||
|
||||
label = tf.cast(label, tf.int32)
|
||||
label = tf.squeeze(label)
|
||||
label -= labels_offset
|
||||
label = tf.one_hot(label, num_classes - labels_offset)
|
||||
return image, label
|
||||
|
||||
|
||||
def parse_example1(example_serialized, image_preprocessing_fn1):
|
||||
feature_map = {
|
||||
'image/encoded': tf.FixedLenFeature([], tf.string, ''),
|
||||
'image/class/label': tf.FixedLenFeature([1], tf.int64, -1),
|
||||
'image/class/text': tf.FixedLenFeature([], tf.string, ''),
|
||||
'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32)
|
||||
}
|
||||
with tf.compat.v1.name_scope('deserialize_image_record'):
|
||||
obj = tf.io.parse_single_example(serialized=example_serialized, features=feature_map)
|
||||
image = tf.image.decode_jpeg(obj['image/encoded'], channels=3, fancy_upscaling=False,
|
||||
dct_method='INTEGER_FAST')
|
||||
|
||||
image = image_preprocessing_fn1(image, 224, 224)
|
||||
return image, obj['image/class/label']
|
||||
|
||||
|
||||
def parse_example2(image, label, num_classes, labels_offset, image_preprocessing_fn2):
|
||||
with tf.compat.v1.name_scope('deserialize_image_record'):
|
||||
image = image_preprocessing_fn2(image, 224, 224)
|
||||
|
||||
label = tf.cast(label, tf.int32)
|
||||
label = tf.squeeze(label)
|
||||
label -= labels_offset
|
||||
label = tf.one_hot(label, num_classes - labels_offset)
|
||||
return image, label
|
||||
|
||||
|
||||
def get_data(dataset, batch_size, num_classes, labels_offset, is_training,
|
||||
preprocessing_name=None, use_grayscale=None, add_image_summaries=False):
|
||||
return get_data_united(dataset, batch_size, num_classes, labels_offset, is_training,
|
||||
preprocessing_name, use_grayscale, add_image_summaries)
|
||||
|
||||
|
||||
def create_ds(data_sources, is_training):
|
||||
data_files = get_data_files(data_sources)
|
||||
ds = tf.data.Dataset.from_tensor_slices(data_files)
|
||||
|
||||
if is_training:
|
||||
ds = ds.shuffle(1000)
|
||||
# add for eval
|
||||
else:
|
||||
ds = ds.take(50000)
|
||||
|
||||
##### change #####
|
||||
num_readers = 10
|
||||
ds = ds.interleave(
|
||||
tf.data.TFRecordDataset, cycle_length=num_readers, block_length=1,
|
||||
num_parallel_calls=tf.data.experimental.AUTOTUNE)
|
||||
counter = tf.data.Dataset.range(sys.maxsize)
|
||||
ds = tf.data.Dataset.zip((ds, counter))
|
||||
##### change #####
|
||||
|
||||
if is_training:
|
||||
ds = ds.repeat()
|
||||
|
||||
return ds
|
||||
|
||||
|
||||
def get_data_united(dataset, batch_size, num_classes, labels_offset, is_training,
|
||||
preprocessing_name=None, use_grayscale=None, add_image_summaries=False):
|
||||
from preprocessing import preprocessing_factory
|
||||
image_preprocessing_fn = preprocessing_factory.get_preprocessing(
|
||||
name='inception_v2',
|
||||
is_training=is_training,
|
||||
use_grayscale=use_grayscale,
|
||||
add_image_summaries=add_image_summaries
|
||||
)
|
||||
|
||||
ds = create_ds(dataset.data_sources, is_training)
|
||||
|
||||
ds = ds.map(lambda example, counter: parse_example_proto(example, num_classes, labels_offset, image_preprocessing_fn), num_parallel_calls=24)
|
||||
|
||||
ds = ds.batch(batch_size, drop_remainder=True)
|
||||
|
||||
ds = ds.prefetch(buffer_size=tf.contrib.data.AUTOTUNE)
|
||||
|
||||
iterator = ds.make_initializable_iterator()
|
||||
|
||||
ds = threadpool.override_threadpool(ds,threadpool.PrivateThreadPool(128, display_name='input_pipeline_thread_pool'))
|
||||
|
||||
return iterator, ds
|
||||
+1
@@ -0,0 +1 @@
|
||||
|
||||
+705
@@ -0,0 +1,705 @@
|
||||
# Copyright 2016 Google Inc. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Converts ImageNet data to TFRecords file format with Example protos.
|
||||
|
||||
The raw ImageNet data set is expected to reside in JPEG files located in the
|
||||
following directory structure.
|
||||
|
||||
data_dir/n01440764/ILSVRC2012_val_00000293.JPEG
|
||||
data_dir/n01440764/ILSVRC2012_val_00000543.JPEG
|
||||
...
|
||||
|
||||
where 'n01440764' is the unique synset label associated with
|
||||
these images.
|
||||
|
||||
The training data set consists of 1000 sub-directories (i.e. labels)
|
||||
each containing 1200 JPEG images for a total of 1.2M JPEG images.
|
||||
|
||||
The evaluation data set consists of 1000 sub-directories (i.e. labels)
|
||||
each containing 50 JPEG images for a total of 50K JPEG images.
|
||||
|
||||
This TensorFlow script converts the training and evaluation data into
|
||||
a sharded data set consisting of 1024 and 128 TFRecord files, respectively.
|
||||
|
||||
train_directory/train-00000-of-01024
|
||||
train_directory/train-00001-of-01024
|
||||
...
|
||||
train_directory/train-00127-of-01024
|
||||
|
||||
and
|
||||
|
||||
validation_directory/validation-00000-of-00128
|
||||
validation_directory/validation-00001-of-00128
|
||||
...
|
||||
validation_directory/validation-00127-of-00128
|
||||
|
||||
Each validation TFRecord file contains ~390 records. Each training TFREcord
|
||||
file contains ~1250 records. Each record within the TFRecord file is a
|
||||
serialized Example proto. The Example proto contains the following fields:
|
||||
|
||||
image/encoded: string containing JPEG encoded image in RGB colorspace
|
||||
image/height: integer, image height in pixels
|
||||
image/width: integer, image width in pixels
|
||||
image/colorspace: string, specifying the colorspace, always 'RGB'
|
||||
image/channels: integer, specifying the number of channels, always 3
|
||||
image/format: string, specifying the format, always'JPEG'
|
||||
|
||||
image/filename: string containing the basename of the image file
|
||||
e.g. 'n01440764_10026.JPEG' or 'ILSVRC2012_val_00000293.JPEG'
|
||||
image/class/label: integer specifying the index in a classification layer.
|
||||
The label ranges from [1, 1000] where 0 is not used.
|
||||
image/class/synset: string specifying the unique ID of the label,
|
||||
e.g. 'n01440764'
|
||||
image/class/text: string specifying the human-readable version of the label
|
||||
e.g. 'red fox, Vulpes vulpes'
|
||||
|
||||
image/object/bbox/xmin: list of integers specifying the 0+ human annotated
|
||||
bounding boxes
|
||||
image/object/bbox/xmax: list of integers specifying the 0+ human annotated
|
||||
bounding boxes
|
||||
image/object/bbox/ymin: list of integers specifying the 0+ human annotated
|
||||
bounding boxes
|
||||
image/object/bbox/ymax: list of integers specifying the 0+ human annotated
|
||||
bounding boxes
|
||||
image/object/bbox/label: integer specifying the index in a classification
|
||||
layer. The label ranges from [1, 1000] where 0 is not used. Note this is
|
||||
always identical to the image label.
|
||||
|
||||
Note that the length of xmin is identical to the length of xmax, ymin and ymax
|
||||
for each example.
|
||||
|
||||
Running this script using 16 threads may take around ~2.5 hours on a HP Z420.
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
from datetime import datetime
|
||||
import os
|
||||
import random
|
||||
import sys
|
||||
import threading
|
||||
|
||||
import numpy as np
|
||||
from six.moves import xrange # pylint: disable=redefined-builtin
|
||||
import tensorflow as tf
|
||||
|
||||
|
||||
tf.app.flags.DEFINE_string('train_directory', '/tmp/',
|
||||
'Training data directory')
|
||||
tf.app.flags.DEFINE_string('validation_directory', '/tmp/',
|
||||
'Validation data directory')
|
||||
tf.app.flags.DEFINE_string('output_directory', '/tmp/',
|
||||
'Output data directory')
|
||||
|
||||
tf.app.flags.DEFINE_integer('train_shards', 1024,
|
||||
'Number of shards in training TFRecord files.')
|
||||
tf.app.flags.DEFINE_integer('validation_shards', 128,
|
||||
'Number of shards in validation TFRecord files.')
|
||||
|
||||
tf.app.flags.DEFINE_integer('num_threads', 8,
|
||||
'Number of threads to preprocess the images.')
|
||||
|
||||
# The labels file contains a list of valid labels are held in this file.
|
||||
# Assumes that the file contains entries as such:
|
||||
# n01440764
|
||||
# n01443537
|
||||
# n01484850
|
||||
# where each line corresponds to a label expressed as a synset. We map
|
||||
# each synset contained in the file to an integer (based on the alphabetical
|
||||
# ordering). See below for details.
|
||||
tf.app.flags.DEFINE_string('labels_file',
|
||||
'imagenet_lsvrc_2015_synsets.txt',
|
||||
'Labels file')
|
||||
|
||||
# This file containing mapping from synset to human-readable label.
|
||||
# Assumes each line of the file looks like:
|
||||
#
|
||||
# n02119247 black fox
|
||||
# n02119359 silver fox
|
||||
# n02119477 red fox, Vulpes fulva
|
||||
#
|
||||
# where each line corresponds to a unique mapping. Note that each line is
|
||||
# formatted as <synset>\t<human readable label>.
|
||||
tf.app.flags.DEFINE_string('imagenet_metadata_file',
|
||||
'imagenet_metadata.txt',
|
||||
'ImageNet metadata file')
|
||||
|
||||
# This file is the output of process_bounding_box.py
|
||||
# Assumes each line of the file looks like:
|
||||
#
|
||||
# n00007846_64193.JPEG,0.0060,0.2620,0.7545,0.9940
|
||||
#
|
||||
# where each line corresponds to one bounding box annotation associated
|
||||
# with an image. Each line can be parsed as:
|
||||
#
|
||||
# <JPEG file name>, <xmin>, <ymin>, <xmax>, <ymax>
|
||||
#
|
||||
# Note that there might exist mulitple bounding box annotations associated
|
||||
# with an image file.
|
||||
tf.app.flags.DEFINE_string('bounding_box_file',
|
||||
'./imagenet_2012_bounding_boxes.csv',
|
||||
'Bounding box file')
|
||||
|
||||
FLAGS = tf.app.flags.FLAGS
|
||||
|
||||
|
||||
def _int64_feature(value):
|
||||
"""Wrapper for inserting int64 features into Example proto."""
|
||||
if not isinstance(value, list):
|
||||
value = [value]
|
||||
return tf.train.Feature(int64_list=tf.train.Int64List(value=value))
|
||||
|
||||
|
||||
def _float_feature(value):
|
||||
"""Wrapper for inserting float features into Example proto."""
|
||||
if not isinstance(value, list):
|
||||
value = [value]
|
||||
return tf.train.Feature(float_list=tf.train.FloatList(value=value))
|
||||
|
||||
|
||||
def _bytes_feature(value):
|
||||
"""Wrapper for inserting bytes features into Example proto."""
|
||||
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
|
||||
|
||||
|
||||
def _convert_to_example(filename, image_buffer, label, synset, human, bbox,
|
||||
height, width):
|
||||
"""Build an Example proto for an example.
|
||||
|
||||
Args:
|
||||
filename: string, path to an image file, e.g., '/path/to/example.JPG'
|
||||
image_buffer: string, JPEG encoding of RGB image
|
||||
label: integer, identifier for the ground truth for the network
|
||||
synset: string, unique WordNet ID specifying the label, e.g., 'n02323233'
|
||||
human: string, human-readable label, e.g., 'red fox, Vulpes vulpes'
|
||||
bbox: list of bounding boxes; each box is a list of integers
|
||||
specifying [xmin, ymin, xmax, ymax]. All boxes are assumed to belong to
|
||||
the same label as the image label.
|
||||
height: integer, image height in pixels
|
||||
width: integer, image width in pixels
|
||||
Returns:
|
||||
Example proto
|
||||
"""
|
||||
xmin = []
|
||||
ymin = []
|
||||
xmax = []
|
||||
ymax = []
|
||||
for b in bbox:
|
||||
assert len(b) == 4
|
||||
# pylint: disable=expression-not-assigned
|
||||
[l.append(point) for l, point in zip([xmin, ymin, xmax, ymax], b)]
|
||||
# pylint: enable=expression-not-assigned
|
||||
|
||||
colorspace = 'RGB'
|
||||
channels = 3
|
||||
image_format = 'JPEG'
|
||||
|
||||
example = tf.train.Example(features=tf.train.Features(feature={
|
||||
'image/height': _int64_feature(height),
|
||||
'image/width': _int64_feature(width),
|
||||
'image/colorspace': _bytes_feature(colorspace),
|
||||
'image/channels': _int64_feature(channels),
|
||||
'image/class/label': _int64_feature(label),
|
||||
'image/class/synset': _bytes_feature(synset),
|
||||
'image/class/text': _bytes_feature(human),
|
||||
'image/object/bbox/xmin': _float_feature(xmin),
|
||||
'image/object/bbox/xmax': _float_feature(xmax),
|
||||
'image/object/bbox/ymin': _float_feature(ymin),
|
||||
'image/object/bbox/ymax': _float_feature(ymax),
|
||||
'image/object/bbox/label': _int64_feature([label] * len(xmin)),
|
||||
'image/format': _bytes_feature(image_format),
|
||||
'image/filename': _bytes_feature(os.path.basename(filename)),
|
||||
'image/encoded': _bytes_feature(image_buffer)}))
|
||||
return example
|
||||
|
||||
|
||||
class ImageCoder(object):
|
||||
"""Helper class that provides TensorFlow image coding utilities."""
|
||||
|
||||
def __init__(self):
|
||||
# Create a single Session to run all image coding calls.
|
||||
self._sess = tf.Session()
|
||||
|
||||
# Initializes function that converts PNG to JPEG data.
|
||||
self._png_data = tf.placeholder(dtype=tf.string)
|
||||
image = tf.image.decode_png(self._png_data, channels=3)
|
||||
self._png_to_jpeg = tf.image.encode_jpeg(image, format='rgb', quality=100)
|
||||
|
||||
# Initializes function that converts CMYK JPEG data to RGB JPEG data.
|
||||
self._cmyk_data = tf.placeholder(dtype=tf.string)
|
||||
image = tf.image.decode_jpeg(self._cmyk_data, channels=0)
|
||||
self._cmyk_to_rgb = tf.image.encode_jpeg(image, format='rgb', quality=100)
|
||||
|
||||
# Initializes function that decodes RGB JPEG data.
|
||||
self._decode_jpeg_data = tf.placeholder(dtype=tf.string)
|
||||
self._decode_jpeg = tf.image.decode_jpeg(self._decode_jpeg_data, channels=3)
|
||||
|
||||
def png_to_jpeg(self, image_data):
|
||||
return self._sess.run(self._png_to_jpeg,
|
||||
feed_dict={self._png_data: image_data})
|
||||
|
||||
def cmyk_to_rgb(self, image_data):
|
||||
return self._sess.run(self._cmyk_to_rgb,
|
||||
feed_dict={self._cmyk_data: image_data})
|
||||
|
||||
def decode_jpeg(self, image_data):
|
||||
image = self._sess.run(self._decode_jpeg,
|
||||
feed_dict={self._decode_jpeg_data: image_data})
|
||||
assert len(image.shape) == 3
|
||||
assert image.shape[2] == 3
|
||||
return image
|
||||
|
||||
|
||||
def _is_png(filename):
|
||||
"""Determine if a file contains a PNG format image.
|
||||
|
||||
Args:
|
||||
filename: string, path of the image file.
|
||||
|
||||
Returns:
|
||||
boolean indicating if the image is a PNG.
|
||||
"""
|
||||
# File list from:
|
||||
# https://groups.google.com/forum/embed/?place=forum/torch7#!topic/torch7/fOSTXHIESSU
|
||||
return 'n02105855_2933.JPEG' in filename
|
||||
|
||||
|
||||
def _is_cmyk(filename):
|
||||
"""Determine if file contains a CMYK JPEG format image.
|
||||
|
||||
Args:
|
||||
filename: string, path of the image file.
|
||||
|
||||
Returns:
|
||||
boolean indicating if the image is a JPEG encoded with CMYK color space.
|
||||
"""
|
||||
# File list from:
|
||||
# https://github.com/cytsai/ilsvrc-cmyk-image-list
|
||||
blacklist = ['n01739381_1309.JPEG', 'n02077923_14822.JPEG',
|
||||
'n02447366_23489.JPEG', 'n02492035_15739.JPEG',
|
||||
'n02747177_10752.JPEG', 'n03018349_4028.JPEG',
|
||||
'n03062245_4620.JPEG', 'n03347037_9675.JPEG',
|
||||
'n03467068_12171.JPEG', 'n03529860_11437.JPEG',
|
||||
'n03544143_17228.JPEG', 'n03633091_5218.JPEG',
|
||||
'n03710637_5125.JPEG', 'n03961711_5286.JPEG',
|
||||
'n04033995_2932.JPEG', 'n04258138_17003.JPEG',
|
||||
'n04264628_27969.JPEG', 'n04336792_7448.JPEG',
|
||||
'n04371774_5854.JPEG', 'n04596742_4225.JPEG',
|
||||
'n07583066_647.JPEG', 'n13037406_4650.JPEG']
|
||||
return filename.split('/')[-1] in blacklist
|
||||
|
||||
|
||||
def _process_image(filename, coder):
|
||||
"""Process a single image file.
|
||||
|
||||
Args:
|
||||
filename: string, path to an image file e.g., '/path/to/example.JPG'.
|
||||
coder: instance of ImageCoder to provide TensorFlow image coding utils.
|
||||
Returns:
|
||||
image_buffer: string, JPEG encoding of RGB image.
|
||||
height: integer, image height in pixels.
|
||||
width: integer, image width in pixels.
|
||||
"""
|
||||
# Read the image file.
|
||||
image_data = tf.gfile.GFile(filename, 'r').read()
|
||||
|
||||
# Clean the dirty data.
|
||||
if _is_png(filename):
|
||||
# 1 image is a PNG.
|
||||
print('Converting PNG to JPEG for %s' % filename)
|
||||
image_data = coder.png_to_jpeg(image_data)
|
||||
elif _is_cmyk(filename):
|
||||
# 22 JPEG images are in CMYK colorspace.
|
||||
print('Converting CMYK to RGB for %s' % filename)
|
||||
image_data = coder.cmyk_to_rgb(image_data)
|
||||
|
||||
# Decode the RGB JPEG.
|
||||
image = coder.decode_jpeg(image_data)
|
||||
|
||||
# Check that image converted to RGB
|
||||
assert len(image.shape) == 3
|
||||
height = image.shape[0]
|
||||
width = image.shape[1]
|
||||
assert image.shape[2] == 3
|
||||
|
||||
return image_data, height, width
|
||||
|
||||
|
||||
def _process_image_files_batch(coder, thread_index, ranges, name, filenames,
|
||||
synsets, labels, humans, bboxes, num_shards):
|
||||
"""Processes and saves list of images as TFRecord in 1 thread.
|
||||
|
||||
Args:
|
||||
coder: instance of ImageCoder to provide TensorFlow image coding utils.
|
||||
thread_index: integer, unique batch to run index is within [0, len(ranges)).
|
||||
ranges: list of pairs of integers specifying ranges of each batches to
|
||||
analyze in parallel.
|
||||
name: string, unique identifier specifying the data set
|
||||
filenames: list of strings; each string is a path to an image file
|
||||
synsets: list of strings; each string is a unique WordNet ID
|
||||
labels: list of integer; each integer identifies the ground truth
|
||||
humans: list of strings; each string is a human-readable label
|
||||
bboxes: list of bounding boxes for each image. Note that each entry in this
|
||||
list might contain from 0+ entries corresponding to the number of bounding
|
||||
box annotations for the image.
|
||||
num_shards: integer number of shards for this data set.
|
||||
"""
|
||||
# Each thread produces N shards where N = int(num_shards / num_threads).
|
||||
# For instance, if num_shards = 128, and the num_threads = 2, then the first
|
||||
# thread would produce shards [0, 64).
|
||||
num_threads = len(ranges)
|
||||
assert not num_shards % num_threads
|
||||
num_shards_per_batch = int(num_shards / num_threads)
|
||||
|
||||
shard_ranges = np.linspace(ranges[thread_index][0],
|
||||
ranges[thread_index][1],
|
||||
num_shards_per_batch + 1).astype(int)
|
||||
num_files_in_thread = ranges[thread_index][1] - ranges[thread_index][0]
|
||||
|
||||
counter = 0
|
||||
for s in xrange(num_shards_per_batch):
|
||||
# Generate a sharded version of the file name, e.g. 'train-00002-of-00010'
|
||||
shard = thread_index * num_shards_per_batch + s
|
||||
output_filename = '%s-%.5d-of-%.5d' % (name, shard, num_shards)
|
||||
output_file = os.path.join(FLAGS.output_directory, output_filename)
|
||||
writer = tf.python_io.TFRecordWriter(output_file)
|
||||
|
||||
shard_counter = 0
|
||||
files_in_shard = np.arange(shard_ranges[s], shard_ranges[s + 1], dtype=int)
|
||||
for i in files_in_shard:
|
||||
filename = filenames[i]
|
||||
label = labels[i]
|
||||
synset = synsets[i]
|
||||
human = humans[i]
|
||||
bbox = bboxes[i]
|
||||
|
||||
image_buffer, height, width = _process_image(filename, coder)
|
||||
|
||||
example = _convert_to_example(filename, image_buffer, label,
|
||||
synset, human, bbox,
|
||||
height, width)
|
||||
writer.write(example.SerializeToString())
|
||||
shard_counter += 1
|
||||
counter += 1
|
||||
|
||||
if not counter % 1000:
|
||||
print('%s [thread %d]: Processed %d of %d images in thread batch.' %
|
||||
(datetime.now(), thread_index, counter, num_files_in_thread))
|
||||
sys.stdout.flush()
|
||||
|
||||
writer.close()
|
||||
print('%s [thread %d]: Wrote %d images to %s' %
|
||||
(datetime.now(), thread_index, shard_counter, output_file))
|
||||
sys.stdout.flush()
|
||||
shard_counter = 0
|
||||
print('%s [thread %d]: Wrote %d images to %d shards.' %
|
||||
(datetime.now(), thread_index, counter, num_files_in_thread))
|
||||
sys.stdout.flush()
|
||||
|
||||
|
||||
def _process_image_files(name, filenames, synsets, labels, humans,
|
||||
bboxes, num_shards):
|
||||
"""Process and save list of images as TFRecord of Example protos.
|
||||
|
||||
Args:
|
||||
name: string, unique identifier specifying the data set
|
||||
filenames: list of strings; each string is a path to an image file
|
||||
synsets: list of strings; each string is a unique WordNet ID
|
||||
labels: list of integer; each integer identifies the ground truth
|
||||
humans: list of strings; each string is a human-readable label
|
||||
bboxes: list of bounding boxes for each image. Note that each entry in this
|
||||
list might contain from 0+ entries corresponding to the number of bounding
|
||||
box annotations for the image.
|
||||
num_shards: integer number of shards for this data set.
|
||||
"""
|
||||
assert len(filenames) == len(synsets)
|
||||
assert len(filenames) == len(labels)
|
||||
assert len(filenames) == len(humans)
|
||||
assert len(filenames) == len(bboxes)
|
||||
|
||||
# Break all images into batches with a [ranges[i][0], ranges[i][1]].
|
||||
spacing = np.linspace(0, len(filenames), FLAGS.num_threads + 1).astype(np.int)
|
||||
ranges = []
|
||||
threads = []
|
||||
for i in xrange(len(spacing) - 1):
|
||||
ranges.append([spacing[i], spacing[i+1]])
|
||||
|
||||
# Launch a thread for each batch.
|
||||
print('Launching %d threads for spacings: %s' % (FLAGS.num_threads, ranges))
|
||||
sys.stdout.flush()
|
||||
|
||||
# Create a mechanism for monitoring when all threads are finished.
|
||||
coord = tf.train.Coordinator()
|
||||
|
||||
# Create a generic TensorFlow-based utility for converting all image codings.
|
||||
coder = ImageCoder()
|
||||
|
||||
threads = []
|
||||
for thread_index in xrange(len(ranges)):
|
||||
args = (coder, thread_index, ranges, name, filenames,
|
||||
synsets, labels, humans, bboxes, num_shards)
|
||||
t = threading.Thread(target=_process_image_files_batch, args=args)
|
||||
t.start()
|
||||
threads.append(t)
|
||||
|
||||
# Wait for all the threads to terminate.
|
||||
coord.join(threads)
|
||||
print('%s: Finished writing all %d images in data set.' %
|
||||
(datetime.now(), len(filenames)))
|
||||
sys.stdout.flush()
|
||||
|
||||
|
||||
def _find_image_files(data_dir, labels_file):
|
||||
"""Build a list of all images files and labels in the data set.
|
||||
|
||||
Args:
|
||||
data_dir: string, path to the root directory of images.
|
||||
|
||||
Assumes that the ImageNet data set resides in JPEG files located in
|
||||
the following directory structure.
|
||||
|
||||
data_dir/n01440764/ILSVRC2012_val_00000293.JPEG
|
||||
data_dir/n01440764/ILSVRC2012_val_00000543.JPEG
|
||||
|
||||
where 'n01440764' is the unique synset label associated with these images.
|
||||
|
||||
labels_file: string, path to the labels file.
|
||||
|
||||
The list of valid labels are held in this file. Assumes that the file
|
||||
contains entries as such:
|
||||
n01440764
|
||||
n01443537
|
||||
n01484850
|
||||
where each line corresponds to a label expressed as a synset. We map
|
||||
each synset contained in the file to an integer (based on the alphabetical
|
||||
ordering) starting with the integer 1 corresponding to the synset
|
||||
contained in the first line.
|
||||
|
||||
The reason we start the integer labels at 1 is to reserve label 0 as an
|
||||
unused background class.
|
||||
|
||||
Returns:
|
||||
filenames: list of strings; each string is a path to an image file.
|
||||
synsets: list of strings; each string is a unique WordNet ID.
|
||||
labels: list of integer; each integer identifies the ground truth.
|
||||
"""
|
||||
print('Determining list of input files and labels from %s.' % data_dir)
|
||||
challenge_synsets = [
|
||||
l.strip() for l in tf.gfile.GFile(labels_file, 'r').readlines()
|
||||
]
|
||||
|
||||
labels = []
|
||||
filenames = []
|
||||
synsets = []
|
||||
|
||||
# Leave label index 0 empty as a background class.
|
||||
label_index = 1
|
||||
|
||||
# Construct the list of JPEG files and labels.
|
||||
for synset in challenge_synsets:
|
||||
jpeg_file_path = '%s/%s/*.JPEG' % (data_dir, synset)
|
||||
matching_files = tf.gfile.Glob(jpeg_file_path)
|
||||
|
||||
labels.extend([label_index] * len(matching_files))
|
||||
synsets.extend([synset] * len(matching_files))
|
||||
filenames.extend(matching_files)
|
||||
|
||||
if not label_index % 100:
|
||||
print('Finished finding files in %d of %d classes.' % (
|
||||
label_index, len(challenge_synsets)))
|
||||
label_index += 1
|
||||
|
||||
# Shuffle the ordering of all image files in order to guarantee
|
||||
# random ordering of the images with respect to label in the
|
||||
# saved TFRecord files. Make the randomization repeatable.
|
||||
shuffled_index = range(len(filenames))
|
||||
random.seed(12345)
|
||||
random.shuffle(shuffled_index)
|
||||
|
||||
filenames = [filenames[i] for i in shuffled_index]
|
||||
synsets = [synsets[i] for i in shuffled_index]
|
||||
labels = [labels[i] for i in shuffled_index]
|
||||
|
||||
print('Found %d JPEG files across %d labels inside %s.' %
|
||||
(len(filenames), len(challenge_synsets), data_dir))
|
||||
return filenames, synsets, labels
|
||||
|
||||
|
||||
def _find_human_readable_labels(synsets, synset_to_human):
|
||||
"""Build a list of human-readable labels.
|
||||
|
||||
Args:
|
||||
synsets: list of strings; each string is a unique WordNet ID.
|
||||
synset_to_human: dict of synset to human labels, e.g.,
|
||||
'n02119022' --> 'red fox, Vulpes vulpes'
|
||||
|
||||
Returns:
|
||||
List of human-readable strings corresponding to each synset.
|
||||
"""
|
||||
humans = []
|
||||
for s in synsets:
|
||||
assert s in synset_to_human, ('Failed to find: %s' % s)
|
||||
humans.append(synset_to_human[s])
|
||||
return humans
|
||||
|
||||
|
||||
def _find_image_bounding_boxes(filenames, image_to_bboxes):
|
||||
"""Find the bounding boxes for a given image file.
|
||||
|
||||
Args:
|
||||
filenames: list of strings; each string is a path to an image file.
|
||||
image_to_bboxes: dictionary mapping image file names to a list of
|
||||
bounding boxes. This list contains 0+ bounding boxes.
|
||||
Returns:
|
||||
List of bounding boxes for each image. Note that each entry in this
|
||||
list might contain from 0+ entries corresponding to the number of bounding
|
||||
box annotations for the image.
|
||||
"""
|
||||
num_image_bbox = 0
|
||||
bboxes = []
|
||||
for f in filenames:
|
||||
basename = os.path.basename(f)
|
||||
if basename in image_to_bboxes:
|
||||
bboxes.append(image_to_bboxes[basename])
|
||||
num_image_bbox += 1
|
||||
else:
|
||||
bboxes.append([])
|
||||
print('Found %d images with bboxes out of %d images' % (
|
||||
num_image_bbox, len(filenames)))
|
||||
return bboxes
|
||||
|
||||
|
||||
def _process_dataset(name, directory, num_shards, synset_to_human,
|
||||
image_to_bboxes):
|
||||
"""Process a complete data set and save it as a TFRecord.
|
||||
|
||||
Args:
|
||||
name: string, unique identifier specifying the data set.
|
||||
directory: string, root path to the data set.
|
||||
num_shards: integer number of shards for this data set.
|
||||
synset_to_human: dict of synset to human labels, e.g.,
|
||||
'n02119022' --> 'red fox, Vulpes vulpes'
|
||||
image_to_bboxes: dictionary mapping image file names to a list of
|
||||
bounding boxes. This list contains 0+ bounding boxes.
|
||||
"""
|
||||
filenames, synsets, labels = _find_image_files(directory, FLAGS.labels_file)
|
||||
humans = _find_human_readable_labels(synsets, synset_to_human)
|
||||
bboxes = _find_image_bounding_boxes(filenames, image_to_bboxes)
|
||||
_process_image_files(name, filenames, synsets, labels,
|
||||
humans, bboxes, num_shards)
|
||||
|
||||
|
||||
def _build_synset_lookup(imagenet_metadata_file):
|
||||
"""Build lookup for synset to human-readable label.
|
||||
|
||||
Args:
|
||||
imagenet_metadata_file: string, path to file containing mapping from
|
||||
synset to human-readable label.
|
||||
|
||||
Assumes each line of the file looks like:
|
||||
|
||||
n02119247 black fox
|
||||
n02119359 silver fox
|
||||
n02119477 red fox, Vulpes fulva
|
||||
|
||||
where each line corresponds to a unique mapping. Note that each line is
|
||||
formatted as <synset>\t<human readable label>.
|
||||
|
||||
Returns:
|
||||
Dictionary of synset to human labels, such as:
|
||||
'n02119022' --> 'red fox, Vulpes vulpes'
|
||||
"""
|
||||
lines = tf.gfile.GFile(imagenet_metadata_file, 'r').readlines()
|
||||
synset_to_human = {}
|
||||
for l in lines:
|
||||
if l:
|
||||
parts = l.strip().split('\t')
|
||||
assert len(parts) == 2
|
||||
synset = parts[0]
|
||||
human = parts[1]
|
||||
synset_to_human[synset] = human
|
||||
return synset_to_human
|
||||
|
||||
|
||||
def _build_bounding_box_lookup(bounding_box_file):
|
||||
"""Build a lookup from image file to bounding boxes.
|
||||
|
||||
Args:
|
||||
bounding_box_file: string, path to file with bounding boxes annotations.
|
||||
|
||||
Assumes each line of the file looks like:
|
||||
|
||||
n00007846_64193.JPEG,0.0060,0.2620,0.7545,0.9940
|
||||
|
||||
where each line corresponds to one bounding box annotation associated
|
||||
with an image. Each line can be parsed as:
|
||||
|
||||
<JPEG file name>, <xmin>, <ymin>, <xmax>, <ymax>
|
||||
|
||||
Note that there might exist mulitple bounding box annotations associated
|
||||
with an image file. This file is the output of process_bounding_boxes.py.
|
||||
|
||||
Returns:
|
||||
Dictionary mapping image file names to a list of bounding boxes. This list
|
||||
contains 0+ bounding boxes.
|
||||
"""
|
||||
lines = tf.gfile.GFile(bounding_box_file, 'r').readlines()
|
||||
images_to_bboxes = {}
|
||||
num_bbox = 0
|
||||
num_image = 0
|
||||
for l in lines:
|
||||
if l:
|
||||
parts = l.split(',')
|
||||
assert len(parts) == 5, ('Failed to parse: %s' % l)
|
||||
filename = parts[0]
|
||||
xmin = float(parts[1])
|
||||
ymin = float(parts[2])
|
||||
xmax = float(parts[3])
|
||||
ymax = float(parts[4])
|
||||
box = [xmin, ymin, xmax, ymax]
|
||||
|
||||
if filename not in images_to_bboxes:
|
||||
images_to_bboxes[filename] = []
|
||||
num_image += 1
|
||||
images_to_bboxes[filename].append(box)
|
||||
num_bbox += 1
|
||||
|
||||
print('Successfully read %d bounding boxes '
|
||||
'across %d images.' % (num_bbox, num_image))
|
||||
return images_to_bboxes
|
||||
|
||||
|
||||
def main(unused_argv):
|
||||
assert not FLAGS.train_shards % FLAGS.num_threads, (
|
||||
'Please make the FLAGS.num_threads commensurate with FLAGS.train_shards')
|
||||
assert not FLAGS.validation_shards % FLAGS.num_threads, (
|
||||
'Please make the FLAGS.num_threads commensurate with '
|
||||
'FLAGS.validation_shards')
|
||||
print('Saving results to %s' % FLAGS.output_directory)
|
||||
|
||||
# Build a map from synset to human-readable label.
|
||||
synset_to_human = _build_synset_lookup(FLAGS.imagenet_metadata_file)
|
||||
image_to_bboxes = _build_bounding_box_lookup(FLAGS.bounding_box_file)
|
||||
|
||||
# Run it!
|
||||
_process_dataset('validation', FLAGS.validation_directory,
|
||||
FLAGS.validation_shards, synset_to_human, image_to_bboxes)
|
||||
_process_dataset('train', FLAGS.train_directory, FLAGS.train_shards,
|
||||
synset_to_human, image_to_bboxes)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.app.run()
|
||||
+100
@@ -0,0 +1,100 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Provides data for the Cifar10 dataset.
|
||||
|
||||
The dataset scripts used to create the dataset can be found at:
|
||||
tensorflow/models/research/slim/datasets/download_and_convert_cifar10.py
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import os
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from datasets import dataset_utils
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
_FILE_PATTERN = 'cifar10_%s.tfrecord'
|
||||
|
||||
SPLITS_TO_SIZES = {'train': 50000, 'test': 10000}
|
||||
|
||||
_NUM_CLASSES = 10
|
||||
|
||||
_ITEMS_TO_DESCRIPTIONS = {
|
||||
'image': 'A [32 x 32 x 3] color image.',
|
||||
'label': 'A single integer between 0 and 9',
|
||||
}
|
||||
|
||||
|
||||
def get_split(split_name, dataset_dir, file_pattern=None, reader=None):
|
||||
"""Gets a dataset tuple with instructions for reading cifar10.
|
||||
|
||||
Args:
|
||||
split_name: A train/test split name.
|
||||
dataset_dir: The base directory of the dataset sources.
|
||||
file_pattern: The file pattern to use when matching the dataset sources.
|
||||
It is assumed that the pattern contains a '%s' string so that the split
|
||||
name can be inserted.
|
||||
reader: The TensorFlow reader type.
|
||||
|
||||
Returns:
|
||||
A `Dataset` namedtuple.
|
||||
|
||||
Raises:
|
||||
ValueError: if `split_name` is not a valid train/test split.
|
||||
"""
|
||||
if split_name not in SPLITS_TO_SIZES:
|
||||
raise ValueError('split name %s was not recognized.' % split_name)
|
||||
|
||||
if not file_pattern:
|
||||
file_pattern = _FILE_PATTERN
|
||||
file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
|
||||
|
||||
# Allowing None in the signature so that dataset_factory can use the default.
|
||||
if not reader:
|
||||
reader = tf.TFRecordReader
|
||||
|
||||
keys_to_features = {
|
||||
'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
|
||||
'image/format': tf.FixedLenFeature((), tf.string, default_value='png'),
|
||||
'image/class/label': tf.FixedLenFeature(
|
||||
[], tf.int64, default_value=tf.zeros([], dtype=tf.int64)),
|
||||
}
|
||||
|
||||
items_to_handlers = {
|
||||
'image': slim.tfexample_decoder.Image(shape=[32, 32, 3]),
|
||||
'label': slim.tfexample_decoder.Tensor('image/class/label'),
|
||||
}
|
||||
|
||||
decoder = slim.tfexample_decoder.TFExampleDecoder(
|
||||
keys_to_features, items_to_handlers)
|
||||
|
||||
labels_to_names = None
|
||||
if dataset_utils.has_labels(dataset_dir):
|
||||
labels_to_names = dataset_utils.read_label_file(dataset_dir)
|
||||
|
||||
return slim.dataset.Dataset(
|
||||
data_sources=file_pattern,
|
||||
reader=reader,
|
||||
decoder=decoder,
|
||||
num_samples=SPLITS_TO_SIZES[split_name],
|
||||
items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
|
||||
num_classes=_NUM_CLASSES,
|
||||
labels_to_names=labels_to_names,
|
||||
)
|
||||
+59
@@ -0,0 +1,59 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""A factory-pattern class which returns classification image/label pairs."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
from datasets import cifar10
|
||||
from datasets import flowers
|
||||
from datasets import imagenet
|
||||
from datasets import mnist
|
||||
from datasets import visualwakewords
|
||||
|
||||
datasets_map = {
|
||||
'cifar10': cifar10,
|
||||
'flowers': flowers,
|
||||
'imagenet': imagenet,
|
||||
'mnist': mnist,
|
||||
'visualwakewords': visualwakewords,
|
||||
}
|
||||
|
||||
|
||||
def get_dataset(name, split_name, dataset_dir, file_pattern=None, reader=None):
|
||||
"""Given a dataset name and a split_name returns a Dataset.
|
||||
|
||||
Args:
|
||||
name: String, the name of the dataset.
|
||||
split_name: A train/test split name.
|
||||
dataset_dir: The directory where the dataset files are stored.
|
||||
file_pattern: The file pattern to use for matching the dataset source files.
|
||||
reader: The subclass of tf.ReaderBase. If left as `None`, then the default
|
||||
reader defined by each dataset is used.
|
||||
|
||||
Returns:
|
||||
A `Dataset` class.
|
||||
|
||||
Raises:
|
||||
ValueError: If the dataset `name` is unknown.
|
||||
"""
|
||||
if name not in datasets_map:
|
||||
raise ValueError('Name of dataset unknown %s' % name)
|
||||
return datasets_map[name].get_split(
|
||||
split_name,
|
||||
dataset_dir,
|
||||
file_pattern,
|
||||
reader)
|
||||
+240
@@ -0,0 +1,240 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains utilities for downloading and converting datasets."""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import os
|
||||
import sys
|
||||
import tarfile
|
||||
import zipfile
|
||||
|
||||
from six.moves import urllib
|
||||
import tensorflow as tf
|
||||
|
||||
LABELS_FILENAME = 'labels.txt'
|
||||
|
||||
|
||||
def int64_feature(values):
|
||||
"""Returns a TF-Feature of int64s.
|
||||
|
||||
Args:
|
||||
values: A scalar or list of values.
|
||||
|
||||
Returns:
|
||||
A TF-Feature.
|
||||
"""
|
||||
if not isinstance(values, (tuple, list)):
|
||||
values = [values]
|
||||
return tf.train.Feature(int64_list=tf.train.Int64List(value=values))
|
||||
|
||||
|
||||
def bytes_list_feature(values):
|
||||
"""Returns a TF-Feature of list of bytes.
|
||||
|
||||
Args:
|
||||
values: A string or list of strings.
|
||||
|
||||
Returns:
|
||||
A TF-Feature.
|
||||
"""
|
||||
return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))
|
||||
|
||||
|
||||
def float_list_feature(values):
|
||||
"""Returns a TF-Feature of list of floats.
|
||||
|
||||
Args:
|
||||
values: A float or list of floats.
|
||||
|
||||
Returns:
|
||||
A TF-Feature.
|
||||
"""
|
||||
return tf.train.Feature(float_list=tf.train.FloatList(value=values))
|
||||
|
||||
|
||||
def bytes_feature(values):
|
||||
"""Returns a TF-Feature of bytes.
|
||||
|
||||
Args:
|
||||
values: A string.
|
||||
|
||||
Returns:
|
||||
A TF-Feature.
|
||||
"""
|
||||
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[values]))
|
||||
|
||||
|
||||
def float_feature(values):
|
||||
"""Returns a TF-Feature of floats.
|
||||
|
||||
Args:
|
||||
values: A scalar of list of values.
|
||||
|
||||
Returns:
|
||||
A TF-Feature.
|
||||
"""
|
||||
if not isinstance(values, (tuple, list)):
|
||||
values = [values]
|
||||
return tf.train.Feature(float_list=tf.train.FloatList(value=values))
|
||||
|
||||
|
||||
def image_to_tfexample(image_data, image_format, height, width, class_id):
|
||||
return tf.train.Example(features=tf.train.Features(feature={
|
||||
'image/encoded': bytes_feature(image_data),
|
||||
'image/format': bytes_feature(image_format),
|
||||
'image/class/label': int64_feature(class_id),
|
||||
'image/height': int64_feature(height),
|
||||
'image/width': int64_feature(width),
|
||||
}))
|
||||
|
||||
|
||||
def download_url(url, dataset_dir):
|
||||
"""Downloads the tarball or zip file from url into filepath.
|
||||
|
||||
Args:
|
||||
url: The URL of a tarball or zip file.
|
||||
dataset_dir: The directory where the temporary files are stored.
|
||||
|
||||
Returns:
|
||||
filepath: path where the file is downloaded.
|
||||
"""
|
||||
filename = url.split('/')[-1]
|
||||
filepath = os.path.join(dataset_dir, filename)
|
||||
|
||||
def _progress(count, block_size, total_size):
|
||||
sys.stdout.write('\r>> Downloading %s %.1f%%' % (
|
||||
filename, float(count * block_size) / float(total_size) * 100.0))
|
||||
sys.stdout.flush()
|
||||
|
||||
filepath, _ = urllib.request.urlretrieve(url, filepath, _progress)
|
||||
print()
|
||||
statinfo = os.stat(filepath)
|
||||
print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
|
||||
return filepath
|
||||
|
||||
|
||||
def download_and_uncompress_tarball(tarball_url, dataset_dir):
|
||||
"""Downloads the `tarball_url` and uncompresses it locally.
|
||||
|
||||
Args:
|
||||
tarball_url: The URL of a tarball file.
|
||||
dataset_dir: The directory where the temporary files are stored.
|
||||
"""
|
||||
filepath = download_url(tarball_url, dataset_dir)
|
||||
tarfile.open(filepath, 'r:gz').extractall(dataset_dir)
|
||||
|
||||
|
||||
def download_and_uncompress_zipfile(zip_url, dataset_dir):
|
||||
"""Downloads the `zip_url` and uncompresses it locally.
|
||||
|
||||
Args:
|
||||
zip_url: The URL of a zip file.
|
||||
dataset_dir: The directory where the temporary files are stored.
|
||||
"""
|
||||
filename = zip_url.split('/')[-1]
|
||||
filepath = os.path.join(dataset_dir, filename)
|
||||
|
||||
if tf.gfile.Exists(filepath):
|
||||
print('File {filename} has been already downloaded at {filepath}. '
|
||||
'Unzipping it....'.format(filename=filename, filepath=filepath))
|
||||
else:
|
||||
filepath = download_url(zip_url, dataset_dir)
|
||||
|
||||
with zipfile.ZipFile(filepath, 'r') as zip_file:
|
||||
for member in zip_file.namelist():
|
||||
memberpath = os.path.join(dataset_dir, member)
|
||||
# extract only if file doesn't exist
|
||||
if not (os.path.exists(memberpath) or os.path.isfile(memberpath)):
|
||||
zip_file.extract(member, dataset_dir)
|
||||
|
||||
|
||||
def write_label_file(labels_to_class_names,
|
||||
dataset_dir,
|
||||
filename=LABELS_FILENAME):
|
||||
"""Writes a file with the list of class names.
|
||||
|
||||
Args:
|
||||
labels_to_class_names: A map of (integer) labels to class names.
|
||||
dataset_dir: The directory in which the labels file should be written.
|
||||
filename: The filename where the class names are written.
|
||||
"""
|
||||
labels_filename = os.path.join(dataset_dir, filename)
|
||||
with tf.gfile.Open(labels_filename, 'w') as f:
|
||||
for label in labels_to_class_names:
|
||||
class_name = labels_to_class_names[label]
|
||||
f.write('%d:%s\n' % (label, class_name))
|
||||
|
||||
|
||||
def has_labels(dataset_dir, filename=LABELS_FILENAME):
|
||||
"""Specifies whether or not the dataset directory contains a label map file.
|
||||
|
||||
Args:
|
||||
dataset_dir: The directory in which the labels file is found.
|
||||
filename: The filename where the class names are written.
|
||||
|
||||
Returns:
|
||||
`True` if the labels file exists and `False` otherwise.
|
||||
"""
|
||||
return tf.gfile.Exists(os.path.join(dataset_dir, filename))
|
||||
|
||||
|
||||
def read_label_file(dataset_dir, filename=LABELS_FILENAME):
|
||||
"""Reads the labels file and returns a mapping from ID to class name.
|
||||
|
||||
Args:
|
||||
dataset_dir: The directory in which the labels file is found.
|
||||
filename: The filename where the class names are written.
|
||||
|
||||
Returns:
|
||||
A map from a label (integer) to class name.
|
||||
"""
|
||||
labels_filename = os.path.join(dataset_dir, filename)
|
||||
with tf.gfile.Open(labels_filename, 'rb') as f:
|
||||
lines = f.read().decode()
|
||||
lines = lines.split('\n')
|
||||
lines = filter(None, lines)
|
||||
|
||||
labels_to_class_names = {}
|
||||
for line in lines:
|
||||
index = line.index(':')
|
||||
labels_to_class_names[int(line[:index])] = line[index+1:]
|
||||
return labels_to_class_names
|
||||
|
||||
|
||||
def open_sharded_output_tfrecords(exit_stack, base_path, num_shards):
|
||||
"""Opens all TFRecord shards for writing and adds them to an exit stack.
|
||||
|
||||
Args:
|
||||
exit_stack: A context2.ExitStack used to automatically closed the TFRecords
|
||||
opened in this function.
|
||||
base_path: The base path for all shards
|
||||
num_shards: The number of shards
|
||||
|
||||
Returns:
|
||||
The list of opened TFRecords. Position k in the list corresponds to shard k.
|
||||
"""
|
||||
tf_record_output_filenames = [
|
||||
'{}-{:05d}-of-{:05d}'.format(base_path, idx, num_shards)
|
||||
for idx in range(num_shards)
|
||||
]
|
||||
|
||||
tfrecords = [
|
||||
exit_stack.enter_context(tf.python_io.TFRecordWriter(file_name))
|
||||
for file_name in tf_record_output_filenames
|
||||
]
|
||||
|
||||
return tfrecords
|
||||
+198
@@ -0,0 +1,198 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
r"""Downloads and converts cifar10 data to TFRecords of TF-Example protos.
|
||||
|
||||
This module downloads the cifar10 data, uncompresses it, reads the files
|
||||
that make up the cifar10 data and creates two TFRecord datasets: one for train
|
||||
and one for test. Each TFRecord dataset is comprised of a set of TF-Example
|
||||
protocol buffers, each of which contain a single image and label.
|
||||
|
||||
The script should take several minutes to run.
|
||||
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import os
|
||||
import sys
|
||||
import tarfile
|
||||
|
||||
import numpy as np
|
||||
from six.moves import cPickle
|
||||
from six.moves import urllib
|
||||
import tensorflow as tf
|
||||
|
||||
from datasets import dataset_utils
|
||||
|
||||
# The URL where the CIFAR data can be downloaded.
|
||||
_DATA_URL = 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'
|
||||
|
||||
# The number of training files.
|
||||
_NUM_TRAIN_FILES = 5
|
||||
|
||||
# The height and width of each image.
|
||||
_IMAGE_SIZE = 32
|
||||
|
||||
# The names of the classes.
|
||||
_CLASS_NAMES = [
|
||||
'airplane',
|
||||
'automobile',
|
||||
'bird',
|
||||
'cat',
|
||||
'deer',
|
||||
'dog',
|
||||
'frog',
|
||||
'horse',
|
||||
'ship',
|
||||
'truck',
|
||||
]
|
||||
|
||||
|
||||
def _add_to_tfrecord(filename, tfrecord_writer, offset=0):
|
||||
"""Loads data from the cifar10 pickle files and writes files to a TFRecord.
|
||||
|
||||
Args:
|
||||
filename: The filename of the cifar10 pickle file.
|
||||
tfrecord_writer: The TFRecord writer to use for writing.
|
||||
offset: An offset into the absolute number of images previously written.
|
||||
|
||||
Returns:
|
||||
The new offset.
|
||||
"""
|
||||
with tf.gfile.Open(filename, 'rb') as f:
|
||||
if sys.version_info < (3,):
|
||||
data = cPickle.load(f)
|
||||
else:
|
||||
data = cPickle.load(f, encoding='bytes')
|
||||
|
||||
images = data[b'data']
|
||||
num_images = images.shape[0]
|
||||
|
||||
images = images.reshape((num_images, 3, 32, 32))
|
||||
labels = data[b'labels']
|
||||
|
||||
with tf.Graph().as_default():
|
||||
image_placeholder = tf.placeholder(dtype=tf.uint8)
|
||||
encoded_image = tf.image.encode_png(image_placeholder)
|
||||
|
||||
with tf.Session('') as sess:
|
||||
|
||||
for j in range(num_images):
|
||||
sys.stdout.write('\r>> Reading file [%s] image %d/%d' % (
|
||||
filename, offset + j + 1, offset + num_images))
|
||||
sys.stdout.flush()
|
||||
|
||||
image = np.squeeze(images[j]).transpose((1, 2, 0))
|
||||
label = labels[j]
|
||||
|
||||
png_string = sess.run(encoded_image,
|
||||
feed_dict={image_placeholder: image})
|
||||
|
||||
example = dataset_utils.image_to_tfexample(
|
||||
png_string, b'png', _IMAGE_SIZE, _IMAGE_SIZE, label)
|
||||
tfrecord_writer.write(example.SerializeToString())
|
||||
|
||||
return offset + num_images
|
||||
|
||||
|
||||
def _get_output_filename(dataset_dir, split_name):
|
||||
"""Creates the output filename.
|
||||
|
||||
Args:
|
||||
dataset_dir: The dataset directory where the dataset is stored.
|
||||
split_name: The name of the train/test split.
|
||||
|
||||
Returns:
|
||||
An absolute file path.
|
||||
"""
|
||||
return '%s/cifar10_%s.tfrecord' % (dataset_dir, split_name)
|
||||
|
||||
|
||||
def _download_and_uncompress_dataset(dataset_dir):
|
||||
"""Downloads cifar10 and uncompresses it locally.
|
||||
|
||||
Args:
|
||||
dataset_dir: The directory where the temporary files are stored.
|
||||
"""
|
||||
filename = _DATA_URL.split('/')[-1]
|
||||
filepath = os.path.join(dataset_dir, filename)
|
||||
|
||||
if not os.path.exists(filepath):
|
||||
def _progress(count, block_size, total_size):
|
||||
sys.stdout.write('\r>> Downloading %s %.1f%%' % (
|
||||
filename, float(count * block_size) / float(total_size) * 100.0))
|
||||
sys.stdout.flush()
|
||||
filepath, _ = urllib.request.urlretrieve(_DATA_URL, filepath, _progress)
|
||||
print()
|
||||
statinfo = os.stat(filepath)
|
||||
print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
|
||||
tarfile.open(filepath, 'r:gz').extractall(dataset_dir)
|
||||
|
||||
|
||||
def _clean_up_temporary_files(dataset_dir):
|
||||
"""Removes temporary files used to create the dataset.
|
||||
|
||||
Args:
|
||||
dataset_dir: The directory where the temporary files are stored.
|
||||
"""
|
||||
filename = _DATA_URL.split('/')[-1]
|
||||
filepath = os.path.join(dataset_dir, filename)
|
||||
tf.gfile.Remove(filepath)
|
||||
|
||||
tmp_dir = os.path.join(dataset_dir, 'cifar-10-batches-py')
|
||||
tf.gfile.DeleteRecursively(tmp_dir)
|
||||
|
||||
|
||||
def run(dataset_dir):
|
||||
"""Runs the download and conversion operation.
|
||||
|
||||
Args:
|
||||
dataset_dir: The dataset directory where the dataset is stored.
|
||||
"""
|
||||
if not tf.gfile.Exists(dataset_dir):
|
||||
tf.gfile.MakeDirs(dataset_dir)
|
||||
|
||||
training_filename = _get_output_filename(dataset_dir, 'train')
|
||||
testing_filename = _get_output_filename(dataset_dir, 'test')
|
||||
|
||||
if tf.gfile.Exists(training_filename) and tf.gfile.Exists(testing_filename):
|
||||
print('Dataset files already exist. Exiting without re-creating them.')
|
||||
return
|
||||
|
||||
dataset_utils.download_and_uncompress_tarball(_DATA_URL, dataset_dir)
|
||||
|
||||
# First, process the training data:
|
||||
with tf.python_io.TFRecordWriter(training_filename) as tfrecord_writer:
|
||||
offset = 0
|
||||
for i in range(_NUM_TRAIN_FILES):
|
||||
filename = os.path.join(dataset_dir,
|
||||
'cifar-10-batches-py',
|
||||
'data_batch_%d' % (i + 1)) # 1-indexed.
|
||||
offset = _add_to_tfrecord(filename, tfrecord_writer, offset)
|
||||
|
||||
# Next, process the testing data:
|
||||
with tf.python_io.TFRecordWriter(testing_filename) as tfrecord_writer:
|
||||
filename = os.path.join(dataset_dir,
|
||||
'cifar-10-batches-py',
|
||||
'test_batch')
|
||||
_add_to_tfrecord(filename, tfrecord_writer)
|
||||
|
||||
# Finally, write the labels file:
|
||||
labels_to_class_names = dict(zip(range(len(_CLASS_NAMES)), _CLASS_NAMES))
|
||||
dataset_utils.write_label_file(labels_to_class_names, dataset_dir)
|
||||
|
||||
_clean_up_temporary_files(dataset_dir)
|
||||
print('\nFinished converting the Cifar10 dataset!')
|
||||
+211
@@ -0,0 +1,211 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
r"""Downloads and converts Flowers data to TFRecords of TF-Example protos.
|
||||
|
||||
This module downloads the Flowers data, uncompresses it, reads the files
|
||||
that make up the Flowers data and creates two TFRecord datasets: one for train
|
||||
and one for test. Each TFRecord dataset is comprised of a set of TF-Example
|
||||
protocol buffers, each of which contain a single image and label.
|
||||
|
||||
The script should take about a minute to run.
|
||||
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import math
|
||||
import os
|
||||
import random
|
||||
import sys
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from datasets import dataset_utils
|
||||
|
||||
# The URL where the Flowers data can be downloaded.
|
||||
_DATA_URL = 'http://download.tensorflow.org/example_images/flower_photos.tgz'
|
||||
|
||||
# The number of images in the validation set.
|
||||
_NUM_VALIDATION = 350
|
||||
|
||||
# Seed for repeatability.
|
||||
_RANDOM_SEED = 0
|
||||
|
||||
# The number of shards per dataset split.
|
||||
_NUM_SHARDS = 5
|
||||
|
||||
|
||||
class ImageReader(object):
|
||||
"""Helper class that provides TensorFlow image coding utilities."""
|
||||
|
||||
def __init__(self):
|
||||
# Initializes function that decodes RGB JPEG data.
|
||||
self._decode_jpeg_data = tf.placeholder(dtype=tf.string)
|
||||
self._decode_jpeg = tf.image.decode_jpeg(self._decode_jpeg_data, channels=3)
|
||||
|
||||
def read_image_dims(self, sess, image_data):
|
||||
image = self.decode_jpeg(sess, image_data)
|
||||
return image.shape[0], image.shape[1]
|
||||
|
||||
def decode_jpeg(self, sess, image_data):
|
||||
image = sess.run(self._decode_jpeg,
|
||||
feed_dict={self._decode_jpeg_data: image_data})
|
||||
assert len(image.shape) == 3
|
||||
assert image.shape[2] == 3
|
||||
return image
|
||||
|
||||
|
||||
def _get_filenames_and_classes(dataset_dir):
|
||||
"""Returns a list of filenames and inferred class names.
|
||||
|
||||
Args:
|
||||
dataset_dir: A directory containing a set of subdirectories representing
|
||||
class names. Each subdirectory should contain PNG or JPG encoded images.
|
||||
|
||||
Returns:
|
||||
A list of image file paths, relative to `dataset_dir` and the list of
|
||||
subdirectories, representing class names.
|
||||
"""
|
||||
flower_root = os.path.join(dataset_dir, 'flower_photos')
|
||||
directories = []
|
||||
class_names = []
|
||||
for filename in os.listdir(flower_root):
|
||||
path = os.path.join(flower_root, filename)
|
||||
if os.path.isdir(path):
|
||||
directories.append(path)
|
||||
class_names.append(filename)
|
||||
|
||||
photo_filenames = []
|
||||
for directory in directories:
|
||||
for filename in os.listdir(directory):
|
||||
path = os.path.join(directory, filename)
|
||||
photo_filenames.append(path)
|
||||
|
||||
return photo_filenames, sorted(class_names)
|
||||
|
||||
|
||||
def _get_dataset_filename(dataset_dir, split_name, shard_id):
|
||||
output_filename = 'flowers_%s_%05d-of-%05d.tfrecord' % (
|
||||
split_name, shard_id, _NUM_SHARDS)
|
||||
return os.path.join(dataset_dir, output_filename)
|
||||
|
||||
|
||||
def _convert_dataset(split_name, filenames, class_names_to_ids, dataset_dir):
|
||||
"""Converts the given filenames to a TFRecord dataset.
|
||||
|
||||
Args:
|
||||
split_name: The name of the dataset, either 'train' or 'validation'.
|
||||
filenames: A list of absolute paths to png or jpg images.
|
||||
class_names_to_ids: A dictionary from class names (strings) to ids
|
||||
(integers).
|
||||
dataset_dir: The directory where the converted datasets are stored.
|
||||
"""
|
||||
assert split_name in ['train', 'validation']
|
||||
|
||||
num_per_shard = int(math.ceil(len(filenames) / float(_NUM_SHARDS)))
|
||||
|
||||
with tf.Graph().as_default():
|
||||
image_reader = ImageReader()
|
||||
|
||||
with tf.Session('') as sess:
|
||||
|
||||
for shard_id in range(_NUM_SHARDS):
|
||||
output_filename = _get_dataset_filename(
|
||||
dataset_dir, split_name, shard_id)
|
||||
|
||||
with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
|
||||
start_ndx = shard_id * num_per_shard
|
||||
end_ndx = min((shard_id+1) * num_per_shard, len(filenames))
|
||||
for i in range(start_ndx, end_ndx):
|
||||
sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
|
||||
i+1, len(filenames), shard_id))
|
||||
sys.stdout.flush()
|
||||
|
||||
# Read the filename:
|
||||
image_data = tf.gfile.GFile(filenames[i], 'rb').read()
|
||||
height, width = image_reader.read_image_dims(sess, image_data)
|
||||
|
||||
class_name = os.path.basename(os.path.dirname(filenames[i]))
|
||||
class_id = class_names_to_ids[class_name]
|
||||
|
||||
example = dataset_utils.image_to_tfexample(
|
||||
image_data, b'jpg', height, width, class_id)
|
||||
tfrecord_writer.write(example.SerializeToString())
|
||||
|
||||
sys.stdout.write('\n')
|
||||
sys.stdout.flush()
|
||||
|
||||
|
||||
def _clean_up_temporary_files(dataset_dir):
|
||||
"""Removes temporary files used to create the dataset.
|
||||
|
||||
Args:
|
||||
dataset_dir: The directory where the temporary files are stored.
|
||||
"""
|
||||
filename = _DATA_URL.split('/')[-1]
|
||||
filepath = os.path.join(dataset_dir, filename)
|
||||
tf.gfile.Remove(filepath)
|
||||
|
||||
tmp_dir = os.path.join(dataset_dir, 'flower_photos')
|
||||
tf.gfile.DeleteRecursively(tmp_dir)
|
||||
|
||||
|
||||
def _dataset_exists(dataset_dir):
|
||||
for split_name in ['train', 'validation']:
|
||||
for shard_id in range(_NUM_SHARDS):
|
||||
output_filename = _get_dataset_filename(
|
||||
dataset_dir, split_name, shard_id)
|
||||
if not tf.gfile.Exists(output_filename):
|
||||
return False
|
||||
return True
|
||||
|
||||
|
||||
def run(dataset_dir):
|
||||
"""Runs the download and conversion operation.
|
||||
|
||||
Args:
|
||||
dataset_dir: The dataset directory where the dataset is stored.
|
||||
"""
|
||||
if not tf.gfile.Exists(dataset_dir):
|
||||
tf.gfile.MakeDirs(dataset_dir)
|
||||
|
||||
if _dataset_exists(dataset_dir):
|
||||
print('Dataset files already exist. Exiting without re-creating them.')
|
||||
return
|
||||
|
||||
dataset_utils.download_and_uncompress_tarball(_DATA_URL, dataset_dir)
|
||||
photo_filenames, class_names = _get_filenames_and_classes(dataset_dir)
|
||||
class_names_to_ids = dict(zip(class_names, range(len(class_names))))
|
||||
|
||||
# Divide into train and test:
|
||||
random.seed(_RANDOM_SEED)
|
||||
random.shuffle(photo_filenames)
|
||||
training_filenames = photo_filenames[_NUM_VALIDATION:]
|
||||
validation_filenames = photo_filenames[:_NUM_VALIDATION]
|
||||
|
||||
# First, convert the training and validation sets.
|
||||
_convert_dataset('train', training_filenames, class_names_to_ids,
|
||||
dataset_dir)
|
||||
_convert_dataset('validation', validation_filenames, class_names_to_ids,
|
||||
dataset_dir)
|
||||
|
||||
# Finally, write the labels file:
|
||||
labels_to_class_names = dict(zip(range(len(class_names)), class_names))
|
||||
dataset_utils.write_label_file(labels_to_class_names, dataset_dir)
|
||||
|
||||
_clean_up_temporary_files(dataset_dir)
|
||||
print('\nFinished converting the Flowers dataset!')
|
||||
+103
@@ -0,0 +1,103 @@
|
||||
#!/bin/bash
|
||||
# Copyright 2016 Google Inc. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
|
||||
# Script to download and preprocess ImageNet Challenge 2012
|
||||
# training and validation data set.
|
||||
#
|
||||
# The final output of this script are sharded TFRecord files containing
|
||||
# serialized Example protocol buffers. See build_imagenet_data.py for
|
||||
# details of how the Example protocol buffers contain the ImageNet data.
|
||||
#
|
||||
# The final output of this script appears as such:
|
||||
#
|
||||
# data_dir/train-00000-of-01024
|
||||
# data_dir/train-00001-of-01024
|
||||
# ...
|
||||
# data_dir/train-00127-of-01024
|
||||
#
|
||||
# and
|
||||
#
|
||||
# data_dir/validation-00000-of-00128
|
||||
# data_dir/validation-00001-of-00128
|
||||
# ...
|
||||
# data_dir/validation-00127-of-00128
|
||||
#
|
||||
# Note that this script may take several hours to run to completion. The
|
||||
# conversion of the ImageNet data to TFRecords alone takes 2-3 hours depending
|
||||
# on the speed of your machine. Please be patient.
|
||||
#
|
||||
# **IMPORTANT**
|
||||
# To download the raw images, the user must create an account with image-net.org
|
||||
# and generate a username and access_key. The latter two are required for
|
||||
# downloading the raw images.
|
||||
#
|
||||
# usage:
|
||||
# cd research/slim
|
||||
# bazel build :download_and_convert_imagenet
|
||||
# ./bazel-bin/download_and_convert_imagenet.sh [data-dir]
|
||||
set -e
|
||||
|
||||
if [ -z "$1" ]; then
|
||||
echo "usage download_and_convert_imagenet.sh [data dir]"
|
||||
exit
|
||||
fi
|
||||
|
||||
# Create the output and temporary directories.
|
||||
DATA_DIR="${1%/}"
|
||||
SCRATCH_DIR="${DATA_DIR}/raw-data/"
|
||||
mkdir -p "${DATA_DIR}"
|
||||
mkdir -p "${SCRATCH_DIR}"
|
||||
WORK_DIR="$0.runfiles/__main__"
|
||||
|
||||
# Download the ImageNet data.
|
||||
LABELS_FILE="${WORK_DIR}/datasets/imagenet_lsvrc_2015_synsets.txt"
|
||||
DOWNLOAD_SCRIPT="${WORK_DIR}/datasets/download_imagenet.sh"
|
||||
"${DOWNLOAD_SCRIPT}" "${SCRATCH_DIR}" "${LABELS_FILE}"
|
||||
|
||||
# Note the locations of the train and validation data.
|
||||
TRAIN_DIRECTORY="${SCRATCH_DIR}train/"
|
||||
VALIDATION_DIRECTORY="${SCRATCH_DIR}validation/"
|
||||
|
||||
# Preprocess the validation data by moving the images into the appropriate
|
||||
# sub-directory based on the label (synset) of the image.
|
||||
echo "Organizing the validation data into sub-directories."
|
||||
PREPROCESS_VAL_SCRIPT="${WORK_DIR}/datasets/preprocess_imagenet_validation_data.py"
|
||||
VAL_LABELS_FILE="${WORK_DIR}/datasets/imagenet_2012_validation_synset_labels.txt"
|
||||
|
||||
"${PREPROCESS_VAL_SCRIPT}" "${VALIDATION_DIRECTORY}" "${VAL_LABELS_FILE}"
|
||||
|
||||
# Convert the XML files for bounding box annotations into a single CSV.
|
||||
echo "Extracting bounding box information from XML."
|
||||
BOUNDING_BOX_SCRIPT="${WORK_DIR}/datasets/process_bounding_boxes.py"
|
||||
BOUNDING_BOX_FILE="${SCRATCH_DIR}/imagenet_2012_bounding_boxes.csv"
|
||||
BOUNDING_BOX_DIR="${SCRATCH_DIR}bounding_boxes/"
|
||||
|
||||
"${BOUNDING_BOX_SCRIPT}" "${BOUNDING_BOX_DIR}" "${LABELS_FILE}" \
|
||||
| sort >"${BOUNDING_BOX_FILE}"
|
||||
echo "Finished downloading and preprocessing the ImageNet data."
|
||||
|
||||
# Build the TFRecords version of the ImageNet data.
|
||||
BUILD_SCRIPT="${WORK_DIR}/build_imagenet_data"
|
||||
OUTPUT_DIRECTORY="${DATA_DIR}"
|
||||
IMAGENET_METADATA_FILE="${WORK_DIR}/datasets/imagenet_metadata.txt"
|
||||
|
||||
"${BUILD_SCRIPT}" \
|
||||
--train_directory="${TRAIN_DIRECTORY}" \
|
||||
--validation_directory="${VALIDATION_DIRECTORY}" \
|
||||
--output_directory="${OUTPUT_DIRECTORY}" \
|
||||
--imagenet_metadata_file="${IMAGENET_METADATA_FILE}" \
|
||||
--labels_file="${LABELS_FILE}" \
|
||||
--bounding_box_file="${BOUNDING_BOX_FILE}"
|
||||
+221
@@ -0,0 +1,221 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
r"""Downloads and converts MNIST data to TFRecords of TF-Example protos.
|
||||
|
||||
This module downloads the MNIST data, uncompresses it, reads the files
|
||||
that make up the MNIST data and creates two TFRecord datasets: one for train
|
||||
and one for test. Each TFRecord dataset is comprised of a set of TF-Example
|
||||
protocol buffers, each of which contain a single image and label.
|
||||
|
||||
The script should take about a minute to run.
|
||||
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import gzip
|
||||
import os
|
||||
import sys
|
||||
|
||||
import numpy as np
|
||||
from six.moves import urllib
|
||||
import tensorflow as tf
|
||||
|
||||
from datasets import dataset_utils
|
||||
|
||||
# The URLs where the MNIST data can be downloaded.
|
||||
_DATA_URL = 'http://yann.lecun.com/exdb/mnist/'
|
||||
_TRAIN_DATA_FILENAME = 'train-images-idx3-ubyte.gz'
|
||||
_TRAIN_LABELS_FILENAME = 'train-labels-idx1-ubyte.gz'
|
||||
_TEST_DATA_FILENAME = 't10k-images-idx3-ubyte.gz'
|
||||
_TEST_LABELS_FILENAME = 't10k-labels-idx1-ubyte.gz'
|
||||
|
||||
_IMAGE_SIZE = 28
|
||||
_NUM_CHANNELS = 1
|
||||
|
||||
# The names of the classes.
|
||||
_CLASS_NAMES = [
|
||||
'zero',
|
||||
'one',
|
||||
'two',
|
||||
'three',
|
||||
'four',
|
||||
'five',
|
||||
'size',
|
||||
'seven',
|
||||
'eight',
|
||||
'nine',
|
||||
]
|
||||
|
||||
|
||||
def _extract_images(filename, num_images):
|
||||
"""Extract the images into a numpy array.
|
||||
|
||||
Args:
|
||||
filename: The path to an MNIST images file.
|
||||
num_images: The number of images in the file.
|
||||
|
||||
Returns:
|
||||
A numpy array of shape [number_of_images, height, width, channels].
|
||||
"""
|
||||
print('Extracting images from: ', filename)
|
||||
with gzip.open(filename) as bytestream:
|
||||
bytestream.read(16)
|
||||
buf = bytestream.read(
|
||||
_IMAGE_SIZE * _IMAGE_SIZE * num_images * _NUM_CHANNELS)
|
||||
data = np.frombuffer(buf, dtype=np.uint8)
|
||||
data = data.reshape(num_images, _IMAGE_SIZE, _IMAGE_SIZE, _NUM_CHANNELS)
|
||||
return data
|
||||
|
||||
|
||||
def _extract_labels(filename, num_labels):
|
||||
"""Extract the labels into a vector of int64 label IDs.
|
||||
|
||||
Args:
|
||||
filename: The path to an MNIST labels file.
|
||||
num_labels: The number of labels in the file.
|
||||
|
||||
Returns:
|
||||
A numpy array of shape [number_of_labels]
|
||||
"""
|
||||
print('Extracting labels from: ', filename)
|
||||
with gzip.open(filename) as bytestream:
|
||||
bytestream.read(8)
|
||||
buf = bytestream.read(1 * num_labels)
|
||||
labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)
|
||||
return labels
|
||||
|
||||
|
||||
def _add_to_tfrecord(data_filename, labels_filename, num_images,
|
||||
tfrecord_writer):
|
||||
"""Loads data from the binary MNIST files and writes files to a TFRecord.
|
||||
|
||||
Args:
|
||||
data_filename: The filename of the MNIST images.
|
||||
labels_filename: The filename of the MNIST labels.
|
||||
num_images: The number of images in the dataset.
|
||||
tfrecord_writer: The TFRecord writer to use for writing.
|
||||
"""
|
||||
images = _extract_images(data_filename, num_images)
|
||||
labels = _extract_labels(labels_filename, num_images)
|
||||
|
||||
shape = (_IMAGE_SIZE, _IMAGE_SIZE, _NUM_CHANNELS)
|
||||
with tf.Graph().as_default():
|
||||
image = tf.placeholder(dtype=tf.uint8, shape=shape)
|
||||
encoded_png = tf.image.encode_png(image)
|
||||
|
||||
with tf.Session('') as sess:
|
||||
for j in range(num_images):
|
||||
sys.stdout.write('\r>> Converting image %d/%d' % (j + 1, num_images))
|
||||
sys.stdout.flush()
|
||||
|
||||
png_string = sess.run(encoded_png, feed_dict={image: images[j]})
|
||||
|
||||
example = dataset_utils.image_to_tfexample(
|
||||
png_string, 'png'.encode(), _IMAGE_SIZE, _IMAGE_SIZE, labels[j])
|
||||
tfrecord_writer.write(example.SerializeToString())
|
||||
|
||||
|
||||
def _get_output_filename(dataset_dir, split_name):
|
||||
"""Creates the output filename.
|
||||
|
||||
Args:
|
||||
dataset_dir: The directory where the temporary files are stored.
|
||||
split_name: The name of the train/test split.
|
||||
|
||||
Returns:
|
||||
An absolute file path.
|
||||
"""
|
||||
return '%s/mnist_%s.tfrecord' % (dataset_dir, split_name)
|
||||
|
||||
|
||||
def _download_dataset(dataset_dir):
|
||||
"""Downloads MNIST locally.
|
||||
|
||||
Args:
|
||||
dataset_dir: The directory where the temporary files are stored.
|
||||
"""
|
||||
for filename in [_TRAIN_DATA_FILENAME,
|
||||
_TRAIN_LABELS_FILENAME,
|
||||
_TEST_DATA_FILENAME,
|
||||
_TEST_LABELS_FILENAME]:
|
||||
filepath = os.path.join(dataset_dir, filename)
|
||||
|
||||
if not os.path.exists(filepath):
|
||||
print('Downloading file %s...' % filename)
|
||||
def _progress(count, block_size, total_size):
|
||||
sys.stdout.write('\r>> Downloading %.1f%%' % (
|
||||
float(count * block_size) / float(total_size) * 100.0))
|
||||
sys.stdout.flush()
|
||||
filepath, _ = urllib.request.urlretrieve(_DATA_URL + filename,
|
||||
filepath,
|
||||
_progress)
|
||||
print()
|
||||
with tf.gfile.GFile(filepath) as f:
|
||||
size = f.size()
|
||||
print('Successfully downloaded', filename, size, 'bytes.')
|
||||
|
||||
|
||||
def _clean_up_temporary_files(dataset_dir):
|
||||
"""Removes temporary files used to create the dataset.
|
||||
|
||||
Args:
|
||||
dataset_dir: The directory where the temporary files are stored.
|
||||
"""
|
||||
for filename in [_TRAIN_DATA_FILENAME,
|
||||
_TRAIN_LABELS_FILENAME,
|
||||
_TEST_DATA_FILENAME,
|
||||
_TEST_LABELS_FILENAME]:
|
||||
filepath = os.path.join(dataset_dir, filename)
|
||||
tf.gfile.Remove(filepath)
|
||||
|
||||
|
||||
def run(dataset_dir):
|
||||
"""Runs the download and conversion operation.
|
||||
|
||||
Args:
|
||||
dataset_dir: The dataset directory where the dataset is stored.
|
||||
"""
|
||||
if not tf.gfile.Exists(dataset_dir):
|
||||
tf.gfile.MakeDirs(dataset_dir)
|
||||
|
||||
training_filename = _get_output_filename(dataset_dir, 'train')
|
||||
testing_filename = _get_output_filename(dataset_dir, 'test')
|
||||
|
||||
if tf.gfile.Exists(training_filename) and tf.gfile.Exists(testing_filename):
|
||||
print('Dataset files already exist. Exiting without re-creating them.')
|
||||
return
|
||||
|
||||
_download_dataset(dataset_dir)
|
||||
|
||||
# First, process the training data:
|
||||
with tf.python_io.TFRecordWriter(training_filename) as tfrecord_writer:
|
||||
data_filename = os.path.join(dataset_dir, _TRAIN_DATA_FILENAME)
|
||||
labels_filename = os.path.join(dataset_dir, _TRAIN_LABELS_FILENAME)
|
||||
_add_to_tfrecord(data_filename, labels_filename, 60000, tfrecord_writer)
|
||||
|
||||
# Next, process the testing data:
|
||||
with tf.python_io.TFRecordWriter(testing_filename) as tfrecord_writer:
|
||||
data_filename = os.path.join(dataset_dir, _TEST_DATA_FILENAME)
|
||||
labels_filename = os.path.join(dataset_dir, _TEST_LABELS_FILENAME)
|
||||
_add_to_tfrecord(data_filename, labels_filename, 10000, tfrecord_writer)
|
||||
|
||||
# Finally, write the labels file:
|
||||
labels_to_class_names = dict(zip(range(len(_CLASS_NAMES)), _CLASS_NAMES))
|
||||
dataset_utils.write_label_file(labels_to_class_names, dataset_dir)
|
||||
|
||||
_clean_up_temporary_files(dataset_dir)
|
||||
print('\nFinished converting the MNIST dataset!')
|
||||
+158
@@ -0,0 +1,158 @@
|
||||
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
r"""Downloads and converts VisualWakewords data to TFRecords of TF-Example protos.
|
||||
|
||||
This module downloads the COCO dataset, uncompresses it, derives the
|
||||
VisualWakeWords dataset to create two TFRecord datasets: one for
|
||||
train and one for test. Each TFRecord dataset is comprised of a set of
|
||||
TF-Example protocol buffers, each of which contain a single image and label.
|
||||
|
||||
The script should take several minutes to run.
|
||||
Please note that this tool creates sharded output files.
|
||||
|
||||
VisualWakeWords dataset is used to design tiny models classifying two classes,
|
||||
such as person/not-person. The two steps to generate the VisualWakeWords
|
||||
dataset from the COCO dataset are given below:
|
||||
|
||||
1. Use COCO annotations to create VisualWakeWords annotations:
|
||||
|
||||
Note: A bounding box is 'valid' if it has the foreground_class_of_interest
|
||||
(e.g. person) and it's area is greater than 0.5% of the image area.
|
||||
|
||||
The resulting annotations file has the following fields, where 'images' are
|
||||
the same as COCO dataset. 'categories' only contains information about the
|
||||
foreground_class_of_interest (e.g. person) and 'annotations' maps an image to
|
||||
objects (a list of valid bounding boxes) and label (value is 1 if it has
|
||||
atleast one valid bounding box, otherwise 0)
|
||||
|
||||
images[{
|
||||
"id", "width", "height", "file_name", "flickr_url", "coco_url",
|
||||
"license", "date_captured",
|
||||
}]
|
||||
|
||||
categories{
|
||||
"id": {"id", "name", "supercategory"}
|
||||
}
|
||||
|
||||
annotations{
|
||||
"image_id": {"objects":[{"area", "bbox" : [x,y,width,height]}], "label"}
|
||||
}
|
||||
|
||||
2. Use VisualWakeWords annotations to create TFRecords:
|
||||
|
||||
The resulting TFRecord file contains the following features:
|
||||
{ image/height, image/width, image/source_id, image/encoded,
|
||||
image/class/label_text, image/class/label,
|
||||
image/object/class/text,
|
||||
image/object/bbox/ymin, image/object/bbox/xmin, image/object/bbox/ymax,
|
||||
image/object/bbox/xmax, image/object/area
|
||||
image/filename, image/format, image/key/sha256}
|
||||
For classification models, you need the image/encoded and image/class/label.
|
||||
|
||||
Example usage:
|
||||
Run download_and_convert_data.py in the parent directory as follows:
|
||||
|
||||
python download_and_convert_visualwakewords.py --logtostderr \
|
||||
--dataset_name=visualwakewords \
|
||||
--dataset_dir="${DATASET_DIR}" \
|
||||
--small_object_area_threshold=0.005 \
|
||||
--foreground_class_of_interest='person'
|
||||
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import os
|
||||
import tensorflow as tf
|
||||
from datasets import download_and_convert_visualwakewords_lib
|
||||
|
||||
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)
|
||||
|
||||
tf.compat.v1.app.flags.DEFINE_string(
|
||||
'coco_dirname', 'coco_dataset',
|
||||
'A subdirectory in visualwakewords dataset directory'
|
||||
'containing the coco dataset')
|
||||
|
||||
FLAGS = tf.compat.v1.app.flags.FLAGS
|
||||
|
||||
|
||||
def run(dataset_dir, small_object_area_threshold, foreground_class_of_interest):
|
||||
"""Runs the download and conversion operation.
|
||||
|
||||
Args:
|
||||
dataset_dir: The dataset directory where the dataset is stored.
|
||||
small_object_area_threshold: Threshold of fraction of image area below which
|
||||
small objects are filtered
|
||||
foreground_class_of_interest: Build a binary classifier based on the
|
||||
presence or absence of this object in the image.
|
||||
"""
|
||||
# 1. Download the coco dataset into a subdirectory under the visualwakewords
|
||||
# dataset directory
|
||||
coco_dir = os.path.join(dataset_dir, FLAGS.coco_dirname)
|
||||
|
||||
if not tf.gfile.IsDirectory(coco_dir):
|
||||
tf.gfile.MakeDirs(coco_dir)
|
||||
|
||||
download_and_convert_visualwakewords_lib.download_coco_dataset(coco_dir)
|
||||
|
||||
# Path to COCO annotations
|
||||
train_annotations_file = os.path.join(coco_dir, 'annotations',
|
||||
'instances_train2014.json')
|
||||
val_annotations_file = os.path.join(coco_dir, 'annotations',
|
||||
'instances_val2014.json')
|
||||
train_image_dir = os.path.join(coco_dir, 'train2014')
|
||||
val_image_dir = os.path.join(coco_dir, 'val2014')
|
||||
|
||||
# Path to VisualWakeWords annotations
|
||||
visualwakewords_annotations_train = os.path.join(
|
||||
dataset_dir, 'instances_visualwakewords_train2014.json')
|
||||
visualwakewords_annotations_val = os.path.join(
|
||||
dataset_dir, 'instances_visualwakewords_val2014.json')
|
||||
visualwakewords_labels_filename = os.path.join(dataset_dir, 'labels.txt')
|
||||
train_output_path = os.path.join(dataset_dir, 'train.record')
|
||||
val_output_path = os.path.join(dataset_dir, 'val.record')
|
||||
|
||||
# 2. Create a labels file
|
||||
tf.logging.info('Creating a labels file...')
|
||||
download_and_convert_visualwakewords_lib.create_labels_file(
|
||||
foreground_class_of_interest, visualwakewords_labels_filename)
|
||||
|
||||
# 3. Use COCO annotations to create VisualWakeWords annotations
|
||||
tf.logging.info('Creating train VisualWakeWords annotations...')
|
||||
download_and_convert_visualwakewords_lib.create_visual_wakeword_annotations(
|
||||
train_annotations_file, visualwakewords_annotations_train,
|
||||
small_object_area_threshold, foreground_class_of_interest)
|
||||
tf.logging.info('Creating validation VisualWakeWords annotations...')
|
||||
download_and_convert_visualwakewords_lib.create_visual_wakeword_annotations(
|
||||
val_annotations_file, visualwakewords_annotations_val,
|
||||
small_object_area_threshold, foreground_class_of_interest)
|
||||
|
||||
# 4. Use VisualWakeWords annotations to create the TFRecords
|
||||
tf.logging.info('Creating train TFRecords for VisualWakeWords dataset...')
|
||||
download_and_convert_visualwakewords_lib.create_tf_record_for_visualwakewords_dataset(
|
||||
visualwakewords_annotations_train,
|
||||
train_image_dir,
|
||||
train_output_path,
|
||||
num_shards=100)
|
||||
|
||||
tf.logging.info(
|
||||
'Creating validation TFRecords for VisualWakeWords dataset...')
|
||||
download_and_convert_visualwakewords_lib.create_tf_record_for_visualwakewords_dataset(
|
||||
visualwakewords_annotations_val,
|
||||
val_image_dir,
|
||||
val_output_path,
|
||||
num_shards=10)
|
||||
+286
@@ -0,0 +1,286 @@
|
||||
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
r"""Helper functions to generate the Visual WakeWords dataset.
|
||||
|
||||
It filters raw COCO annotations file to Visual WakeWords Dataset
|
||||
annotations. The resulting annotations and COCO images are then converted
|
||||
to TF records.
|
||||
See download_and_convert_visualwakewords.py for the sample usage.
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import collections
|
||||
import hashlib
|
||||
import io
|
||||
import json
|
||||
import os
|
||||
import contextlib2
|
||||
|
||||
import PIL.Image
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from datasets import dataset_utils
|
||||
|
||||
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)
|
||||
|
||||
tf.compat.v1.app.flags.DEFINE_string(
|
||||
'coco_train_url',
|
||||
'http://images.cocodataset.org/zips/train2014.zip',
|
||||
'Link to zip file containing coco training data')
|
||||
tf.compat.v1.app.flags.DEFINE_string(
|
||||
'coco_validation_url',
|
||||
'http://images.cocodataset.org/zips/val2014.zip',
|
||||
'Link to zip file containing coco validation data')
|
||||
tf.compat.v1.app.flags.DEFINE_string(
|
||||
'coco_annotations_url',
|
||||
'http://images.cocodataset.org/annotations/annotations_trainval2014.zip',
|
||||
'Link to zip file containing coco annotation data')
|
||||
|
||||
FLAGS = tf.compat.v1.app.flags.FLAGS
|
||||
|
||||
|
||||
def download_coco_dataset(dataset_dir):
|
||||
"""Download the coco dataset.
|
||||
|
||||
Args:
|
||||
dataset_dir: Path where coco dataset should be downloaded.
|
||||
"""
|
||||
dataset_utils.download_and_uncompress_zipfile(FLAGS.coco_train_url,
|
||||
dataset_dir)
|
||||
dataset_utils.download_and_uncompress_zipfile(FLAGS.coco_validation_url,
|
||||
dataset_dir)
|
||||
dataset_utils.download_and_uncompress_zipfile(FLAGS.coco_annotations_url,
|
||||
dataset_dir)
|
||||
|
||||
|
||||
def create_labels_file(foreground_class_of_interest,
|
||||
visualwakewords_labels_file):
|
||||
"""Generate visualwakewords labels file.
|
||||
|
||||
Args:
|
||||
foreground_class_of_interest: category from COCO dataset that is filtered by
|
||||
the visualwakewords dataset
|
||||
visualwakewords_labels_file: output visualwakewords label file
|
||||
"""
|
||||
labels_to_class_names = {0: 'background', 1: foreground_class_of_interest}
|
||||
with open(visualwakewords_labels_file, 'w') as fp:
|
||||
for label in labels_to_class_names:
|
||||
fp.write(str(label) + ':' + str(labels_to_class_names[label]) + '\n')
|
||||
|
||||
|
||||
def create_visual_wakeword_annotations(annotations_file,
|
||||
visualwakewords_annotations_file,
|
||||
small_object_area_threshold,
|
||||
foreground_class_of_interest):
|
||||
"""Generate visual wakewords annotations file.
|
||||
|
||||
Loads COCO annotation json files to generate visualwakewords annotations file.
|
||||
|
||||
Args:
|
||||
annotations_file: JSON file containing COCO bounding box annotations
|
||||
visualwakewords_annotations_file: path to output annotations file
|
||||
small_object_area_threshold: threshold on fraction of image area below which
|
||||
small object bounding boxes are filtered
|
||||
foreground_class_of_interest: category from COCO dataset that is filtered by
|
||||
the visual wakewords dataset
|
||||
"""
|
||||
# default object of interest is person
|
||||
foreground_class_of_interest_id = 1
|
||||
with tf.gfile.GFile(annotations_file, 'r') as fid:
|
||||
groundtruth_data = json.load(fid)
|
||||
images = groundtruth_data['images']
|
||||
# Create category index
|
||||
category_index = {}
|
||||
for category in groundtruth_data['categories']:
|
||||
if category['name'] == foreground_class_of_interest:
|
||||
foreground_class_of_interest_id = category['id']
|
||||
category_index[category['id']] = category
|
||||
# Create annotations index, a map of image_id to it's annotations
|
||||
tf.logging.info('Building annotations index...')
|
||||
annotations_index = collections.defaultdict(
|
||||
lambda: collections.defaultdict(list))
|
||||
# structure is { "image_id": {"objects" : [list of the image annotations]}}
|
||||
for annotation in groundtruth_data['annotations']:
|
||||
annotations_index[annotation['image_id']]['objects'].append(annotation)
|
||||
missing_annotation_count = len(images) - len(annotations_index)
|
||||
tf.logging.info('%d images are missing annotations.',
|
||||
missing_annotation_count)
|
||||
# Create filtered annotations index
|
||||
annotations_index_filtered = {}
|
||||
for idx, image in enumerate(images):
|
||||
if idx % 100 == 0:
|
||||
tf.logging.info('On image %d of %d', idx, len(images))
|
||||
annotations = annotations_index[image['id']]
|
||||
annotations_filtered = _filter_annotations(
|
||||
annotations, image, small_object_area_threshold,
|
||||
foreground_class_of_interest_id)
|
||||
annotations_index_filtered[image['id']] = annotations_filtered
|
||||
|
||||
with open(visualwakewords_annotations_file, 'w') as fp:
|
||||
json.dump(
|
||||
{
|
||||
'images': images,
|
||||
'annotations': annotations_index_filtered,
|
||||
'categories': category_index
|
||||
}, fp)
|
||||
|
||||
|
||||
def _filter_annotations(annotations, image, small_object_area_threshold,
|
||||
foreground_class_of_interest_id):
|
||||
"""Filters COCO annotations to visual wakewords annotations.
|
||||
|
||||
Args:
|
||||
annotations: dicts with keys: {
|
||||
u'objects': [{u'id', u'image_id', u'category_id', u'segmentation',
|
||||
u'area', u'bbox' : [x,y,width,height], u'iscrowd'}] } Notice
|
||||
that bounding box coordinates in the official COCO dataset
|
||||
are given as [x, y, width, height] tuples using absolute
|
||||
coordinates where x, y represent the top-left (0-indexed)
|
||||
corner.
|
||||
image: dict with keys: [u'license', u'file_name', u'coco_url', u'height',
|
||||
u'width', u'date_captured', u'flickr_url', u'id']
|
||||
small_object_area_threshold: threshold on fraction of image area below which
|
||||
small objects are filtered
|
||||
foreground_class_of_interest_id: category of COCO dataset which visual
|
||||
wakewords filters
|
||||
|
||||
Returns:
|
||||
annotations_filtered: dict with keys: {
|
||||
u'objects': [{"area", "bbox" : [x,y,width,height]}],
|
||||
u'label',
|
||||
}
|
||||
"""
|
||||
objects = []
|
||||
image_area = image['height'] * image['width']
|
||||
for annotation in annotations['objects']:
|
||||
normalized_object_area = annotation['area'] / image_area
|
||||
category_id = int(annotation['category_id'])
|
||||
# Filter valid bounding boxes
|
||||
if category_id == foreground_class_of_interest_id and \
|
||||
normalized_object_area > small_object_area_threshold:
|
||||
objects.append({
|
||||
u'area': annotation['area'],
|
||||
u'bbox': annotation['bbox'],
|
||||
})
|
||||
label = 1 if objects else 0
|
||||
return {
|
||||
'objects': objects,
|
||||
'label': label,
|
||||
}
|
||||
|
||||
|
||||
def create_tf_record_for_visualwakewords_dataset(annotations_file, image_dir,
|
||||
output_path, num_shards):
|
||||
"""Loads Visual WakeWords annotations/images and converts to tf.Record format.
|
||||
|
||||
Args:
|
||||
annotations_file: JSON file containing bounding box annotations.
|
||||
image_dir: Directory containing the image files.
|
||||
output_path: Path to output tf.Record file.
|
||||
num_shards: number of output file shards.
|
||||
"""
|
||||
with contextlib2.ExitStack() as tf_record_close_stack, \
|
||||
tf.gfile.GFile(annotations_file, 'r') as fid:
|
||||
output_tfrecords = dataset_utils.open_sharded_output_tfrecords(
|
||||
tf_record_close_stack, output_path, num_shards)
|
||||
groundtruth_data = json.load(fid)
|
||||
images = groundtruth_data['images']
|
||||
annotations_index = groundtruth_data['annotations']
|
||||
annotations_index = {int(k): v for k, v in annotations_index.iteritems()}
|
||||
# convert 'unicode' key to 'int' key after we parse the json file
|
||||
|
||||
for idx, image in enumerate(images):
|
||||
if idx % 100 == 0:
|
||||
tf.logging.info('On image %d of %d', idx, len(images))
|
||||
annotations = annotations_index[image['id']]
|
||||
tf_example = _create_tf_example(image, annotations, image_dir)
|
||||
shard_idx = idx % num_shards
|
||||
output_tfrecords[shard_idx].write(tf_example.SerializeToString())
|
||||
|
||||
|
||||
def _create_tf_example(image, annotations, image_dir):
|
||||
"""Converts image and annotations to a tf.Example proto.
|
||||
|
||||
Args:
|
||||
image: dict with keys: [u'license', u'file_name', u'coco_url', u'height',
|
||||
u'width', u'date_captured', u'flickr_url', u'id']
|
||||
annotations: dict with objects (a list of image annotations) and a label.
|
||||
{u'objects':[{"area", "bbox" : [x,y,width,height}], u'label'}. Notice
|
||||
that bounding box coordinates in the COCO dataset are given as[x, y,
|
||||
width, height] tuples using absolute coordinates where x, y represent
|
||||
the top-left (0-indexed) corner. This function also converts to the format
|
||||
that can be used by the Tensorflow Object Detection API (which is [ymin,
|
||||
xmin, ymax, xmax] with coordinates normalized relative to image size).
|
||||
image_dir: directory containing the image files.
|
||||
Returns:
|
||||
tf_example: The converted tf.Example
|
||||
|
||||
Raises:
|
||||
ValueError: if the image pointed to by data['filename'] is not a valid JPEG
|
||||
"""
|
||||
image_height = image['height']
|
||||
image_width = image['width']
|
||||
filename = image['file_name']
|
||||
image_id = image['id']
|
||||
|
||||
full_path = os.path.join(image_dir, filename)
|
||||
with tf.gfile.GFile(full_path, 'rb') as fid:
|
||||
encoded_jpg = fid.read()
|
||||
encoded_jpg_io = io.BytesIO(encoded_jpg)
|
||||
image = PIL.Image.open(encoded_jpg_io)
|
||||
key = hashlib.sha256(encoded_jpg).hexdigest()
|
||||
|
||||
xmin, xmax, ymin, ymax, area = [], [], [], [], []
|
||||
for obj in annotations['objects']:
|
||||
(x, y, width, height) = tuple(obj['bbox'])
|
||||
xmin.append(float(x) / image_width)
|
||||
xmax.append(float(x + width) / image_width)
|
||||
ymin.append(float(y) / image_height)
|
||||
ymax.append(float(y + height) / image_height)
|
||||
area.append(obj['area'])
|
||||
|
||||
feature_dict = {
|
||||
'image/height':
|
||||
dataset_utils.int64_feature(image_height),
|
||||
'image/width':
|
||||
dataset_utils.int64_feature(image_width),
|
||||
'image/filename':
|
||||
dataset_utils.bytes_feature(filename.encode('utf8')),
|
||||
'image/source_id':
|
||||
dataset_utils.bytes_feature(str(image_id).encode('utf8')),
|
||||
'image/key/sha256':
|
||||
dataset_utils.bytes_feature(key.encode('utf8')),
|
||||
'image/encoded':
|
||||
dataset_utils.bytes_feature(encoded_jpg),
|
||||
'image/format':
|
||||
dataset_utils.bytes_feature('jpeg'.encode('utf8')),
|
||||
'image/class/label':
|
||||
dataset_utils.int64_feature(annotations['label']),
|
||||
'image/object/bbox/xmin':
|
||||
dataset_utils.float_list_feature(xmin),
|
||||
'image/object/bbox/xmax':
|
||||
dataset_utils.float_list_feature(xmax),
|
||||
'image/object/bbox/ymin':
|
||||
dataset_utils.float_list_feature(ymin),
|
||||
'image/object/bbox/ymax':
|
||||
dataset_utils.float_list_feature(ymax),
|
||||
'image/object/area':
|
||||
dataset_utils.float_list_feature(area),
|
||||
}
|
||||
example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
|
||||
return example
|
||||
+99
@@ -0,0 +1,99 @@
|
||||
#!/bin/bash
|
||||
# Copyright 2016 Google Inc. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
|
||||
# Script to download ImageNet Challenge 2012 training and validation data set.
|
||||
#
|
||||
# Downloads and decompresses raw images and bounding boxes.
|
||||
#
|
||||
# **IMPORTANT**
|
||||
# To download the raw images, the user must create an account with image-net.org
|
||||
# and generate a username and access_key. The latter two are required for
|
||||
# downloading the raw images.
|
||||
#
|
||||
# usage:
|
||||
# ./download_imagenet.sh [dirname]
|
||||
set -e
|
||||
|
||||
if [ "x$IMAGENET_ACCESS_KEY" == x -o "x$IMAGENET_USERNAME" == x ]; then
|
||||
cat <<END
|
||||
In order to download the imagenet data, you have to create an account with
|
||||
image-net.org. This will get you a username and an access key. You can set the
|
||||
IMAGENET_USERNAME and IMAGENET_ACCESS_KEY environment variables, or you can
|
||||
enter the credentials here.
|
||||
END
|
||||
read -p "Username: " IMAGENET_USERNAME
|
||||
read -p "Access key: " IMAGENET_ACCESS_KEY
|
||||
fi
|
||||
|
||||
OUTDIR="${1:-./imagenet-data}"
|
||||
SYNSETS_FILE="${2:-./synsets.txt}"
|
||||
|
||||
echo "Saving downloaded files to $OUTDIR"
|
||||
mkdir -p "${OUTDIR}"
|
||||
CURRENT_DIR=$(pwd)
|
||||
BBOX_DIR="${OUTDIR}bounding_boxes"
|
||||
mkdir -p "${BBOX_DIR}"
|
||||
cd "${OUTDIR}"
|
||||
|
||||
# Download and process all of the ImageNet bounding boxes.
|
||||
BASE_URL="http://www.image-net.org/challenges/LSVRC/2012/nnoupb"
|
||||
|
||||
# See here for details: http://www.image-net.org/download-bboxes
|
||||
BOUNDING_BOX_ANNOTATIONS="${BASE_URL}/ILSVRC2012_bbox_train_v2.tar.gz"
|
||||
BBOX_TAR_BALL="${BBOX_DIR}/annotations.tar.gz"
|
||||
echo "Downloading bounding box annotations."
|
||||
wget "${BOUNDING_BOX_ANNOTATIONS}" -O "${BBOX_TAR_BALL}"
|
||||
echo "Uncompressing bounding box annotations ..."
|
||||
tar xzf "${BBOX_TAR_BALL}" -C "${BBOX_DIR}"
|
||||
|
||||
LABELS_ANNOTATED="${BBOX_DIR}/*"
|
||||
NUM_XML=$(ls -1 ${LABELS_ANNOTATED} | wc -l)
|
||||
echo "Identified ${NUM_XML} bounding box annotations."
|
||||
|
||||
# Download and uncompress all images from the ImageNet 2012 validation dataset.
|
||||
VALIDATION_TARBALL="ILSVRC2012_img_val.tar"
|
||||
OUTPUT_PATH="${OUTDIR}validation/"
|
||||
mkdir -p "${OUTPUT_PATH}"
|
||||
cd "${OUTDIR}/.."
|
||||
echo "Downloading ${VALIDATION_TARBALL} to ${OUTPUT_PATH}."
|
||||
wget -nd -c "${BASE_URL}/${VALIDATION_TARBALL}"
|
||||
tar xf "${VALIDATION_TARBALL}" -C "${OUTPUT_PATH}"
|
||||
|
||||
# Download all images from the ImageNet 2012 train dataset.
|
||||
TRAIN_TARBALL="ILSVRC2012_img_train.tar"
|
||||
OUTPUT_PATH="${OUTDIR}train/"
|
||||
mkdir -p "${OUTPUT_PATH}"
|
||||
cd "${OUTDIR}/.."
|
||||
echo "Downloading ${TRAIN_TARBALL} to ${OUTPUT_PATH}."
|
||||
wget -nd -c "${BASE_URL}/${TRAIN_TARBALL}"
|
||||
|
||||
# Un-compress the individual tar-files within the train tar-file.
|
||||
echo "Uncompressing individual train tar-balls in the training data."
|
||||
|
||||
while read SYNSET; do
|
||||
echo "Processing: ${SYNSET}"
|
||||
|
||||
# Create a directory and delete anything there.
|
||||
mkdir -p "${OUTPUT_PATH}/${SYNSET}"
|
||||
rm -rf "${OUTPUT_PATH}/${SYNSET}/*"
|
||||
|
||||
# Uncompress into the directory.
|
||||
tar xf "${TRAIN_TARBALL}" "${SYNSET}.tar"
|
||||
tar xf "${SYNSET}.tar" -C "${OUTPUT_PATH}/${SYNSET}/"
|
||||
rm -f "${SYNSET}.tar"
|
||||
|
||||
echo "Finished processing: ${SYNSET}"
|
||||
done < "${SYNSETS_FILE}"
|
||||
+99
@@ -0,0 +1,99 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Provides data for the flowers dataset.
|
||||
|
||||
The dataset scripts used to create the dataset can be found at:
|
||||
tensorflow/models/research/slim/datasets/download_and_convert_flowers.py
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import os
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from datasets import dataset_utils
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
_FILE_PATTERN = 'flowers_%s_*.tfrecord'
|
||||
|
||||
SPLITS_TO_SIZES = {'train': 3320, 'validation': 350}
|
||||
|
||||
_NUM_CLASSES = 5
|
||||
|
||||
_ITEMS_TO_DESCRIPTIONS = {
|
||||
'image': 'A color image of varying size.',
|
||||
'label': 'A single integer between 0 and 4',
|
||||
}
|
||||
|
||||
|
||||
def get_split(split_name, dataset_dir, file_pattern=None, reader=None):
|
||||
"""Gets a dataset tuple with instructions for reading flowers.
|
||||
|
||||
Args:
|
||||
split_name: A train/validation split name.
|
||||
dataset_dir: The base directory of the dataset sources.
|
||||
file_pattern: The file pattern to use when matching the dataset sources.
|
||||
It is assumed that the pattern contains a '%s' string so that the split
|
||||
name can be inserted.
|
||||
reader: The TensorFlow reader type.
|
||||
|
||||
Returns:
|
||||
A `Dataset` namedtuple.
|
||||
|
||||
Raises:
|
||||
ValueError: if `split_name` is not a valid train/validation split.
|
||||
"""
|
||||
if split_name not in SPLITS_TO_SIZES:
|
||||
raise ValueError('split name %s was not recognized.' % split_name)
|
||||
|
||||
if not file_pattern:
|
||||
file_pattern = _FILE_PATTERN
|
||||
file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
|
||||
|
||||
# Allowing None in the signature so that dataset_factory can use the default.
|
||||
if reader is None:
|
||||
reader = tf.TFRecordReader
|
||||
|
||||
keys_to_features = {
|
||||
'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
|
||||
'image/format': tf.FixedLenFeature((), tf.string, default_value='png'),
|
||||
'image/class/label': tf.FixedLenFeature(
|
||||
[], tf.int64, default_value=tf.zeros([], dtype=tf.int64)),
|
||||
}
|
||||
|
||||
items_to_handlers = {
|
||||
'image': slim.tfexample_decoder.Image(),
|
||||
'label': slim.tfexample_decoder.Tensor('image/class/label'),
|
||||
}
|
||||
|
||||
decoder = slim.tfexample_decoder.TFExampleDecoder(
|
||||
keys_to_features, items_to_handlers)
|
||||
|
||||
labels_to_names = None
|
||||
if dataset_utils.has_labels(dataset_dir):
|
||||
labels_to_names = dataset_utils.read_label_file(dataset_dir)
|
||||
|
||||
return slim.dataset.Dataset(
|
||||
data_sources=file_pattern,
|
||||
reader=reader,
|
||||
decoder=decoder,
|
||||
num_samples=SPLITS_TO_SIZES[split_name],
|
||||
items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
|
||||
num_classes=_NUM_CLASSES,
|
||||
labels_to_names=labels_to_names)
|
||||
+199
@@ -0,0 +1,199 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Provides data for the ImageNet ILSVRC 2012 Dataset plus some bounding boxes.
|
||||
|
||||
Some images have one or more bounding boxes associated with the label of the
|
||||
image. See details here: http://image-net.org/download-bboxes
|
||||
|
||||
ImageNet is based upon WordNet 3.0. To uniquely identify a synset, we use
|
||||
"WordNet ID" (wnid), which is a concatenation of POS ( i.e. part of speech )
|
||||
and SYNSET OFFSET of WordNet. For more information, please refer to the
|
||||
WordNet documentation[http://wordnet.princeton.edu/wordnet/documentation/].
|
||||
|
||||
"There are bounding boxes for over 3000 popular synsets available.
|
||||
For each synset, there are on average 150 images with bounding boxes."
|
||||
|
||||
WARNING: Don't use for object detection, in this case all the bounding boxes
|
||||
of the image belong to just one class.
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import os
|
||||
from six.moves import urllib
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from datasets import dataset_utils
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
# TODO(nsilberman): Add tfrecord file type once the script is updated.
|
||||
_FILE_PATTERN = '%s-*'
|
||||
|
||||
_SPLITS_TO_SIZES = {
|
||||
'train': 1281167,
|
||||
'validation': 50000,
|
||||
}
|
||||
|
||||
_ITEMS_TO_DESCRIPTIONS = {
|
||||
'image': 'A color image of varying height and width.',
|
||||
'label': 'The label id of the image, integer between 0 and 999',
|
||||
'label_text': 'The text of the label.',
|
||||
'object/bbox': 'A list of bounding boxes.',
|
||||
'object/label': 'A list of labels, one per each object.',
|
||||
}
|
||||
|
||||
_NUM_CLASSES = 1001
|
||||
|
||||
# If set to false, will not try to set label_to_names in dataset
|
||||
# by reading them from labels.txt or github.
|
||||
LOAD_READABLE_NAMES = True
|
||||
|
||||
|
||||
def create_readable_names_for_imagenet_labels():
|
||||
"""Create a dict mapping label id to human readable string.
|
||||
|
||||
Returns:
|
||||
labels_to_names: dictionary where keys are integers from to 1000
|
||||
and values are human-readable names.
|
||||
|
||||
We retrieve a synset file, which contains a list of valid synset labels used
|
||||
by ILSVRC competition. There is one synset one per line, eg.
|
||||
# n01440764
|
||||
# n01443537
|
||||
We also retrieve a synset_to_human_file, which contains a mapping from synsets
|
||||
to human-readable names for every synset in Imagenet. These are stored in a
|
||||
tsv format, as follows:
|
||||
# n02119247 black fox
|
||||
# n02119359 silver fox
|
||||
We assign each synset (in alphabetical order) an integer, starting from 1
|
||||
(since 0 is reserved for the background class).
|
||||
|
||||
Code is based on
|
||||
https://github.com/tensorflow/models/blob/master/research/inception/inception/data/build_imagenet_data.py#L463
|
||||
"""
|
||||
|
||||
# pylint: disable=g-line-too-long
|
||||
base_url = 'https://raw.githubusercontent.com/tensorflow/models/master/research/inception/inception/data/'
|
||||
synset_url = '{}/imagenet_lsvrc_2015_synsets.txt'.format(base_url)
|
||||
synset_to_human_url = '{}/imagenet_metadata.txt'.format(base_url)
|
||||
|
||||
filename, _ = urllib.request.urlretrieve(synset_url)
|
||||
synset_list = [s.strip() for s in open(filename).readlines()]
|
||||
num_synsets_in_ilsvrc = len(synset_list)
|
||||
assert num_synsets_in_ilsvrc == 1000
|
||||
|
||||
filename, _ = urllib.request.urlretrieve(synset_to_human_url)
|
||||
synset_to_human_list = open(filename).readlines()
|
||||
num_synsets_in_all_imagenet = len(synset_to_human_list)
|
||||
assert num_synsets_in_all_imagenet == 21842
|
||||
|
||||
synset_to_human = {}
|
||||
for s in synset_to_human_list:
|
||||
parts = s.strip().split('\t')
|
||||
assert len(parts) == 2
|
||||
synset = parts[0]
|
||||
human = parts[1]
|
||||
synset_to_human[synset] = human
|
||||
|
||||
label_index = 1
|
||||
labels_to_names = {0: 'background'}
|
||||
for synset in synset_list:
|
||||
name = synset_to_human[synset]
|
||||
labels_to_names[label_index] = name
|
||||
label_index += 1
|
||||
|
||||
return labels_to_names
|
||||
|
||||
|
||||
def get_split(split_name, dataset_dir, file_pattern=None, reader=None):
|
||||
"""Gets a dataset tuple with instructions for reading ImageNet.
|
||||
|
||||
Args:
|
||||
split_name: A train/test split name.
|
||||
dataset_dir: The base directory of the dataset sources.
|
||||
file_pattern: The file pattern to use when matching the dataset sources.
|
||||
It is assumed that the pattern contains a '%s' string so that the split
|
||||
name can be inserted.
|
||||
reader: The TensorFlow reader type.
|
||||
|
||||
Returns:
|
||||
A `Dataset` namedtuple.
|
||||
|
||||
Raises:
|
||||
ValueError: if `split_name` is not a valid train/test split.
|
||||
"""
|
||||
if split_name not in _SPLITS_TO_SIZES:
|
||||
raise ValueError('split name %s was not recognized.' % split_name)
|
||||
|
||||
if not file_pattern:
|
||||
file_pattern = _FILE_PATTERN
|
||||
file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
|
||||
|
||||
# Allowing None in the signature so that dataset_factory can use the default.
|
||||
if reader is None:
|
||||
reader = tf.TFRecordReader
|
||||
|
||||
keys_to_features = {
|
||||
'image/encoded': tf.FixedLenFeature(
|
||||
(), tf.string, default_value=''),
|
||||
'image/format': tf.FixedLenFeature(
|
||||
(), tf.string, default_value='jpeg'),
|
||||
'image/class/label': tf.FixedLenFeature(
|
||||
[], dtype=tf.int64, default_value=-1),
|
||||
'image/class/text': tf.FixedLenFeature(
|
||||
[], dtype=tf.string, default_value=''),
|
||||
'image/object/bbox/xmin': tf.VarLenFeature(
|
||||
dtype=tf.float32),
|
||||
'image/object/bbox/ymin': tf.VarLenFeature(
|
||||
dtype=tf.float32),
|
||||
'image/object/bbox/xmax': tf.VarLenFeature(
|
||||
dtype=tf.float32),
|
||||
'image/object/bbox/ymax': tf.VarLenFeature(
|
||||
dtype=tf.float32),
|
||||
'image/object/class/label': tf.VarLenFeature(
|
||||
dtype=tf.int64),
|
||||
}
|
||||
|
||||
items_to_handlers = {
|
||||
'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'),
|
||||
'label': slim.tfexample_decoder.Tensor('image/class/label'),
|
||||
'label_text': slim.tfexample_decoder.Tensor('image/class/text'),
|
||||
'object/bbox': slim.tfexample_decoder.BoundingBox(
|
||||
['ymin', 'xmin', 'ymax', 'xmax'], 'image/object/bbox/'),
|
||||
'object/label': slim.tfexample_decoder.Tensor('image/object/class/label'),
|
||||
}
|
||||
|
||||
decoder = slim.tfexample_decoder.TFExampleDecoder(
|
||||
keys_to_features, items_to_handlers)
|
||||
|
||||
labels_to_names = None
|
||||
if LOAD_READABLE_NAMES:
|
||||
if dataset_utils.has_labels(dataset_dir):
|
||||
labels_to_names = dataset_utils.read_label_file(dataset_dir)
|
||||
else:
|
||||
labels_to_names = create_readable_names_for_imagenet_labels()
|
||||
dataset_utils.write_label_file(labels_to_names, dataset_dir)
|
||||
|
||||
return slim.dataset.Dataset(
|
||||
data_sources=file_pattern,
|
||||
reader=reader,
|
||||
decoder=decoder,
|
||||
num_samples=_SPLITS_TO_SIZES[split_name],
|
||||
items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
|
||||
num_classes=_NUM_CLASSES,
|
||||
labels_to_names=labels_to_names)
|
||||
+50000
File diff suppressed because it is too large
Load Diff
+1000
File diff suppressed because it is too large
Load Diff
+21842
File diff suppressed because it is too large
Load Diff
+99
@@ -0,0 +1,99 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Provides data for the MNIST dataset.
|
||||
|
||||
The dataset scripts used to create the dataset can be found at:
|
||||
tensorflow/models/research/slim/datasets/download_and_convert_mnist.py
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import os
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from datasets import dataset_utils
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
_FILE_PATTERN = 'mnist_%s.tfrecord'
|
||||
|
||||
_SPLITS_TO_SIZES = {'train': 60000, 'test': 10000}
|
||||
|
||||
_NUM_CLASSES = 10
|
||||
|
||||
_ITEMS_TO_DESCRIPTIONS = {
|
||||
'image': 'A [28 x 28 x 1] grayscale image.',
|
||||
'label': 'A single integer between 0 and 9',
|
||||
}
|
||||
|
||||
|
||||
def get_split(split_name, dataset_dir, file_pattern=None, reader=None):
|
||||
"""Gets a dataset tuple with instructions for reading MNIST.
|
||||
|
||||
Args:
|
||||
split_name: A train/test split name.
|
||||
dataset_dir: The base directory of the dataset sources.
|
||||
file_pattern: The file pattern to use when matching the dataset sources.
|
||||
It is assumed that the pattern contains a '%s' string so that the split
|
||||
name can be inserted.
|
||||
reader: The TensorFlow reader type.
|
||||
|
||||
Returns:
|
||||
A `Dataset` namedtuple.
|
||||
|
||||
Raises:
|
||||
ValueError: if `split_name` is not a valid train/test split.
|
||||
"""
|
||||
if split_name not in _SPLITS_TO_SIZES:
|
||||
raise ValueError('split name %s was not recognized.' % split_name)
|
||||
|
||||
if not file_pattern:
|
||||
file_pattern = _FILE_PATTERN
|
||||
file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
|
||||
|
||||
# Allowing None in the signature so that dataset_factory can use the default.
|
||||
if reader is None:
|
||||
reader = tf.TFRecordReader
|
||||
|
||||
keys_to_features = {
|
||||
'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
|
||||
'image/format': tf.FixedLenFeature((), tf.string, default_value='raw'),
|
||||
'image/class/label': tf.FixedLenFeature(
|
||||
[1], tf.int64, default_value=tf.zeros([1], dtype=tf.int64)),
|
||||
}
|
||||
|
||||
items_to_handlers = {
|
||||
'image': slim.tfexample_decoder.Image(shape=[28, 28, 1], channels=1),
|
||||
'label': slim.tfexample_decoder.Tensor('image/class/label', shape=[]),
|
||||
}
|
||||
|
||||
decoder = slim.tfexample_decoder.TFExampleDecoder(
|
||||
keys_to_features, items_to_handlers)
|
||||
|
||||
labels_to_names = None
|
||||
if dataset_utils.has_labels(dataset_dir):
|
||||
labels_to_names = dataset_utils.read_label_file(dataset_dir)
|
||||
|
||||
return slim.dataset.Dataset(
|
||||
data_sources=file_pattern,
|
||||
reader=reader,
|
||||
decoder=decoder,
|
||||
num_samples=_SPLITS_TO_SIZES[split_name],
|
||||
num_classes=_NUM_CLASSES,
|
||||
items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
|
||||
labels_to_names=labels_to_names)
|
||||
+83
@@ -0,0 +1,83 @@
|
||||
#!/usr/bin/python
|
||||
# Copyright 2016 Google Inc. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
r"""Process the ImageNet Challenge bounding boxes for TensorFlow model training.
|
||||
|
||||
Associate the ImageNet 2012 Challenge validation data set with labels.
|
||||
|
||||
The raw ImageNet validation data set is expected to reside in JPEG files
|
||||
located in the following directory structure.
|
||||
|
||||
data_dir/ILSVRC2012_val_00000001.JPEG
|
||||
data_dir/ILSVRC2012_val_00000002.JPEG
|
||||
...
|
||||
data_dir/ILSVRC2012_val_00050000.JPEG
|
||||
|
||||
This script moves the files into a directory structure like such:
|
||||
data_dir/n01440764/ILSVRC2012_val_00000293.JPEG
|
||||
data_dir/n01440764/ILSVRC2012_val_00000543.JPEG
|
||||
...
|
||||
where 'n01440764' is the unique synset label associated with
|
||||
these images.
|
||||
|
||||
This directory reorganization requires a mapping from validation image
|
||||
number (i.e. suffix of the original file) to the associated label. This
|
||||
is provided in the ImageNet development kit via a Matlab file.
|
||||
|
||||
In order to make life easier and divorce ourselves from Matlab, we instead
|
||||
supply a custom text file that provides this mapping for us.
|
||||
|
||||
Sample usage:
|
||||
./preprocess_imagenet_validation_data.py ILSVRC2012_img_val \
|
||||
imagenet_2012_validation_synset_labels.txt
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
from six.moves import xrange # pylint: disable=redefined-builtin
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
if len(sys.argv) < 3:
|
||||
print('Invalid usage\n'
|
||||
'usage: preprocess_imagenet_validation_data.py '
|
||||
'<validation data dir> <validation labels file>')
|
||||
sys.exit(-1)
|
||||
data_dir = sys.argv[1]
|
||||
validation_labels_file = sys.argv[2]
|
||||
|
||||
# Read in the 50000 synsets associated with the validation data set.
|
||||
labels = [l.strip() for l in open(validation_labels_file).readlines()]
|
||||
unique_labels = set(labels)
|
||||
|
||||
# Make all sub-directories in the validation data dir.
|
||||
for label in unique_labels:
|
||||
labeled_data_dir = os.path.join(data_dir, label)
|
||||
os.makedirs(labeled_data_dir)
|
||||
|
||||
# Move all of the image to the appropriate sub-directory.
|
||||
for i in xrange(len(labels)):
|
||||
basename = 'ILSVRC2012_val_000%.5d.JPEG' % (i + 1)
|
||||
original_filename = os.path.join(data_dir, basename)
|
||||
if not os.path.exists(original_filename):
|
||||
print('Failed to find: ', original_filename)
|
||||
sys.exit(-1)
|
||||
new_filename = os.path.join(data_dir, labels[i], basename)
|
||||
os.rename(original_filename, new_filename)
|
||||
+253
@@ -0,0 +1,253 @@
|
||||
#!/usr/bin/python
|
||||
# Copyright 2016 Google Inc. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Process the ImageNet Challenge bounding boxes for TensorFlow model training.
|
||||
|
||||
This script is called as
|
||||
|
||||
process_bounding_boxes.py <dir> [synsets-file]
|
||||
|
||||
Where <dir> is a directory containing the downloaded and unpacked bounding box
|
||||
data. If [synsets-file] is supplied, then only the bounding boxes whose
|
||||
synstes are contained within this file are returned. Note that the
|
||||
[synsets-file] file contains synset ids, one per line.
|
||||
|
||||
The script dumps out a CSV text file in which each line contains an entry.
|
||||
n00007846_64193.JPEG,0.0060,0.2620,0.7545,0.9940
|
||||
|
||||
The entry can be read as:
|
||||
<JPEG file name>, <xmin>, <ymin>, <xmax>, <ymax>
|
||||
|
||||
The bounding box for <JPEG file name> contains two points (xmin, ymin) and
|
||||
(xmax, ymax) specifying the lower-left corner and upper-right corner of a
|
||||
bounding box in *relative* coordinates.
|
||||
|
||||
The user supplies a directory where the XML files reside. The directory
|
||||
structure in the directory <dir> is assumed to look like this:
|
||||
|
||||
<dir>/nXXXXXXXX/nXXXXXXXX_YYYY.xml
|
||||
|
||||
Each XML file contains a bounding box annotation. The script:
|
||||
|
||||
(1) Parses the XML file and extracts the filename, label and bounding box info.
|
||||
|
||||
(2) The bounding box is specified in the XML files as integer (xmin, ymin) and
|
||||
(xmax, ymax) *relative* to image size displayed to the human annotator. The
|
||||
size of the image displayed to the human annotator is stored in the XML file
|
||||
as integer (height, width).
|
||||
|
||||
Note that the displayed size will differ from the actual size of the image
|
||||
downloaded from image-net.org. To make the bounding box annotation useable,
|
||||
we convert bounding box to floating point numbers relative to displayed
|
||||
height and width of the image.
|
||||
|
||||
Note that each XML file might contain N bounding box annotations.
|
||||
|
||||
Note that the points are all clamped at a range of [0.0, 1.0] because some
|
||||
human annotations extend outside the range of the supplied image.
|
||||
|
||||
See details here: http://image-net.org/download-bboxes
|
||||
|
||||
(3) By default, the script outputs all valid bounding boxes. If a
|
||||
[synsets-file] is supplied, only the subset of bounding boxes associated
|
||||
with those synsets are outputted. Importantly, one can supply a list of
|
||||
synsets in the ImageNet Challenge and output the list of bounding boxes
|
||||
associated with the training images of the ILSVRC.
|
||||
|
||||
We use these bounding boxes to inform the random distortion of images
|
||||
supplied to the network.
|
||||
|
||||
If you run this script successfully, you will see the following output
|
||||
to stderr:
|
||||
> Finished processing 544546 XML files.
|
||||
> Skipped 0 XML files not in ImageNet Challenge.
|
||||
> Skipped 0 bounding boxes not in ImageNet Challenge.
|
||||
> Wrote 615299 bounding boxes from 544546 annotated images.
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import glob
|
||||
import os.path
|
||||
import sys
|
||||
import xml.etree.ElementTree as ET
|
||||
from six.moves import xrange # pylint: disable=redefined-builtin
|
||||
|
||||
|
||||
class BoundingBox(object):
|
||||
pass
|
||||
|
||||
|
||||
def GetItem(name, root, index=0):
|
||||
count = 0
|
||||
for item in root.iter(name):
|
||||
if count == index:
|
||||
return item.text
|
||||
count += 1
|
||||
# Failed to find "index" occurrence of item.
|
||||
return -1
|
||||
|
||||
|
||||
def GetInt(name, root, index=0):
|
||||
return int(GetItem(name, root, index))
|
||||
|
||||
|
||||
def FindNumberBoundingBoxes(root):
|
||||
index = 0
|
||||
while True:
|
||||
if GetInt('xmin', root, index) == -1:
|
||||
break
|
||||
index += 1
|
||||
return index
|
||||
|
||||
|
||||
def ProcessXMLAnnotation(xml_file):
|
||||
"""Process a single XML file containing a bounding box."""
|
||||
# pylint: disable=broad-except
|
||||
try:
|
||||
tree = ET.parse(xml_file)
|
||||
except Exception:
|
||||
print('Failed to parse: ' + xml_file, file=sys.stderr)
|
||||
return None
|
||||
# pylint: enable=broad-except
|
||||
root = tree.getroot()
|
||||
|
||||
num_boxes = FindNumberBoundingBoxes(root)
|
||||
boxes = []
|
||||
|
||||
for index in xrange(num_boxes):
|
||||
box = BoundingBox()
|
||||
# Grab the 'index' annotation.
|
||||
box.xmin = GetInt('xmin', root, index)
|
||||
box.ymin = GetInt('ymin', root, index)
|
||||
box.xmax = GetInt('xmax', root, index)
|
||||
box.ymax = GetInt('ymax', root, index)
|
||||
|
||||
box.width = GetInt('width', root)
|
||||
box.height = GetInt('height', root)
|
||||
box.filename = GetItem('filename', root) + '.JPEG'
|
||||
box.label = GetItem('name', root)
|
||||
|
||||
xmin = float(box.xmin) / float(box.width)
|
||||
xmax = float(box.xmax) / float(box.width)
|
||||
ymin = float(box.ymin) / float(box.height)
|
||||
ymax = float(box.ymax) / float(box.height)
|
||||
|
||||
# Some images contain bounding box annotations that
|
||||
# extend outside of the supplied image. See, e.g.
|
||||
# n03127925/n03127925_147.xml
|
||||
# Additionally, for some bounding boxes, the min > max
|
||||
# or the box is entirely outside of the image.
|
||||
min_x = min(xmin, xmax)
|
||||
max_x = max(xmin, xmax)
|
||||
box.xmin_scaled = min(max(min_x, 0.0), 1.0)
|
||||
box.xmax_scaled = min(max(max_x, 0.0), 1.0)
|
||||
|
||||
min_y = min(ymin, ymax)
|
||||
max_y = max(ymin, ymax)
|
||||
box.ymin_scaled = min(max(min_y, 0.0), 1.0)
|
||||
box.ymax_scaled = min(max(max_y, 0.0), 1.0)
|
||||
|
||||
boxes.append(box)
|
||||
|
||||
return boxes
|
||||
|
||||
if __name__ == '__main__':
|
||||
if len(sys.argv) < 2 or len(sys.argv) > 3:
|
||||
print('Invalid usage\n'
|
||||
'usage: process_bounding_boxes.py <dir> [synsets-file]',
|
||||
file=sys.stderr)
|
||||
sys.exit(-1)
|
||||
|
||||
xml_files = glob.glob(sys.argv[1] + '/*/*.xml')
|
||||
print('Identified %d XML files in %s' % (len(xml_files), sys.argv[1]),
|
||||
file=sys.stderr)
|
||||
|
||||
if len(sys.argv) == 3:
|
||||
labels = set([l.strip() for l in open(sys.argv[2]).readlines()])
|
||||
print('Identified %d synset IDs in %s' % (len(labels), sys.argv[2]),
|
||||
file=sys.stderr)
|
||||
else:
|
||||
labels = None
|
||||
|
||||
skipped_boxes = 0
|
||||
skipped_files = 0
|
||||
saved_boxes = 0
|
||||
saved_files = 0
|
||||
for file_index, one_file in enumerate(xml_files):
|
||||
# Example: <...>/n06470073/n00141669_6790.xml
|
||||
label = os.path.basename(os.path.dirname(one_file))
|
||||
|
||||
# Determine if the annotation is from an ImageNet Challenge label.
|
||||
if labels is not None and label not in labels:
|
||||
skipped_files += 1
|
||||
continue
|
||||
|
||||
bboxes = ProcessXMLAnnotation(one_file)
|
||||
assert bboxes is not None, 'No bounding boxes found in ' + one_file
|
||||
|
||||
found_box = False
|
||||
for bbox in bboxes:
|
||||
if labels is not None:
|
||||
if bbox.label != label:
|
||||
# Note: There is a slight bug in the bounding box annotation data.
|
||||
# Many of the dog labels have the human label 'Scottish_deerhound'
|
||||
# instead of the synset ID 'n02092002' in the bbox.label field. As a
|
||||
# simple hack to overcome this issue, we only exclude bbox labels
|
||||
# *which are synset ID's* that do not match original synset label for
|
||||
# the XML file.
|
||||
if bbox.label in labels:
|
||||
skipped_boxes += 1
|
||||
continue
|
||||
|
||||
# Guard against improperly specified boxes.
|
||||
if (bbox.xmin_scaled >= bbox.xmax_scaled or
|
||||
bbox.ymin_scaled >= bbox.ymax_scaled):
|
||||
skipped_boxes += 1
|
||||
continue
|
||||
|
||||
# Note bbox.filename occasionally contains '%s' in the name. This is
|
||||
# data set noise that is fixed by just using the basename of the XML file.
|
||||
image_filename = os.path.splitext(os.path.basename(one_file))[0]
|
||||
print('%s.JPEG,%.4f,%.4f,%.4f,%.4f' %
|
||||
(image_filename,
|
||||
bbox.xmin_scaled, bbox.ymin_scaled,
|
||||
bbox.xmax_scaled, bbox.ymax_scaled))
|
||||
|
||||
saved_boxes += 1
|
||||
found_box = True
|
||||
if found_box:
|
||||
saved_files += 1
|
||||
else:
|
||||
skipped_files += 1
|
||||
|
||||
if not file_index % 5000:
|
||||
print('--> processed %d of %d XML files.' %
|
||||
(file_index + 1, len(xml_files)),
|
||||
file=sys.stderr)
|
||||
print('--> skipped %d boxes and %d XML files.' %
|
||||
(skipped_boxes, skipped_files), file=sys.stderr)
|
||||
|
||||
print('Finished processing %d XML files.' % len(xml_files), file=sys.stderr)
|
||||
print('Skipped %d XML files not in ImageNet Challenge.' % skipped_files,
|
||||
file=sys.stderr)
|
||||
print('Skipped %d bounding boxes not in ImageNet Challenge.' % skipped_boxes,
|
||||
file=sys.stderr)
|
||||
print('Wrote %d bounding boxes from %d annotated images.' %
|
||||
(saved_boxes, saved_files),
|
||||
file=sys.stderr)
|
||||
print('Finished.', file=sys.stderr)
|
||||
+129
@@ -0,0 +1,129 @@
|
||||
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Provides data for Visual WakeWords Dataset with images+labels.
|
||||
|
||||
Visual WakeWords Dataset derives from the COCO dataset to design tiny models
|
||||
classifying two classes, such as person/not-person. The COCO annotations
|
||||
are filtered to two classes: person and not-person (or another user-defined
|
||||
category). Bounding boxes for small objects with area less than 5% of the image
|
||||
area are filtered out.
|
||||
See build_visualwakewords_data.py which generates the Visual WakeWords dataset
|
||||
annotations from the raw COCO dataset and converts them to TFRecord.
|
||||
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import os
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from datasets import dataset_utils
|
||||
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
_FILE_PATTERN = '%s.record-*'
|
||||
|
||||
_SPLITS_TO_SIZES = {
|
||||
'train': 82783,
|
||||
'val': 40504,
|
||||
}
|
||||
|
||||
|
||||
_ITEMS_TO_DESCRIPTIONS = {
|
||||
'image': 'A color image of varying height and width.',
|
||||
'label': 'The label id of the image, an integer in {0, 1}',
|
||||
'object/bbox': 'A list of bounding boxes.',
|
||||
}
|
||||
|
||||
_NUM_CLASSES = 2
|
||||
|
||||
# labels file
|
||||
LABELS_FILENAME = 'labels.txt'
|
||||
|
||||
|
||||
def get_split(split_name, dataset_dir, file_pattern=None, reader=None):
|
||||
"""Gets a dataset tuple with instructions for reading ImageNet.
|
||||
|
||||
Args:
|
||||
split_name: A train/test split name.
|
||||
dataset_dir: The base directory of the dataset sources.
|
||||
file_pattern: The file pattern to use when matching the dataset sources. It
|
||||
is assumed that the pattern contains a '%s' string so that the split name
|
||||
can be inserted.
|
||||
reader: The TensorFlow reader type.
|
||||
|
||||
Returns:
|
||||
A `Dataset` namedtuple.
|
||||
|
||||
Raises:
|
||||
ValueError: if `split_name` is not a valid train/test split.
|
||||
"""
|
||||
if split_name not in _SPLITS_TO_SIZES:
|
||||
raise ValueError('split name %s was not recognized.' % split_name)
|
||||
|
||||
if not file_pattern:
|
||||
file_pattern = _FILE_PATTERN
|
||||
file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
|
||||
|
||||
# Allowing None in the signature so that dataset_factory can use the default.
|
||||
if reader is None:
|
||||
reader = tf.TFRecordReader
|
||||
|
||||
keys_to_features = {
|
||||
'image/encoded':
|
||||
tf.FixedLenFeature((), tf.string, default_value=''),
|
||||
'image/format':
|
||||
tf.FixedLenFeature((), tf.string, default_value='jpeg'),
|
||||
'image/class/label':
|
||||
tf.FixedLenFeature([], dtype=tf.int64, default_value=-1),
|
||||
'image/object/bbox/xmin':
|
||||
tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/ymin':
|
||||
tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/xmax':
|
||||
tf.VarLenFeature(dtype=tf.float32),
|
||||
'image/object/bbox/ymax':
|
||||
tf.VarLenFeature(dtype=tf.float32),
|
||||
}
|
||||
|
||||
items_to_handlers = {
|
||||
'image':
|
||||
slim.tfexample_decoder.Image('image/encoded', 'image/format'),
|
||||
'label':
|
||||
slim.tfexample_decoder.Tensor('image/class/label'),
|
||||
'object/bbox':
|
||||
slim.tfexample_decoder.BoundingBox(['ymin', 'xmin', 'ymax', 'xmax'],
|
||||
'image/object/bbox/'),
|
||||
}
|
||||
|
||||
decoder = slim.tfexample_decoder.TFExampleDecoder(keys_to_features,
|
||||
items_to_handlers)
|
||||
|
||||
labels_to_names = None
|
||||
labels_file = os.path.join(dataset_dir, LABELS_FILENAME)
|
||||
if tf.gfile.Exists(labels_file):
|
||||
labels_to_names = dataset_utils.read_label_file(dataset_dir)
|
||||
|
||||
return slim.dataset.Dataset(
|
||||
data_sources=file_pattern,
|
||||
reader=reader,
|
||||
decoder=decoder,
|
||||
num_samples=_SPLITS_TO_SIZES[split_name],
|
||||
items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
|
||||
num_classes=_NUM_CLASSES,
|
||||
labels_to_names=labels_to_names)
|
||||
+1
@@ -0,0 +1 @@
|
||||
|
||||
+677
@@ -0,0 +1,677 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Deploy Slim models across multiple clones and replicas.
|
||||
|
||||
# TODO(sguada) docstring paragraph by (a) motivating the need for the file and
|
||||
# (b) defining clones.
|
||||
|
||||
# TODO(sguada) describe the high-level components of model deployment.
|
||||
# E.g. "each model deployment is composed of several parts: a DeploymentConfig,
|
||||
# which captures A, B and C, an input_fn which loads data.. etc
|
||||
|
||||
To easily train a model on multiple GPUs or across multiple machines this
|
||||
module provides a set of helper functions: `create_clones`,
|
||||
`optimize_clones` and `deploy`.
|
||||
|
||||
Usage:
|
||||
|
||||
g = tf.Graph()
|
||||
|
||||
# Set up DeploymentConfig
|
||||
config = model_deploy.DeploymentConfig(num_clones=2, clone_on_cpu=True)
|
||||
|
||||
# Create the global step on the device storing the variables.
|
||||
with tf.device(config.variables_device()):
|
||||
global_step = slim.create_global_step()
|
||||
|
||||
# Define the inputs
|
||||
with tf.device(config.inputs_device()):
|
||||
images, labels = LoadData(...)
|
||||
inputs_queue = slim.data.prefetch_queue((images, labels))
|
||||
|
||||
# Define the optimizer.
|
||||
with tf.device(config.optimizer_device()):
|
||||
optimizer = tf.train.MomentumOptimizer(FLAGS.learning_rate, FLAGS.momentum)
|
||||
|
||||
# Define the model including the loss.
|
||||
def model_fn(inputs_queue):
|
||||
images, labels = inputs_queue.dequeue()
|
||||
predictions = CreateNetwork(images)
|
||||
slim.losses.log_loss(predictions, labels)
|
||||
|
||||
model_dp = model_deploy.deploy(config, model_fn, [inputs_queue],
|
||||
optimizer=optimizer)
|
||||
|
||||
# Run training.
|
||||
slim.learning.train(model_dp.train_op, my_log_dir,
|
||||
summary_op=model_dp.summary_op)
|
||||
|
||||
The Clone namedtuple holds together the values associated with each call to
|
||||
model_fn:
|
||||
* outputs: The return values of the calls to `model_fn()`.
|
||||
* scope: The scope used to create the clone.
|
||||
* device: The device used to create the clone.
|
||||
|
||||
DeployedModel namedtuple, holds together the values needed to train multiple
|
||||
clones:
|
||||
* train_op: An operation that run the optimizer training op and include
|
||||
all the update ops created by `model_fn`. Present only if an optimizer
|
||||
was specified.
|
||||
* summary_op: An operation that run the summaries created by `model_fn`
|
||||
and process_gradients.
|
||||
* total_loss: A `Tensor` that contains the sum of all losses created by
|
||||
`model_fn` plus the regularization losses.
|
||||
* clones: List of `Clone` tuples returned by `create_clones()`.
|
||||
|
||||
DeploymentConfig parameters:
|
||||
* num_clones: Number of model clones to deploy in each replica.
|
||||
* clone_on_cpu: True if clones should be placed on CPU.
|
||||
* replica_id: Integer. Index of the replica for which the model is
|
||||
deployed. Usually 0 for the chief replica.
|
||||
* num_replicas: Number of replicas to use.
|
||||
* num_ps_tasks: Number of tasks for the `ps` job. 0 to not use replicas.
|
||||
* worker_job_name: A name for the worker job.
|
||||
* ps_job_name: A name for the parameter server job.
|
||||
|
||||
TODO(sguada):
|
||||
- describe side effect to the graph.
|
||||
- what happens to summaries and update_ops.
|
||||
- which graph collections are altered.
|
||||
- write a tutorial on how to use this.
|
||||
- analyze the possibility of calling deploy more than once.
|
||||
|
||||
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import collections
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
__all__ = ['create_clones',
|
||||
'deploy',
|
||||
'optimize_clones',
|
||||
'DeployedModel',
|
||||
'DeploymentConfig',
|
||||
'Clone',
|
||||
]
|
||||
|
||||
# Namedtuple used to represent a clone during deployment.
|
||||
Clone = collections.namedtuple('Clone',
|
||||
['outputs', # Whatever model_fn() returned.
|
||||
'scope', # The scope used to create it.
|
||||
'device', # The device used to create.
|
||||
])
|
||||
|
||||
# Namedtuple used to represent a DeployedModel, returned by deploy().
|
||||
DeployedModel = collections.namedtuple('DeployedModel',
|
||||
['train_op', # The `train_op`
|
||||
'summary_op', # The `summary_op`
|
||||
'total_loss', # The loss `Tensor`
|
||||
'clones', # A list of `Clones` tuples.
|
||||
])
|
||||
|
||||
# Default parameters for DeploymentConfig
|
||||
_deployment_params = {'num_clones': 1,
|
||||
'clone_on_cpu': False,
|
||||
'replica_id': 0,
|
||||
'num_replicas': 1,
|
||||
'num_ps_tasks': 0,
|
||||
'worker_job_name': 'worker',
|
||||
'ps_job_name': 'ps'}
|
||||
|
||||
|
||||
def create_clones(config, model_fn, args=None, kwargs=None):
|
||||
"""Creates multiple clones according to config using a `model_fn`.
|
||||
|
||||
The returned values of `model_fn(*args, **kwargs)` are collected along with
|
||||
the scope and device used to created it in a namedtuple
|
||||
`Clone(outputs, scope, device)`
|
||||
|
||||
Note: it is assumed that any loss created by `model_fn` is collected at
|
||||
the tf.GraphKeys.LOSSES collection.
|
||||
|
||||
To recover the losses, summaries or update_ops created by the clone use:
|
||||
```python
|
||||
losses = tf.get_collection(tf.GraphKeys.LOSSES, clone.scope)
|
||||
summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, clone.scope)
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, clone.scope)
|
||||
```
|
||||
|
||||
The deployment options are specified by the config object and support
|
||||
deploying one or several clones on different GPUs and one or several replicas
|
||||
of such clones.
|
||||
|
||||
The argument `model_fn` is called `config.num_clones` times to create the
|
||||
model clones as `model_fn(*args, **kwargs)`.
|
||||
|
||||
If `config` specifies deployment on multiple replicas then the default
|
||||
tensorflow device is set appropriatly for each call to `model_fn` and for the
|
||||
slim variable creation functions: model and global variables will be created
|
||||
on the `ps` device, the clone operations will be on the `worker` device.
|
||||
|
||||
Args:
|
||||
config: A DeploymentConfig object.
|
||||
model_fn: A callable. Called as `model_fn(*args, **kwargs)`
|
||||
args: Optional list of arguments to pass to `model_fn`.
|
||||
kwargs: Optional list of keyword arguments to pass to `model_fn`.
|
||||
|
||||
Returns:
|
||||
A list of namedtuples `Clone`.
|
||||
"""
|
||||
clones = []
|
||||
args = args or []
|
||||
kwargs = kwargs or {}
|
||||
with slim.arg_scope([slim.model_variable, slim.variable],
|
||||
device=config.variables_device()):
|
||||
# Create clones.
|
||||
for i in range(0, config.num_clones):
|
||||
with tf.name_scope(config.clone_scope(i)) as clone_scope:
|
||||
clone_device = config.clone_device(i)
|
||||
with tf.device(clone_device):
|
||||
with tf.variable_scope(tf.get_variable_scope(),
|
||||
reuse=True if i > 0 else None):
|
||||
outputs = model_fn(*args, **kwargs)
|
||||
clones.append(Clone(outputs, clone_scope, clone_device))
|
||||
return clones
|
||||
|
||||
|
||||
def _gather_clone_loss(clone, num_clones, regularization_losses):
|
||||
"""Gather the loss for a single clone.
|
||||
|
||||
Args:
|
||||
clone: A Clone namedtuple.
|
||||
num_clones: The number of clones being deployed.
|
||||
regularization_losses: Possibly empty list of regularization_losses
|
||||
to add to the clone losses.
|
||||
|
||||
Returns:
|
||||
A tensor for the total loss for the clone. Can be None.
|
||||
"""
|
||||
# The return value.
|
||||
sum_loss = None
|
||||
# Individual components of the loss that will need summaries.
|
||||
clone_loss = None
|
||||
regularization_loss = None
|
||||
# Compute and aggregate losses on the clone device.
|
||||
with tf.device(clone.device):
|
||||
all_losses = []
|
||||
clone_losses = tf.get_collection(tf.GraphKeys.LOSSES, clone.scope)
|
||||
if clone_losses:
|
||||
clone_loss = tf.add_n(clone_losses, name='clone_loss')
|
||||
if num_clones > 1:
|
||||
clone_loss = tf.div(clone_loss, 1.0 * num_clones,
|
||||
name='scaled_clone_loss')
|
||||
all_losses.append(clone_loss)
|
||||
if regularization_losses:
|
||||
regularization_loss = tf.add_n(regularization_losses,
|
||||
name='regularization_loss')
|
||||
all_losses.append(regularization_loss)
|
||||
if all_losses:
|
||||
sum_loss = tf.add_n(all_losses)
|
||||
# Add the summaries out of the clone device block.
|
||||
if clone_loss is not None:
|
||||
tf.summary.scalar('/'.join(filter(None,
|
||||
['Losses', clone.scope, 'clone_loss'])),
|
||||
clone_loss)
|
||||
if regularization_loss is not None:
|
||||
tf.summary.scalar('Losses/regularization_loss', regularization_loss)
|
||||
return sum_loss
|
||||
|
||||
|
||||
def _optimize_clone(optimizer, clone, num_clones, regularization_losses,
|
||||
**kwargs):
|
||||
"""Compute losses and gradients for a single clone.
|
||||
|
||||
Args:
|
||||
optimizer: A tf.Optimizer object.
|
||||
clone: A Clone namedtuple.
|
||||
num_clones: The number of clones being deployed.
|
||||
regularization_losses: Possibly empty list of regularization_losses
|
||||
to add to the clone losses.
|
||||
**kwargs: Dict of kwarg to pass to compute_gradients().
|
||||
|
||||
Returns:
|
||||
A tuple (clone_loss, clone_grads_and_vars).
|
||||
- clone_loss: A tensor for the total loss for the clone. Can be None.
|
||||
- clone_grads_and_vars: List of (gradient, variable) for the clone.
|
||||
Can be empty.
|
||||
"""
|
||||
sum_loss = _gather_clone_loss(clone, num_clones, regularization_losses)
|
||||
clone_grad = None
|
||||
if sum_loss is not None:
|
||||
# with tf.device(clone.device):
|
||||
# clone_grad = optimizer.compute_gradients(sum_loss, **kwargs)
|
||||
clone_grad = optimizer.compute_gradients(sum_loss, **kwargs)
|
||||
return sum_loss, clone_grad
|
||||
|
||||
|
||||
def optimize_clones(clones, optimizer,
|
||||
regularization_losses=None,
|
||||
**kwargs):
|
||||
"""Compute clone losses and gradients for the given list of `Clones`.
|
||||
|
||||
Note: The regularization_losses are added to the first clone losses.
|
||||
|
||||
Args:
|
||||
clones: List of `Clones` created by `create_clones()`.
|
||||
optimizer: An `Optimizer` object.
|
||||
regularization_losses: Optional list of regularization losses. If None it
|
||||
will gather them from tf.GraphKeys.REGULARIZATION_LOSSES. Pass `[]` to
|
||||
exclude them.
|
||||
**kwargs: Optional list of keyword arguments to pass to `compute_gradients`.
|
||||
|
||||
Returns:
|
||||
A tuple (total_loss, grads_and_vars).
|
||||
- total_loss: A Tensor containing the average of the clone losses including
|
||||
the regularization loss.
|
||||
- grads_and_vars: A List of tuples (gradient, variable) containing the sum
|
||||
of the gradients for each variable.
|
||||
|
||||
"""
|
||||
grads_and_vars = []
|
||||
clones_losses = []
|
||||
num_clones = len(clones)
|
||||
if regularization_losses is None:
|
||||
regularization_losses = tf.get_collection(
|
||||
tf.GraphKeys.REGULARIZATION_LOSSES)
|
||||
for clone in clones:
|
||||
with tf.name_scope(clone.scope):
|
||||
clone_loss, clone_grad = _optimize_clone(
|
||||
optimizer, clone, num_clones, regularization_losses, **kwargs)
|
||||
if clone_loss is not None:
|
||||
clones_losses.append(clone_loss)
|
||||
grads_and_vars.append(clone_grad)
|
||||
# Only use regularization_losses for the first clone
|
||||
regularization_losses = None
|
||||
# Compute the total_loss summing all the clones_losses.
|
||||
total_loss = tf.add_n(clones_losses, name='total_loss')
|
||||
# Sum the gradients across clones.
|
||||
grads_and_vars = _sum_clones_gradients(grads_and_vars)
|
||||
return total_loss, grads_and_vars
|
||||
|
||||
|
||||
def deploy(config,
|
||||
model_fn,
|
||||
args=None,
|
||||
kwargs=None,
|
||||
optimizer=None,
|
||||
summarize_gradients=False):
|
||||
"""Deploys a Slim-constructed model across multiple clones.
|
||||
|
||||
The deployment options are specified by the config object and support
|
||||
deploying one or several clones on different GPUs and one or several replicas
|
||||
of such clones.
|
||||
|
||||
The argument `model_fn` is called `config.num_clones` times to create the
|
||||
model clones as `model_fn(*args, **kwargs)`.
|
||||
|
||||
The optional argument `optimizer` is an `Optimizer` object. If not `None`,
|
||||
the deployed model is configured for training with that optimizer.
|
||||
|
||||
If `config` specifies deployment on multiple replicas then the default
|
||||
tensorflow device is set appropriatly for each call to `model_fn` and for the
|
||||
slim variable creation functions: model and global variables will be created
|
||||
on the `ps` device, the clone operations will be on the `worker` device.
|
||||
|
||||
Args:
|
||||
config: A `DeploymentConfig` object.
|
||||
model_fn: A callable. Called as `model_fn(*args, **kwargs)`
|
||||
args: Optional list of arguments to pass to `model_fn`.
|
||||
kwargs: Optional list of keyword arguments to pass to `model_fn`.
|
||||
optimizer: Optional `Optimizer` object. If passed the model is deployed
|
||||
for training with that optimizer.
|
||||
summarize_gradients: Whether or not add summaries to the gradients.
|
||||
|
||||
Returns:
|
||||
A `DeployedModel` namedtuple.
|
||||
|
||||
"""
|
||||
# Gather initial summaries.
|
||||
summaries = set(tf.get_collection(tf.GraphKeys.SUMMARIES))
|
||||
|
||||
# Create Clones.
|
||||
clones = create_clones(config, model_fn, args, kwargs)
|
||||
first_clone = clones[0]
|
||||
|
||||
# Gather update_ops from the first clone. These contain, for example,
|
||||
# the updates for the batch_norm variables created by model_fn.
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, first_clone.scope)
|
||||
|
||||
train_op = None
|
||||
total_loss = None
|
||||
with tf.device(config.optimizer_device()):
|
||||
if optimizer:
|
||||
# Place the global step on the device storing the variables.
|
||||
with tf.device(config.variables_device()):
|
||||
global_step = slim.get_or_create_global_step()
|
||||
|
||||
# Compute the gradients for the clones.
|
||||
total_loss, clones_gradients = optimize_clones(clones, optimizer)
|
||||
|
||||
if clones_gradients:
|
||||
if summarize_gradients:
|
||||
# Add summaries to the gradients.
|
||||
summaries |= set(_add_gradients_summaries(clones_gradients))
|
||||
|
||||
# Create gradient updates.
|
||||
grad_updates = optimizer.apply_gradients(clones_gradients,
|
||||
global_step=global_step)
|
||||
update_ops.append(grad_updates)
|
||||
|
||||
update_op = tf.group(*update_ops)
|
||||
with tf.control_dependencies([update_op]):
|
||||
train_op = tf.identity(total_loss, name='train_op')
|
||||
else:
|
||||
clones_losses = []
|
||||
regularization_losses = tf.get_collection(
|
||||
tf.GraphKeys.REGULARIZATION_LOSSES)
|
||||
for clone in clones:
|
||||
with tf.name_scope(clone.scope):
|
||||
clone_loss = _gather_clone_loss(clone, len(clones),
|
||||
regularization_losses)
|
||||
if clone_loss is not None:
|
||||
clones_losses.append(clone_loss)
|
||||
# Only use regularization_losses for the first clone
|
||||
regularization_losses = None
|
||||
if clones_losses:
|
||||
total_loss = tf.add_n(clones_losses, name='total_loss')
|
||||
|
||||
# Add the summaries from the first clone. These contain the summaries
|
||||
# created by model_fn and either optimize_clones() or _gather_clone_loss().
|
||||
summaries |= set(tf.get_collection(tf.GraphKeys.SUMMARIES,
|
||||
first_clone.scope))
|
||||
|
||||
if total_loss is not None:
|
||||
# Add total_loss to summary.
|
||||
summaries.add(tf.summary.scalar('total_loss', total_loss))
|
||||
|
||||
if summaries:
|
||||
# Merge all summaries together.
|
||||
summary_op = tf.summary.merge(list(summaries), name='summary_op')
|
||||
else:
|
||||
summary_op = None
|
||||
|
||||
return DeployedModel(train_op, summary_op, total_loss, clones)
|
||||
|
||||
|
||||
def _sum_clones_gradients(clone_grads):
|
||||
"""Calculate the sum gradient for each shared variable across all clones.
|
||||
|
||||
This function assumes that the clone_grads has been scaled appropriately by
|
||||
1 / num_clones.
|
||||
|
||||
Args:
|
||||
clone_grads: A List of List of tuples (gradient, variable), one list per
|
||||
`Clone`.
|
||||
|
||||
Returns:
|
||||
List of tuples of (gradient, variable) where the gradient has been summed
|
||||
across all clones.
|
||||
"""
|
||||
sum_grads = []
|
||||
for grad_and_vars in zip(*clone_grads):
|
||||
# Note that each grad_and_vars looks like the following:
|
||||
# ((grad_var0_clone0, var0), ... (grad_varN_cloneN, varN))
|
||||
grads = []
|
||||
var = grad_and_vars[0][1]
|
||||
for g, v in grad_and_vars:
|
||||
assert v == var
|
||||
if g is not None:
|
||||
grads.append(g)
|
||||
if grads:
|
||||
if len(grads) > 1:
|
||||
sum_grad = tf.add_n(grads, name=var.op.name + '/sum_grads')
|
||||
else:
|
||||
sum_grad = grads[0]
|
||||
sum_grads.append((sum_grad, var))
|
||||
return sum_grads
|
||||
|
||||
|
||||
def _add_gradients_summaries(grads_and_vars):
|
||||
"""Add histogram summaries to gradients.
|
||||
|
||||
Note: The summaries are also added to the SUMMARIES collection.
|
||||
|
||||
Args:
|
||||
grads_and_vars: A list of gradient to variable pairs (tuples).
|
||||
|
||||
Returns:
|
||||
The _list_ of the added summaries for grads_and_vars.
|
||||
"""
|
||||
summaries = []
|
||||
for grad, var in grads_and_vars:
|
||||
if grad is not None:
|
||||
if isinstance(grad, tf.IndexedSlices):
|
||||
grad_values = grad.values
|
||||
else:
|
||||
grad_values = grad
|
||||
summaries.append(tf.summary.histogram(var.op.name + ':gradient',
|
||||
grad_values))
|
||||
summaries.append(tf.summary.histogram(var.op.name + ':gradient_norm',
|
||||
tf.global_norm([grad_values])))
|
||||
else:
|
||||
tf.logging.info('Var %s has no gradient', var.op.name)
|
||||
return summaries
|
||||
|
||||
|
||||
class DeploymentConfig(object):
|
||||
"""Configuration for deploying a model with `deploy()`.
|
||||
|
||||
You can pass an instance of this class to `deploy()` to specify exactly
|
||||
how to deploy the model to build. If you do not pass one, an instance built
|
||||
from the default deployment_hparams will be used.
|
||||
"""
|
||||
|
||||
def __init__(self,
|
||||
num_clones=1,
|
||||
clone_on_cpu=False,
|
||||
replica_id=0,
|
||||
num_replicas=1,
|
||||
num_ps_tasks=0,
|
||||
worker_job_name='worker',
|
||||
ps_job_name='ps'):
|
||||
"""Create a DeploymentConfig.
|
||||
|
||||
The config describes how to deploy a model across multiple clones and
|
||||
replicas. The model will be replicated `num_clones` times in each replica.
|
||||
If `clone_on_cpu` is True, each clone will placed on CPU.
|
||||
|
||||
If `num_replicas` is 1, the model is deployed via a single process. In that
|
||||
case `worker_device`, `num_ps_tasks`, and `ps_device` are ignored.
|
||||
|
||||
If `num_replicas` is greater than 1, then `worker_device` and `ps_device`
|
||||
must specify TensorFlow devices for the `worker` and `ps` jobs and
|
||||
`num_ps_tasks` must be positive.
|
||||
|
||||
Args:
|
||||
num_clones: Number of model clones to deploy in each replica.
|
||||
clone_on_cpu: If True clones would be placed on CPU.
|
||||
replica_id: Integer. Index of the replica for which the model is
|
||||
deployed. Usually 0 for the chief replica.
|
||||
num_replicas: Number of replicas to use.
|
||||
num_ps_tasks: Number of tasks for the `ps` job. 0 to not use replicas.
|
||||
worker_job_name: A name for the worker job.
|
||||
ps_job_name: A name for the parameter server job.
|
||||
|
||||
Raises:
|
||||
ValueError: If the arguments are invalid.
|
||||
"""
|
||||
if num_replicas > 1:
|
||||
if num_ps_tasks < 1:
|
||||
raise ValueError('When using replicas num_ps_tasks must be positive')
|
||||
if num_replicas > 1 or num_ps_tasks > 0:
|
||||
if not worker_job_name:
|
||||
raise ValueError('Must specify worker_job_name when using replicas')
|
||||
if not ps_job_name:
|
||||
raise ValueError('Must specify ps_job_name when using parameter server')
|
||||
if replica_id >= num_replicas:
|
||||
raise ValueError('replica_id must be less than num_replicas')
|
||||
self._num_clones = num_clones
|
||||
self._clone_on_cpu = clone_on_cpu
|
||||
self._replica_id = replica_id
|
||||
self._num_replicas = num_replicas
|
||||
self._num_ps_tasks = num_ps_tasks
|
||||
self._ps_device = '/job:' + ps_job_name if num_ps_tasks > 0 else ''
|
||||
self._worker_device = '/job:' + worker_job_name if num_ps_tasks > 0 else ''
|
||||
|
||||
@property
|
||||
def num_clones(self):
|
||||
return self._num_clones
|
||||
|
||||
@property
|
||||
def clone_on_cpu(self):
|
||||
return self._clone_on_cpu
|
||||
|
||||
@property
|
||||
def replica_id(self):
|
||||
return self._replica_id
|
||||
|
||||
@property
|
||||
def num_replicas(self):
|
||||
return self._num_replicas
|
||||
|
||||
@property
|
||||
def num_ps_tasks(self):
|
||||
return self._num_ps_tasks
|
||||
|
||||
@property
|
||||
def ps_device(self):
|
||||
return self._ps_device
|
||||
|
||||
@property
|
||||
def worker_device(self):
|
||||
return self._worker_device
|
||||
|
||||
def caching_device(self):
|
||||
"""Returns the device to use for caching variables.
|
||||
|
||||
Variables are cached on the worker CPU when using replicas.
|
||||
|
||||
Returns:
|
||||
A device string or None if the variables do not need to be cached.
|
||||
"""
|
||||
if self._num_ps_tasks > 0:
|
||||
return lambda op: op.device
|
||||
else:
|
||||
return None
|
||||
|
||||
def clone_device(self, clone_index):
|
||||
"""Device used to create the clone and all the ops inside the clone.
|
||||
|
||||
Args:
|
||||
clone_index: Int, representing the clone_index.
|
||||
|
||||
Returns:
|
||||
A value suitable for `tf.device()`.
|
||||
|
||||
Raises:
|
||||
ValueError: if `clone_index` is greater or equal to the number of clones".
|
||||
"""
|
||||
if clone_index >= self._num_clones:
|
||||
raise ValueError('clone_index must be less than num_clones')
|
||||
device = ''
|
||||
if self._num_ps_tasks > 0:
|
||||
device += self._worker_device
|
||||
if self._clone_on_cpu:
|
||||
device += '/device:CPU:0'
|
||||
else:
|
||||
device += '/device:GPU:%d' % clone_index
|
||||
return device
|
||||
|
||||
def clone_scope(self, clone_index):
|
||||
"""Name scope to create the clone.
|
||||
|
||||
Args:
|
||||
clone_index: Int, representing the clone_index.
|
||||
|
||||
Returns:
|
||||
A name_scope suitable for `tf.name_scope()`.
|
||||
|
||||
Raises:
|
||||
ValueError: if `clone_index` is greater or equal to the number of clones".
|
||||
"""
|
||||
if clone_index >= self._num_clones:
|
||||
raise ValueError('clone_index must be less than num_clones')
|
||||
scope = ''
|
||||
if self._num_clones > 1:
|
||||
scope = 'clone_%d' % clone_index
|
||||
return scope
|
||||
|
||||
def optimizer_device(self):
|
||||
"""Device to use with the optimizer.
|
||||
|
||||
Returns:
|
||||
A value suitable for `tf.device()`.
|
||||
"""
|
||||
if self._num_ps_tasks > 0 or self._num_clones > 0:
|
||||
return self._worker_device + '/device:CPU:0'
|
||||
else:
|
||||
return ''
|
||||
|
||||
def inputs_device(self):
|
||||
"""Device to use to build the inputs.
|
||||
|
||||
Returns:
|
||||
A value suitable for `tf.device()`.
|
||||
"""
|
||||
device = ''
|
||||
if self._num_ps_tasks > 0:
|
||||
device += self._worker_device
|
||||
device += '/device:CPU:0'
|
||||
return device
|
||||
|
||||
def variables_device(self):
|
||||
"""Returns the device to use for variables created inside the clone.
|
||||
|
||||
Returns:
|
||||
A value suitable for `tf.device()`.
|
||||
"""
|
||||
device = ''
|
||||
if self._num_ps_tasks > 0:
|
||||
device += self._ps_device
|
||||
device += '/device:CPU:0'
|
||||
|
||||
class _PSDeviceChooser(object):
|
||||
"""Slim device chooser for variables when using PS."""
|
||||
|
||||
def __init__(self, device, tasks):
|
||||
self._device = device
|
||||
self._tasks = tasks
|
||||
self._task = 0
|
||||
|
||||
def choose(self, op):
|
||||
if op.device:
|
||||
return op.device
|
||||
node_def = op if isinstance(op, tf.NodeDef) else op.node_def
|
||||
if node_def.op.startswith('Variable'):
|
||||
t = self._task
|
||||
self._task = (self._task + 1) % self._tasks
|
||||
d = '%s/task:%d' % (self._device, t)
|
||||
return d
|
||||
else:
|
||||
return op.device
|
||||
|
||||
if not self._num_ps_tasks:
|
||||
return device
|
||||
else:
|
||||
chooser = _PSDeviceChooser(device, self._num_ps_tasks)
|
||||
return chooser.choose
|
||||
+574
@@ -0,0 +1,574 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for model_deploy."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import framework as contrib_framework
|
||||
from tensorflow.contrib import layers as contrib_layers
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from deployment import model_deploy
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
class DeploymentConfigTest(tf.test.TestCase):
|
||||
|
||||
def testDefaults(self):
|
||||
deploy_config = model_deploy.DeploymentConfig()
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
self.assertEqual(deploy_config.caching_device(), None)
|
||||
self.assertDeviceEqual(deploy_config.clone_device(0), 'GPU:0')
|
||||
self.assertEqual(deploy_config.clone_scope(0), '')
|
||||
self.assertDeviceEqual(deploy_config.optimizer_device(), 'CPU:0')
|
||||
self.assertDeviceEqual(deploy_config.inputs_device(), 'CPU:0')
|
||||
self.assertDeviceEqual(deploy_config.variables_device(), 'CPU:0')
|
||||
|
||||
def testCPUonly(self):
|
||||
deploy_config = model_deploy.DeploymentConfig(clone_on_cpu=True)
|
||||
|
||||
self.assertEqual(deploy_config.caching_device(), None)
|
||||
self.assertDeviceEqual(deploy_config.clone_device(0), 'CPU:0')
|
||||
self.assertEqual(deploy_config.clone_scope(0), '')
|
||||
self.assertDeviceEqual(deploy_config.optimizer_device(), 'CPU:0')
|
||||
self.assertDeviceEqual(deploy_config.inputs_device(), 'CPU:0')
|
||||
self.assertDeviceEqual(deploy_config.variables_device(), 'CPU:0')
|
||||
|
||||
def testMultiGPU(self):
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=2)
|
||||
|
||||
self.assertEqual(deploy_config.caching_device(), None)
|
||||
self.assertDeviceEqual(deploy_config.clone_device(0), 'GPU:0')
|
||||
self.assertDeviceEqual(deploy_config.clone_device(1), 'GPU:1')
|
||||
self.assertEqual(deploy_config.clone_scope(0), 'clone_0')
|
||||
self.assertEqual(deploy_config.clone_scope(1), 'clone_1')
|
||||
self.assertDeviceEqual(deploy_config.optimizer_device(), 'CPU:0')
|
||||
self.assertDeviceEqual(deploy_config.inputs_device(), 'CPU:0')
|
||||
self.assertDeviceEqual(deploy_config.variables_device(), 'CPU:0')
|
||||
|
||||
def testPS(self):
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=1, num_ps_tasks=1)
|
||||
|
||||
self.assertDeviceEqual(deploy_config.clone_device(0),
|
||||
'/job:worker/device:GPU:0')
|
||||
self.assertEqual(deploy_config.clone_scope(0), '')
|
||||
self.assertDeviceEqual(deploy_config.optimizer_device(),
|
||||
'/job:worker/device:CPU:0')
|
||||
self.assertDeviceEqual(deploy_config.inputs_device(),
|
||||
'/job:worker/device:CPU:0')
|
||||
with tf.device(deploy_config.variables_device()):
|
||||
a = tf.Variable(0)
|
||||
b = tf.Variable(0)
|
||||
c = tf.no_op()
|
||||
d = slim.variable('a', [],
|
||||
caching_device=deploy_config.caching_device())
|
||||
self.assertDeviceEqual(a.device, '/job:ps/task:0/device:CPU:0')
|
||||
self.assertDeviceEqual(a.device, a.value().device)
|
||||
self.assertDeviceEqual(b.device, '/job:ps/task:0/device:CPU:0')
|
||||
self.assertDeviceEqual(b.device, b.value().device)
|
||||
self.assertDeviceEqual(c.device, '')
|
||||
self.assertDeviceEqual(d.device, '/job:ps/task:0/device:CPU:0')
|
||||
self.assertDeviceEqual(d.value().device, '')
|
||||
|
||||
def testMultiGPUPS(self):
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=2, num_ps_tasks=1)
|
||||
|
||||
self.assertEqual(deploy_config.caching_device()(tf.no_op()), '')
|
||||
self.assertDeviceEqual(deploy_config.clone_device(0),
|
||||
'/job:worker/device:GPU:0')
|
||||
self.assertDeviceEqual(deploy_config.clone_device(1),
|
||||
'/job:worker/device:GPU:1')
|
||||
self.assertEqual(deploy_config.clone_scope(0), 'clone_0')
|
||||
self.assertEqual(deploy_config.clone_scope(1), 'clone_1')
|
||||
self.assertDeviceEqual(deploy_config.optimizer_device(),
|
||||
'/job:worker/device:CPU:0')
|
||||
self.assertDeviceEqual(deploy_config.inputs_device(),
|
||||
'/job:worker/device:CPU:0')
|
||||
|
||||
def testReplicasPS(self):
|
||||
deploy_config = model_deploy.DeploymentConfig(num_replicas=2,
|
||||
num_ps_tasks=2)
|
||||
|
||||
self.assertDeviceEqual(deploy_config.clone_device(0),
|
||||
'/job:worker/device:GPU:0')
|
||||
self.assertEqual(deploy_config.clone_scope(0), '')
|
||||
self.assertDeviceEqual(deploy_config.optimizer_device(),
|
||||
'/job:worker/device:CPU:0')
|
||||
self.assertDeviceEqual(deploy_config.inputs_device(),
|
||||
'/job:worker/device:CPU:0')
|
||||
|
||||
def testReplicasMultiGPUPS(self):
|
||||
deploy_config = model_deploy.DeploymentConfig(num_replicas=2,
|
||||
num_clones=2,
|
||||
num_ps_tasks=2)
|
||||
self.assertDeviceEqual(deploy_config.clone_device(0),
|
||||
'/job:worker/device:GPU:0')
|
||||
self.assertDeviceEqual(deploy_config.clone_device(1),
|
||||
'/job:worker/device:GPU:1')
|
||||
self.assertEqual(deploy_config.clone_scope(0), 'clone_0')
|
||||
self.assertEqual(deploy_config.clone_scope(1), 'clone_1')
|
||||
self.assertDeviceEqual(deploy_config.optimizer_device(),
|
||||
'/job:worker/device:CPU:0')
|
||||
self.assertDeviceEqual(deploy_config.inputs_device(),
|
||||
'/job:worker/device:CPU:0')
|
||||
|
||||
def testVariablesPS(self):
|
||||
deploy_config = model_deploy.DeploymentConfig(num_ps_tasks=2)
|
||||
|
||||
with tf.device(deploy_config.variables_device()):
|
||||
a = tf.Variable(0)
|
||||
b = tf.Variable(0)
|
||||
c = tf.no_op()
|
||||
d = slim.variable('a', [],
|
||||
caching_device=deploy_config.caching_device())
|
||||
|
||||
self.assertDeviceEqual(a.device, '/job:ps/task:0/device:CPU:0')
|
||||
self.assertDeviceEqual(a.device, a.value().device)
|
||||
self.assertDeviceEqual(b.device, '/job:ps/task:1/device:CPU:0')
|
||||
self.assertDeviceEqual(b.device, b.value().device)
|
||||
self.assertDeviceEqual(c.device, '')
|
||||
self.assertDeviceEqual(d.device, '/job:ps/task:0/device:CPU:0')
|
||||
self.assertDeviceEqual(d.value().device, '')
|
||||
|
||||
|
||||
def LogisticClassifier(inputs, labels, scope=None, reuse=None):
|
||||
with tf.variable_scope(scope, 'LogisticClassifier', [inputs, labels],
|
||||
reuse=reuse):
|
||||
predictions = slim.fully_connected(inputs, 1, activation_fn=tf.sigmoid,
|
||||
scope='fully_connected')
|
||||
slim.losses.log_loss(predictions, labels)
|
||||
return predictions
|
||||
|
||||
|
||||
def BatchNormClassifier(inputs, labels, scope=None, reuse=None):
|
||||
with tf.variable_scope(scope, 'BatchNormClassifier', [inputs, labels],
|
||||
reuse=reuse):
|
||||
inputs = slim.batch_norm(inputs, decay=0.1, fused=True)
|
||||
predictions = slim.fully_connected(inputs, 1,
|
||||
activation_fn=tf.sigmoid,
|
||||
scope='fully_connected')
|
||||
slim.losses.log_loss(predictions, labels)
|
||||
return predictions
|
||||
|
||||
|
||||
class CreatecloneTest(tf.test.TestCase):
|
||||
|
||||
def setUp(self):
|
||||
# Create an easy training set:
|
||||
np.random.seed(0)
|
||||
|
||||
self._inputs = np.zeros((16, 4))
|
||||
self._labels = np.random.randint(0, 2, size=(16, 1)).astype(np.float32)
|
||||
self._logdir = self.get_temp_dir()
|
||||
|
||||
for i in range(16):
|
||||
j = int(2 * self._labels[i] + np.random.randint(0, 2))
|
||||
self._inputs[i, j] = 1
|
||||
|
||||
def testCreateLogisticClassifier(self):
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
tf.set_random_seed(0)
|
||||
tf_inputs = tf.constant(self._inputs, dtype=tf.float32)
|
||||
tf_labels = tf.constant(self._labels, dtype=tf.float32)
|
||||
|
||||
model_fn = LogisticClassifier
|
||||
clone_args = (tf_inputs, tf_labels)
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=1)
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
clones = model_deploy.create_clones(deploy_config, model_fn, clone_args)
|
||||
clone = clones[0]
|
||||
self.assertEqual(len(slim.get_variables()), 2)
|
||||
for v in slim.get_variables():
|
||||
self.assertDeviceEqual(v.device, 'CPU:0')
|
||||
self.assertDeviceEqual(v.value().device, 'CPU:0')
|
||||
self.assertEqual(clone.outputs.op.name,
|
||||
'LogisticClassifier/fully_connected/Sigmoid')
|
||||
self.assertEqual(clone.scope, '')
|
||||
self.assertDeviceEqual(clone.device, 'GPU:0')
|
||||
self.assertEqual(len(slim.losses.get_losses()), 1)
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
|
||||
self.assertEqual(update_ops, [])
|
||||
|
||||
def testCreateSingleclone(self):
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
tf.set_random_seed(0)
|
||||
tf_inputs = tf.constant(self._inputs, dtype=tf.float32)
|
||||
tf_labels = tf.constant(self._labels, dtype=tf.float32)
|
||||
|
||||
model_fn = BatchNormClassifier
|
||||
clone_args = (tf_inputs, tf_labels)
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=1)
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
clones = model_deploy.create_clones(deploy_config, model_fn, clone_args)
|
||||
clone = clones[0]
|
||||
self.assertEqual(len(slim.get_variables()), 5)
|
||||
for v in slim.get_variables():
|
||||
self.assertDeviceEqual(v.device, 'CPU:0')
|
||||
self.assertDeviceEqual(v.value().device, 'CPU:0')
|
||||
self.assertEqual(clone.outputs.op.name,
|
||||
'BatchNormClassifier/fully_connected/Sigmoid')
|
||||
self.assertEqual(clone.scope, '')
|
||||
self.assertDeviceEqual(clone.device, 'GPU:0')
|
||||
self.assertEqual(len(slim.losses.get_losses()), 1)
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
|
||||
self.assertEqual(len(update_ops), 2)
|
||||
|
||||
def testCreateMulticlone(self):
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
tf.set_random_seed(0)
|
||||
tf_inputs = tf.constant(self._inputs, dtype=tf.float32)
|
||||
tf_labels = tf.constant(self._labels, dtype=tf.float32)
|
||||
|
||||
model_fn = BatchNormClassifier
|
||||
clone_args = (tf_inputs, tf_labels)
|
||||
num_clones = 4
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=num_clones)
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
clones = model_deploy.create_clones(deploy_config, model_fn, clone_args)
|
||||
self.assertEqual(len(slim.get_variables()), 5)
|
||||
for v in slim.get_variables():
|
||||
self.assertDeviceEqual(v.device, 'CPU:0')
|
||||
self.assertDeviceEqual(v.value().device, 'CPU:0')
|
||||
self.assertEqual(len(clones), num_clones)
|
||||
for i, clone in enumerate(clones):
|
||||
self.assertEqual(
|
||||
clone.outputs.op.name,
|
||||
'clone_%d/BatchNormClassifier/fully_connected/Sigmoid' % i)
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, clone.scope)
|
||||
self.assertEqual(len(update_ops), 2)
|
||||
self.assertEqual(clone.scope, 'clone_%d/' % i)
|
||||
self.assertDeviceEqual(clone.device, 'GPU:%d' % i)
|
||||
|
||||
def testCreateOnecloneWithPS(self):
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
tf.set_random_seed(0)
|
||||
tf_inputs = tf.constant(self._inputs, dtype=tf.float32)
|
||||
tf_labels = tf.constant(self._labels, dtype=tf.float32)
|
||||
|
||||
model_fn = BatchNormClassifier
|
||||
clone_args = (tf_inputs, tf_labels)
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=1,
|
||||
num_ps_tasks=1)
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
clones = model_deploy.create_clones(deploy_config, model_fn, clone_args)
|
||||
self.assertEqual(len(clones), 1)
|
||||
clone = clones[0]
|
||||
self.assertEqual(clone.outputs.op.name,
|
||||
'BatchNormClassifier/fully_connected/Sigmoid')
|
||||
self.assertDeviceEqual(clone.device, '/job:worker/device:GPU:0')
|
||||
self.assertEqual(clone.scope, '')
|
||||
self.assertEqual(len(slim.get_variables()), 5)
|
||||
for v in slim.get_variables():
|
||||
self.assertDeviceEqual(v.device, '/job:ps/task:0/CPU:0')
|
||||
self.assertDeviceEqual(v.device, v.value().device)
|
||||
|
||||
def testCreateMulticloneWithPS(self):
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
tf.set_random_seed(0)
|
||||
tf_inputs = tf.constant(self._inputs, dtype=tf.float32)
|
||||
tf_labels = tf.constant(self._labels, dtype=tf.float32)
|
||||
|
||||
model_fn = BatchNormClassifier
|
||||
clone_args = (tf_inputs, tf_labels)
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=2,
|
||||
num_ps_tasks=2)
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
clones = model_deploy.create_clones(deploy_config, model_fn, clone_args)
|
||||
self.assertEqual(len(slim.get_variables()), 5)
|
||||
for i, v in enumerate(slim.get_variables()):
|
||||
t = i % 2
|
||||
self.assertDeviceEqual(v.device, '/job:ps/task:%d/device:CPU:0' % t)
|
||||
self.assertDeviceEqual(v.device, v.value().device)
|
||||
self.assertEqual(len(clones), 2)
|
||||
for i, clone in enumerate(clones):
|
||||
self.assertEqual(
|
||||
clone.outputs.op.name,
|
||||
'clone_%d/BatchNormClassifier/fully_connected/Sigmoid' % i)
|
||||
self.assertEqual(clone.scope, 'clone_%d/' % i)
|
||||
self.assertDeviceEqual(clone.device, '/job:worker/device:GPU:%d' % i)
|
||||
|
||||
|
||||
class OptimizeclonesTest(tf.test.TestCase):
|
||||
|
||||
def setUp(self):
|
||||
# Create an easy training set:
|
||||
np.random.seed(0)
|
||||
|
||||
self._inputs = np.zeros((16, 4))
|
||||
self._labels = np.random.randint(0, 2, size=(16, 1)).astype(np.float32)
|
||||
self._logdir = self.get_temp_dir()
|
||||
|
||||
for i in range(16):
|
||||
j = int(2 * self._labels[i] + np.random.randint(0, 2))
|
||||
self._inputs[i, j] = 1
|
||||
|
||||
def testCreateLogisticClassifier(self):
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
tf.set_random_seed(0)
|
||||
tf_inputs = tf.constant(self._inputs, dtype=tf.float32)
|
||||
tf_labels = tf.constant(self._labels, dtype=tf.float32)
|
||||
|
||||
model_fn = LogisticClassifier
|
||||
clone_args = (tf_inputs, tf_labels)
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=1)
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
clones = model_deploy.create_clones(deploy_config, model_fn, clone_args)
|
||||
self.assertEqual(len(slim.get_variables()), 2)
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
|
||||
self.assertEqual(update_ops, [])
|
||||
|
||||
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
|
||||
total_loss, grads_and_vars = model_deploy.optimize_clones(clones,
|
||||
optimizer)
|
||||
self.assertEqual(len(grads_and_vars), len(tf.trainable_variables()))
|
||||
self.assertEqual(total_loss.op.name, 'total_loss')
|
||||
for g, v in grads_and_vars:
|
||||
self.assertDeviceEqual(g.device, 'GPU:0')
|
||||
self.assertDeviceEqual(v.device, 'CPU:0')
|
||||
|
||||
def testCreateSingleclone(self):
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
tf.set_random_seed(0)
|
||||
tf_inputs = tf.constant(self._inputs, dtype=tf.float32)
|
||||
tf_labels = tf.constant(self._labels, dtype=tf.float32)
|
||||
|
||||
model_fn = BatchNormClassifier
|
||||
clone_args = (tf_inputs, tf_labels)
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=1)
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
clones = model_deploy.create_clones(deploy_config, model_fn, clone_args)
|
||||
self.assertEqual(len(slim.get_variables()), 5)
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
|
||||
self.assertEqual(len(update_ops), 2)
|
||||
|
||||
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
|
||||
total_loss, grads_and_vars = model_deploy.optimize_clones(clones,
|
||||
optimizer)
|
||||
self.assertEqual(len(grads_and_vars), len(tf.trainable_variables()))
|
||||
self.assertEqual(total_loss.op.name, 'total_loss')
|
||||
for g, v in grads_and_vars:
|
||||
self.assertDeviceEqual(g.device, 'GPU:0')
|
||||
self.assertDeviceEqual(v.device, 'CPU:0')
|
||||
|
||||
def testCreateMulticlone(self):
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
tf.set_random_seed(0)
|
||||
tf_inputs = tf.constant(self._inputs, dtype=tf.float32)
|
||||
tf_labels = tf.constant(self._labels, dtype=tf.float32)
|
||||
|
||||
model_fn = BatchNormClassifier
|
||||
clone_args = (tf_inputs, tf_labels)
|
||||
num_clones = 4
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=num_clones)
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
clones = model_deploy.create_clones(deploy_config, model_fn, clone_args)
|
||||
self.assertEqual(len(slim.get_variables()), 5)
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
|
||||
self.assertEqual(len(update_ops), num_clones * 2)
|
||||
|
||||
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
|
||||
total_loss, grads_and_vars = model_deploy.optimize_clones(clones,
|
||||
optimizer)
|
||||
self.assertEqual(len(grads_and_vars), len(tf.trainable_variables()))
|
||||
self.assertEqual(total_loss.op.name, 'total_loss')
|
||||
for g, v in grads_and_vars:
|
||||
self.assertDeviceEqual(g.device, '')
|
||||
self.assertDeviceEqual(v.device, 'CPU:0')
|
||||
|
||||
def testCreateMulticloneCPU(self):
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
tf.set_random_seed(0)
|
||||
tf_inputs = tf.constant(self._inputs, dtype=tf.float32)
|
||||
tf_labels = tf.constant(self._labels, dtype=tf.float32)
|
||||
|
||||
model_fn = BatchNormClassifier
|
||||
model_args = (tf_inputs, tf_labels)
|
||||
num_clones = 4
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=num_clones,
|
||||
clone_on_cpu=True)
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
clones = model_deploy.create_clones(deploy_config, model_fn, model_args)
|
||||
self.assertEqual(len(slim.get_variables()), 5)
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
|
||||
self.assertEqual(len(update_ops), num_clones * 2)
|
||||
|
||||
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
|
||||
total_loss, grads_and_vars = model_deploy.optimize_clones(clones,
|
||||
optimizer)
|
||||
self.assertEqual(len(grads_and_vars), len(tf.trainable_variables()))
|
||||
self.assertEqual(total_loss.op.name, 'total_loss')
|
||||
for g, v in grads_and_vars:
|
||||
self.assertDeviceEqual(g.device, '')
|
||||
self.assertDeviceEqual(v.device, 'CPU:0')
|
||||
|
||||
def testCreateOnecloneWithPS(self):
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
tf.set_random_seed(0)
|
||||
tf_inputs = tf.constant(self._inputs, dtype=tf.float32)
|
||||
tf_labels = tf.constant(self._labels, dtype=tf.float32)
|
||||
|
||||
model_fn = BatchNormClassifier
|
||||
model_args = (tf_inputs, tf_labels)
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=1,
|
||||
num_ps_tasks=1)
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
clones = model_deploy.create_clones(deploy_config, model_fn, model_args)
|
||||
self.assertEqual(len(slim.get_variables()), 5)
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
|
||||
self.assertEqual(len(update_ops), 2)
|
||||
|
||||
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
|
||||
total_loss, grads_and_vars = model_deploy.optimize_clones(clones,
|
||||
optimizer)
|
||||
self.assertEqual(len(grads_and_vars), len(tf.trainable_variables()))
|
||||
self.assertEqual(total_loss.op.name, 'total_loss')
|
||||
for g, v in grads_and_vars:
|
||||
self.assertDeviceEqual(g.device, '/job:worker/device:GPU:0')
|
||||
self.assertDeviceEqual(v.device, '/job:ps/task:0/CPU:0')
|
||||
|
||||
|
||||
class DeployTest(tf.test.TestCase):
|
||||
|
||||
def setUp(self):
|
||||
# Create an easy training set:
|
||||
np.random.seed(0)
|
||||
|
||||
self._inputs = np.zeros((16, 4))
|
||||
self._labels = np.random.randint(0, 2, size=(16, 1)).astype(np.float32)
|
||||
self._logdir = self.get_temp_dir()
|
||||
|
||||
for i in range(16):
|
||||
j = int(2 * self._labels[i] + np.random.randint(0, 2))
|
||||
self._inputs[i, j] = 1
|
||||
|
||||
def _addBesselsCorrection(self, sample_size, expected_var):
|
||||
correction_factor = sample_size / (sample_size - 1)
|
||||
expected_var *= correction_factor
|
||||
return expected_var
|
||||
|
||||
def testLocalTrainOp(self):
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
tf.set_random_seed(0)
|
||||
tf_inputs = tf.constant(self._inputs, dtype=tf.float32)
|
||||
tf_labels = tf.constant(self._labels, dtype=tf.float32)
|
||||
|
||||
model_fn = BatchNormClassifier
|
||||
model_args = (tf_inputs, tf_labels)
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=2,
|
||||
clone_on_cpu=True)
|
||||
|
||||
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
|
||||
|
||||
self.assertEqual(slim.get_variables(), [])
|
||||
model = model_deploy.deploy(deploy_config, model_fn, model_args,
|
||||
optimizer=optimizer)
|
||||
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
|
||||
self.assertEqual(len(update_ops), 4)
|
||||
self.assertEqual(len(model.clones), 2)
|
||||
self.assertEqual(model.total_loss.op.name, 'total_loss')
|
||||
self.assertEqual(model.summary_op.op.name, 'summary_op/summary_op')
|
||||
self.assertEqual(model.train_op.op.name, 'train_op')
|
||||
|
||||
with tf.Session() as sess:
|
||||
sess.run(tf.global_variables_initializer())
|
||||
moving_mean = contrib_framework.get_variables_by_name('moving_mean')[0]
|
||||
moving_variance = contrib_framework.get_variables_by_name(
|
||||
'moving_variance')[0]
|
||||
initial_loss = sess.run(model.total_loss)
|
||||
initial_mean, initial_variance = sess.run([moving_mean,
|
||||
moving_variance])
|
||||
self.assertAllClose(initial_mean, [0.0, 0.0, 0.0, 0.0])
|
||||
self.assertAllClose(initial_variance, [1.0, 1.0, 1.0, 1.0])
|
||||
for _ in range(10):
|
||||
sess.run(model.train_op)
|
||||
final_loss = sess.run(model.total_loss)
|
||||
self.assertLess(final_loss, initial_loss / 5.0)
|
||||
|
||||
final_mean, final_variance = sess.run([moving_mean,
|
||||
moving_variance])
|
||||
expected_mean = np.array([0.125, 0.25, 0.375, 0.25])
|
||||
expected_var = np.array([0.109375, 0.1875, 0.234375, 0.1875])
|
||||
expected_var = self._addBesselsCorrection(16, expected_var)
|
||||
self.assertAllClose(final_mean, expected_mean)
|
||||
self.assertAllClose(final_variance, expected_var)
|
||||
|
||||
def testNoSummariesOnGPU(self):
|
||||
with tf.Graph().as_default():
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=2)
|
||||
|
||||
# clone function creates a fully_connected layer with a regularizer loss.
|
||||
def ModelFn():
|
||||
inputs = tf.constant(1.0, shape=(10, 20), dtype=tf.float32)
|
||||
reg = contrib_layers.l2_regularizer(0.001)
|
||||
contrib_layers.fully_connected(inputs, 30, weights_regularizer=reg)
|
||||
|
||||
model = model_deploy.deploy(
|
||||
deploy_config, ModelFn,
|
||||
optimizer=tf.train.GradientDescentOptimizer(1.0))
|
||||
# The model summary op should have a few summary inputs and all of them
|
||||
# should be on the CPU.
|
||||
self.assertTrue(model.summary_op.op.inputs)
|
||||
for inp in model.summary_op.op.inputs:
|
||||
self.assertEqual('/device:CPU:0', inp.device)
|
||||
|
||||
def testNoSummariesOnGPUForEvals(self):
|
||||
with tf.Graph().as_default():
|
||||
deploy_config = model_deploy.DeploymentConfig(num_clones=2)
|
||||
|
||||
# clone function creates a fully_connected layer with a regularizer loss.
|
||||
def ModelFn():
|
||||
inputs = tf.constant(1.0, shape=(10, 20), dtype=tf.float32)
|
||||
reg = contrib_layers.l2_regularizer(0.001)
|
||||
contrib_layers.fully_connected(inputs, 30, weights_regularizer=reg)
|
||||
|
||||
# No optimizer here, it's an eval.
|
||||
model = model_deploy.deploy(deploy_config, ModelFn)
|
||||
# The model summary op should have a few summary inputs and all of them
|
||||
# should be on the CPU.
|
||||
self.assertTrue(model.summary_op.op.inputs)
|
||||
for inp in model.summary_op.op.inputs:
|
||||
self.assertEqual('/device:CPU:0', inp.device)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+94
@@ -0,0 +1,94 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
r"""Downloads and converts a particular dataset.
|
||||
|
||||
Usage:
|
||||
```shell
|
||||
|
||||
$ python download_and_convert_data.py \
|
||||
--dataset_name=flowers \
|
||||
--dataset_dir=/tmp/flowers
|
||||
|
||||
$ python download_and_convert_data.py \
|
||||
--dataset_name=cifar10 \
|
||||
--dataset_dir=/tmp/cifar10
|
||||
|
||||
$ python download_and_convert_data.py \
|
||||
--dataset_name=mnist \
|
||||
--dataset_dir=/tmp/mnist
|
||||
|
||||
$ python download_and_convert_data.py \
|
||||
--dataset_name=visualwakewords \
|
||||
--dataset_dir=/tmp/visualwakewords
|
||||
|
||||
```
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from datasets import download_and_convert_cifar10
|
||||
from datasets import download_and_convert_flowers
|
||||
from datasets import download_and_convert_mnist
|
||||
from datasets import download_and_convert_visualwakewords
|
||||
|
||||
FLAGS = tf.compat.v1.app.flags.FLAGS
|
||||
|
||||
tf.compat.v1.app.flags.DEFINE_string(
|
||||
'dataset_name',
|
||||
None,
|
||||
'The name of the dataset to convert, one of "flowers", "cifar10", "mnist", "visualwakewords"'
|
||||
)
|
||||
|
||||
tf.compat.v1.app.flags.DEFINE_string(
|
||||
'dataset_dir',
|
||||
None,
|
||||
'The directory where the output TFRecords and temporary files are saved.')
|
||||
|
||||
tf.flags.DEFINE_float(
|
||||
'small_object_area_threshold', 0.005,
|
||||
'For --dataset_name=visualwakewords only. Threshold of fraction of image '
|
||||
'area below which small objects are filtered')
|
||||
|
||||
tf.flags.DEFINE_string(
|
||||
'foreground_class_of_interest', 'person',
|
||||
'For --dataset_name=visualwakewords only. Build a binary classifier based '
|
||||
'on the presence or absence of this object in the image.')
|
||||
|
||||
|
||||
def main(_):
|
||||
if not FLAGS.dataset_name:
|
||||
raise ValueError('You must supply the dataset name with --dataset_name')
|
||||
if not FLAGS.dataset_dir:
|
||||
raise ValueError('You must supply the dataset directory with --dataset_dir')
|
||||
|
||||
if FLAGS.dataset_name == 'flowers':
|
||||
download_and_convert_flowers.run(FLAGS.dataset_dir)
|
||||
elif FLAGS.dataset_name == 'cifar10':
|
||||
download_and_convert_cifar10.run(FLAGS.dataset_dir)
|
||||
elif FLAGS.dataset_name == 'mnist':
|
||||
download_and_convert_mnist.run(FLAGS.dataset_dir)
|
||||
elif FLAGS.dataset_name == 'visualwakewords':
|
||||
download_and_convert_visualwakewords.run(
|
||||
FLAGS.dataset_dir, FLAGS.small_object_area_threshold,
|
||||
FLAGS.foreground_class_of_interest)
|
||||
else:
|
||||
raise ValueError(
|
||||
'dataset_name [%s] was not recognized.' % FLAGS.dataset_name)
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.compat.v1.app.run()
|
||||
@@ -0,0 +1,182 @@
|
||||
import tensorflow as tf
|
||||
from time import gmtime, strftime
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
from gpu_helper import get_custom_getter
|
||||
import random
|
||||
import numpy as np
|
||||
import os
|
||||
|
||||
np.random.seed(0)
|
||||
random.seed(0)
|
||||
tf.set_random_seed(0)
|
||||
|
||||
|
||||
class Env:
|
||||
def __init__(self, FLAGS):
|
||||
self.FLAGS = FLAGS
|
||||
|
||||
self.slim = contrib_slim
|
||||
self.num_samples = 1281167
|
||||
|
||||
def _configure_optimizer(self, learning_rate):
|
||||
"""Configures the optimizer used for training.
|
||||
|
||||
Args:
|
||||
learning_rate: A scalar or `Tensor` learning rate.
|
||||
|
||||
Returns:
|
||||
An instance of an optimizer.
|
||||
|
||||
Raises:
|
||||
ValueError: if Initializer.FLAGS.optimizer is not recognized.
|
||||
"""
|
||||
if self.FLAGS.optimizer == 'adadelta':
|
||||
optimizer = tf.train.AdadeltaOptimizer(
|
||||
learning_rate,
|
||||
rho=self.FLAGS.adadelta_rho,
|
||||
epsilon=self.FLAGS.opt_epsilon)
|
||||
elif self.FLAGS.optimizer == 'adagrad':
|
||||
optimizer = tf.train.AdagradOptimizer(
|
||||
learning_rate,
|
||||
initial_accumulator_value=self.FLAGS.adagrad_initial_accumulator_value)
|
||||
elif self.FLAGS.optimizer == 'adam':
|
||||
optimizer = tf.train.AdamOptimizer(
|
||||
learning_rate,
|
||||
beta1=self.FLAGS.adam_beta1,
|
||||
beta2=self.FLAGS.adam_beta2,
|
||||
epsilon=self.FLAGS.opt_epsilon)
|
||||
elif self.FLAGS.optimizer == 'ftrl':
|
||||
optimizer = tf.train.FtrlOptimizer(
|
||||
learning_rate,
|
||||
learning_rate_power=self.FLAGS.ftrl_learning_rate_power,
|
||||
initial_accumulator_value=self.FLAGS.ftrl_initial_accumulator_value,
|
||||
l1_regularization_strength=self.FLAGS.ftrl_l1,
|
||||
l2_regularization_strength=self.FLAGS.ftrl_l2)
|
||||
elif self.FLAGS.optimizer == 'momentum':
|
||||
optimizer = tf.train.MomentumOptimizer(
|
||||
learning_rate,
|
||||
momentum=self.FLAGS.momentum,
|
||||
name='Momentum')
|
||||
elif self.FLAGS.optimizer == 'rmsprop':
|
||||
optimizer = tf.train.RMSPropOptimizer(
|
||||
learning_rate,
|
||||
decay=self.FLAGS.rmsprop_decay,
|
||||
momentum=self.FLAGS.rmsprop_momentum,
|
||||
epsilon=self.FLAGS.opt_epsilon)
|
||||
elif self.FLAGS.optimizer == 'sgd':
|
||||
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
|
||||
else:
|
||||
raise ValueError('Optimizer [%s] was not recognized' % self.FLAGS.optimizer)
|
||||
|
||||
return optimizer
|
||||
|
||||
def create_logdir(self):
|
||||
logdir = "results"
|
||||
os.makedirs(logdir, exist_ok=True)
|
||||
return logdir
|
||||
|
||||
def calc_logits(self, network_fn, images):
|
||||
logits, end_points = network_fn(images, reuse=tf.AUTO_REUSE)
|
||||
return logits
|
||||
|
||||
def calc_loss(self, logits_train, labels_train):
|
||||
base_loss = self.slim.losses.softmax_cross_entropy(
|
||||
logits_train, labels_train, label_smoothing=self.FLAGS.label_smoothing, weights=1.0)
|
||||
|
||||
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
|
||||
total_loss = tf.add_n([base_loss] + reg_losses, name='total_loss')
|
||||
|
||||
loss = tf.add_n([base_loss])
|
||||
loss = tf.identity(loss, name='loss')
|
||||
|
||||
return loss, total_loss
|
||||
|
||||
def calc_steps_per_epoch(self):
|
||||
return self.num_samples // (self.FLAGS.batch_size * int(os.getenv('RANK_SIZE')))
|
||||
|
||||
def _configure_learning_rate(self, global_step):
|
||||
steps_per_epoch = self.calc_steps_per_epoch()
|
||||
decay_steps = int(steps_per_epoch * self.FLAGS.num_epochs_per_decay)
|
||||
|
||||
if self.FLAGS.learning_rate_decay_type == 'exponential':
|
||||
learning_rate = tf.train.exponential_decay(
|
||||
self.FLAGS.learning_rate,
|
||||
global_step,
|
||||
decay_steps,
|
||||
self.FLAGS.learning_rate_decay_factor,
|
||||
staircase=True,
|
||||
name='exponential_decay_learning_rate')
|
||||
elif self.FLAGS.learning_rate_decay_type == 'fixed':
|
||||
learning_rate = tf.constant(self.FLAGS.learning_rate, name='fixed_learning_rate')
|
||||
elif self.FLAGS.learning_rate_decay_type == 'cosine_annealing':
|
||||
current_step_epoch = global_step // steps_per_epoch * steps_per_epoch
|
||||
learning_rate = tf.train.cosine_decay(self.FLAGS.learning_rate, current_step_epoch,
|
||||
self.FLAGS.max_number_of_steps)
|
||||
elif self.FLAGS.learning_rate_decay_type == 'polynomial':
|
||||
learning_rate = tf.train.polynomial_decay(
|
||||
self.FLAGS.learning_rate, global_step,
|
||||
decay_steps,
|
||||
self.FLAGS.end_learning_rate,
|
||||
power=1.0,
|
||||
cycle=False,
|
||||
name='polynomial_decay_learning_rate')
|
||||
else:
|
||||
raise ValueError('learning_rate_decay_type [%s] was not recognized' %
|
||||
self.FLAGS.learning_rate_decay_type)
|
||||
|
||||
if self.FLAGS.warmup_epochs:
|
||||
warmup_lr = (
|
||||
self.FLAGS.learning_rate * tf.cast(global_step, tf.float32) /
|
||||
(steps_per_epoch * self.FLAGS.warmup_epochs))
|
||||
learning_rate = tf.minimum(warmup_lr, learning_rate)
|
||||
|
||||
learning_rate = tf.identity(learning_rate, name='learning_rate')
|
||||
# tf.Print(learning_rate, [learning_rate], '*****************')
|
||||
return learning_rate
|
||||
|
||||
def create_train_op(self, global_step, summaries, loss):
|
||||
# Gather update_ops from the first clone. These contain, for example,
|
||||
# the updates for the batch_norm variables created by network_fn.
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) or []
|
||||
|
||||
#################################
|
||||
# Configure the moving averages #
|
||||
#################################
|
||||
if self.FLAGS.moving_average_decay:
|
||||
moving_average_variables = self.slim.get_model_variables()
|
||||
variable_averages = tf.train.ExponentialMovingAverage(
|
||||
self.FLAGS.moving_average_decay, global_step)
|
||||
else:
|
||||
moving_average_variables, variable_averages = None, None
|
||||
|
||||
#########################################
|
||||
# Configure the optimization procedure. #
|
||||
#########################################
|
||||
learning_rate = self._configure_learning_rate(global_step)
|
||||
summaries.add(tf.summary.scalar('learning_rate', learning_rate))
|
||||
|
||||
if self.FLAGS.moving_average_decay:
|
||||
# Update ops executed locally by trainer.
|
||||
update_ops.append(variable_averages.apply(moving_average_variables))
|
||||
|
||||
opt = self._configure_optimizer(learning_rate)
|
||||
|
||||
from npu_bridge.estimator.npu.npu_optimizer import NPUDistributedOptimizer
|
||||
from npu_bridge.estimator.npu.npu_loss_scale_optimizer import NPULossScaleOptimizer
|
||||
from npu_bridge.estimator.npu.npu_loss_scale_manager import FixedLossScaleManager
|
||||
from npu_bridge.estimator.npu.npu_loss_scale_manager import ExponentialUpdateLossScaleManager
|
||||
loss_scale_manager = FixedLossScaleManager(loss_scale=4096)
|
||||
# loss_scale_manager = ExponentialUpdateLossScaleManager(init_loss_scale=1024, incr_every_n_steps=1000, decr_every_n_nan_or_inf=2, decr_ratio=0.5)
|
||||
if int(os.getenv('RANK_SIZE')) == 1:
|
||||
opt = NPULossScaleOptimizer(opt, loss_scale_manager)
|
||||
else:
|
||||
opt = NPULossScaleOptimizer(opt, loss_scale_manager, is_distributed=True)
|
||||
opt = NPUDistributedOptimizer(opt)
|
||||
|
||||
update_op = tf.group(*update_ops)
|
||||
with tf.control_dependencies([update_op]):
|
||||
gate_gradients = (tf.train.Optimizer.GATE_NONE)
|
||||
grads_and_vars = opt.compute_gradients(loss)
|
||||
train_op = opt.apply_gradients(grads_and_vars, global_step=global_step)
|
||||
|
||||
return train_op
|
||||
+133
@@ -0,0 +1,133 @@
|
||||
import tensorflow as tf
|
||||
from dataloader import data_provider
|
||||
from datasets import dataset_factory
|
||||
from nets import nets_factory
|
||||
import os
|
||||
|
||||
|
||||
class EstimatorImpl:
|
||||
def __init__(self, env):
|
||||
self.env = env
|
||||
|
||||
def model_fn(self, features, labels, mode, params):
|
||||
num_classes = 1001
|
||||
|
||||
summaries = set(tf.get_collection(tf.GraphKeys.SUMMARIES))
|
||||
|
||||
if mode == tf.estimator.ModeKeys.TRAIN:
|
||||
network_fn = nets_factory.get_network_fn(
|
||||
self.env.FLAGS.model_name,
|
||||
num_classes=(num_classes - self.env.FLAGS.labels_offset),
|
||||
weight_decay=self.env.FLAGS.weight_decay,
|
||||
is_training=True)
|
||||
|
||||
logits = self.env.calc_logits(network_fn, features)
|
||||
loss, total_loss = self.env.calc_loss(logits, labels)
|
||||
|
||||
# ### accuracy ### #
|
||||
predictions = tf.argmax(logits, 1)
|
||||
accuracy_ops = tf.metrics.accuracy(tf.argmax(labels, 1), predictions)
|
||||
tf.identity(accuracy_ops[1], name='train_accuracy')
|
||||
# ### accuracy ### #
|
||||
|
||||
tf.identity(total_loss, 'train_loss')
|
||||
|
||||
global_step = tf.train.get_or_create_global_step()
|
||||
train_op = self.env.create_train_op(global_step, summaries, total_loss)
|
||||
|
||||
estimator_spec = tf.estimator.EstimatorSpec(
|
||||
mode=tf.estimator.ModeKeys.TRAIN, loss=total_loss, train_op=train_op)
|
||||
|
||||
elif mode == tf.estimator.ModeKeys.EVAL:
|
||||
network_fn = nets_factory.get_network_fn(
|
||||
self.env.FLAGS.model_name,
|
||||
num_classes=(num_classes - self.env.FLAGS.labels_offset),
|
||||
weight_decay=self.env.FLAGS.weight_decay,
|
||||
is_training=False)
|
||||
|
||||
logits = self.env.calc_logits(network_fn, features)
|
||||
loss, total_loss = self.env.calc_loss(logits, labels)
|
||||
predictions = tf.argmax(logits, 1)
|
||||
accuracy_ops = tf.metrics.accuracy(tf.argmax(labels, 1), predictions)
|
||||
tf.identity(accuracy_ops[1], name='eval_accuracy')
|
||||
estimator_spec = tf.estimator.EstimatorSpec(
|
||||
mode=tf.estimator.ModeKeys.EVAL,
|
||||
loss=total_loss, eval_metric_ops={'accuracy': accuracy_ops})
|
||||
|
||||
return estimator_spec
|
||||
|
||||
def main(self):
|
||||
logdir = self.env.create_logdir()
|
||||
|
||||
from logger import LogSessionRunHook
|
||||
|
||||
config = {
|
||||
'num_training_samples': self.env.num_samples,
|
||||
# for 1p, just per loop print, for 8p, print each epoch
|
||||
'display_every': 1,
|
||||
'log_name': 'train_log.log',
|
||||
'log_dir': logdir,
|
||||
'global_batch_size': self.env.FLAGS.batch_size * int(os.getenv('RANK_SIZE')),
|
||||
'iterations_per_loop': self.env.FLAGS.iterations_per_loop if self.env.FLAGS.iterations_per_loop is not None else self.env.calc_steps_per_epoch()
|
||||
}
|
||||
|
||||
hooks = [LogSessionRunHook(config, warmup_steps=self.env.FLAGS.warmup_epochs * self.env.calc_steps_per_epoch())]
|
||||
|
||||
#################################################################
|
||||
from npu_bridge.estimator.npu.npu_config import NPURunConfig
|
||||
from npu_bridge.estimator.npu.npu_estimator import NPUEstimator
|
||||
|
||||
self.estimator_config = tf.ConfigProto(
|
||||
inter_op_parallelism_threads=10,
|
||||
intra_op_parallelism_threads=10,
|
||||
allow_soft_placement=True)
|
||||
|
||||
self.estimator_config.gpu_options.allow_growth = True
|
||||
|
||||
gpu_thread_count = 2
|
||||
os.environ['TF_GPU_THREAD_MODE'] = 'gpu_private'
|
||||
os.environ['TF_GPU_THREAD_COUNT'] = str(gpu_thread_count)
|
||||
os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '1'
|
||||
os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
|
||||
|
||||
run_config = NPURunConfig(
|
||||
hcom_parallel=True,
|
||||
precision_mode="allow_mix_precision",
|
||||
enable_data_pre_proc=True,
|
||||
save_checkpoints_steps=self.env.calc_steps_per_epoch(),
|
||||
session_config=self.estimator_config,
|
||||
model_dir=logdir,
|
||||
iterations_per_loop=config['iterations_per_loop'],
|
||||
keep_checkpoint_max=5)
|
||||
|
||||
classifier = NPUEstimator(
|
||||
model_fn=self.model_fn,
|
||||
config=run_config
|
||||
)
|
||||
###################################################################
|
||||
|
||||
classifier.train(
|
||||
input_fn=self.train_data,
|
||||
max_steps=self.env.FLAGS.max_number_of_steps,
|
||||
hooks=hooks,
|
||||
)
|
||||
|
||||
def train_data(self):
|
||||
dataset = dataset_factory.get_dataset(self.env.FLAGS.dataset_name, 'train', self.env.FLAGS.dataset_dir)
|
||||
|
||||
preprocessing_name = self.env.FLAGS.preprocessing_name or self.env.FLAGS.model_name
|
||||
_, ds = data_provider.get_data(dataset, self.env.FLAGS.batch_size,
|
||||
dataset.num_classes, self.env.FLAGS.labels_offset, True,
|
||||
preprocessing_name, self.env.FLAGS.use_grayscale)
|
||||
|
||||
return ds
|
||||
|
||||
def eval_data(self):
|
||||
dataset = dataset_factory.get_dataset(self.env.FLAGS.dataset_name, 'validation', self.env.FLAGS.dataset_dir)
|
||||
|
||||
preprocessing_name = self.env.FLAGS.preprocessing_name or self.env.FLAGS.model_name
|
||||
_, ds = data_provider.get_data(dataset, self.env.FLAGS.batch_size,
|
||||
dataset.num_classes, self.env.FLAGS.labels_offset, False,
|
||||
preprocessing_name, self.env.FLAGS.use_grayscale)
|
||||
|
||||
return ds
|
||||
+174
@@ -0,0 +1,174 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Generic evaluation script that evaluates a model using a given dataset."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import math
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import quantize as contrib_quantize
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
from benchmark_log import hwlog
|
||||
from datasets import dataset_factory
|
||||
from nets import nets_factory
|
||||
import os
|
||||
os.environ["CUDA_VISIBLE_DEVICES"] = '4'
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'batch_size', 100, 'The number of samples in each batch.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'max_num_batches', None,
|
||||
'Max number of batches to evaluate by default use all.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'master', '', 'The address of the TensorFlow master to use.')
|
||||
|
||||
ckpt_path = './results/0526023335_train_hvdTrue_mnmobilenet_v2_augmentedTrue_mixedpFalse_lr0.4_optmomentum_me200_lrdtcosine_annealing_nepd0.3125_lrdf0.98_b256_me_param'
|
||||
# ckpt_path = './results/0523130615_train_hvdTrue_mnmobilenet_v2_augmentedTrue_mixedpFalse_lr0.4_optmomentum_me200_lrdtcosine_annealing_nepd0.3125_lrdf0.98_b256_me_param'
|
||||
tf.app.flags.DEFINE_string(
|
||||
'checkpoint_path', ckpt_path,
|
||||
'The directory where the model was written to or an absolute path to a '
|
||||
'checkpoint file.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'eval_dir', ckpt_path, 'Directory where the results are saved to.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'num_preprocessing_threads', 4,
|
||||
'The number of threads used to create the batches.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'dataset_name', 'imagenet', 'The name of the dataset to load.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'dataset_split_name', 'validation', 'The name of the train/test split.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'dataset_dir', '/data/Datasets/imagenet_TF', 'The directory where the dataset files are stored.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'labels_offset', 0,
|
||||
'An offset for the labels in the dataset. This flag is primarily used to '
|
||||
'evaluate the VGG and ResNet architectures which do not use a background '
|
||||
'class for the ImageNet dataset.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'model_name', 'mobilenet_v2', 'The name of the architecture to evaluate.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'preprocessing_name', None, 'The name of the preprocessing to use. If left '
|
||||
'as `None`, then the model_name flag is used.')
|
||||
|
||||
tf.app.flags.DEFINE_float(
|
||||
'moving_average_decay', None,
|
||||
'The decay to use for the moving average.'
|
||||
'If left as None, then moving averages are not used.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'eval_image_size', None, 'Eval image size')
|
||||
|
||||
tf.app.flags.DEFINE_bool(
|
||||
'quantize', False, 'whether to use quantized graph or not.')
|
||||
|
||||
tf.app.flags.DEFINE_bool('use_grayscale', False,
|
||||
'Whether to convert input images to grayscale.')
|
||||
|
||||
FLAGS = tf.app.flags.FLAGS
|
||||
|
||||
|
||||
def main(_):
|
||||
if not FLAGS.dataset_dir:
|
||||
raise ValueError('You must supply the dataset directory with --dataset_dir')
|
||||
|
||||
tf.logging.set_verbosity(tf.logging.INFO)
|
||||
with tf.Graph().as_default():
|
||||
tf_global_step = slim.get_or_create_global_step()
|
||||
|
||||
######################
|
||||
# Select the dataset #
|
||||
######################
|
||||
dataset = dataset_factory.get_dataset(
|
||||
FLAGS.dataset_name, FLAGS.dataset_split_name, FLAGS.dataset_dir)
|
||||
|
||||
####################
|
||||
# Select the model #
|
||||
####################
|
||||
network_fn = nets_factory.get_network_fn(
|
||||
FLAGS.model_name,
|
||||
num_classes=(dataset.num_classes - FLAGS.labels_offset),
|
||||
is_training=False)
|
||||
|
||||
from dataloader import data_provider
|
||||
preprocessing_name = FLAGS.preprocessing_name or FLAGS.model_name
|
||||
iterator, _ = data_provider.get_data(dataset, FLAGS.batch_size,
|
||||
dataset.num_classes, FLAGS.labels_offset, is_training=False,
|
||||
preprocessing_name=preprocessing_name,
|
||||
use_grayscale=FLAGS.use_grayscale,
|
||||
hvd=None, enable_hvd=None)
|
||||
images, labels = iterator.get_next() # label: [100, 1001]
|
||||
images = tf.reshape(images, [FLAGS.batch_size, 224, 224, 3]) # (100, 224, 224, 3), float32
|
||||
labels = tf.argmax(labels, axis=1) # [100]
|
||||
logits, _ = network_fn(images)
|
||||
|
||||
if FLAGS.quantize:
|
||||
contrib_quantize.create_eval_graph()
|
||||
|
||||
predictions = tf.argmax(logits, 1)
|
||||
labels = tf.squeeze(labels)
|
||||
eval_accuracy, metric_update_op = tf.metrics.accuracy(labels, predictions)
|
||||
|
||||
# tf.summary.scalar('top1_acc', top1_accu)
|
||||
# summaries_op = tf.summary.merge_all()
|
||||
|
||||
# TODO(sguada) use num_epochs=1
|
||||
if FLAGS.max_num_batches:
|
||||
num_batches = FLAGS.max_num_batches
|
||||
else:
|
||||
# This ensures that we make a single pass over all of the data.
|
||||
num_batches = math.ceil(dataset.num_samples / float(FLAGS.batch_size))
|
||||
if tf.gfile.IsDirectory(FLAGS.checkpoint_path):
|
||||
checkpoint_path = tf.train.latest_checkpoint(FLAGS.checkpoint_path)
|
||||
else:
|
||||
checkpoint_path = FLAGS.checkpoint_path
|
||||
|
||||
##### evaluate #####
|
||||
tf.logging.info('Evaluating %s' % checkpoint_path)
|
||||
saver = tf.train.Saver()
|
||||
from time import gmtime, strftime
|
||||
logdir = "results/%s" % strftime("%m%d%H%M%S_evel", gmtime())
|
||||
# summary_writer = tf.summary.FileWriter(logdir=logdir, graph=tf.get_default_graph())
|
||||
with tf.Session() as sess:
|
||||
sess.run(iterator.initializer)
|
||||
sess.run(tf.global_variables_initializer())
|
||||
sess.run(tf.local_variables_initializer())
|
||||
saver.restore(sess, f'{checkpoint_path}')
|
||||
tf.train.write_graph(sess.graph, logdir, 'graph.pbtxt')
|
||||
|
||||
for step in range(num_batches):
|
||||
_metric_update_op = sess.run([metric_update_op])
|
||||
print(f'{step}, _metric_update_op: {_metric_update_op}')
|
||||
|
||||
acc = sess.run([eval_accuracy])
|
||||
print(f'acc: {acc}')
|
||||
hwlog.remark_print(key=hwlog.EVAL_ACCURACY_TOP1, value=f'{acc}')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.app.run()
|
||||
+181
@@ -0,0 +1,181 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Generic evaluation script that evaluates a model using a given dataset."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import math
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import quantize as contrib_quantize
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from datasets import dataset_factory
|
||||
from nets import nets_factory
|
||||
from benchmark_log import hwlog
|
||||
import os
|
||||
os.environ["CUDA_VISIBLE_DEVICES"] = '4'
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'batch_size', 256, 'The number of samples in each batch.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'max_num_batches', None,
|
||||
'Max number of batches to evaluate by default use all.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'master', '', 'The address of the TensorFlow master to use.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'checkpoint_path', 'ckpt_path',
|
||||
'The directory where the model was written to or an absolute path to a '
|
||||
'checkpoint file.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'eval_dir', 'ckpt_path', 'Directory where the results are saved to.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'num_preprocessing_threads', 4,
|
||||
'The number of threads used to create the batches.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'dataset_name', 'imagenet', 'The name of the dataset to load.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'dataset_split_name', 'validation', 'The name of the train/test split.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'dataset_dir', '/opt/npu/slimImagenet', 'The directory where the dataset files are stored.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'labels_offset', 0,
|
||||
'An offset for the labels in the dataset. This flag is primarily used to '
|
||||
'evaluate the VGG and ResNet architectures which do not use a background '
|
||||
'class for the ImageNet dataset.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'model_name', 'mobilenet_v2', 'The name of the architecture to evaluate.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'preprocessing_name', None, 'The name of the preprocessing to use. If left '
|
||||
'as `None`, then the model_name flag is used.')
|
||||
|
||||
tf.app.flags.DEFINE_float(
|
||||
'moving_average_decay', None,
|
||||
'The decay to use for the moving average.'
|
||||
'If left as None, then moving averages are not used.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'eval_image_size', None, 'Eval image size')
|
||||
|
||||
tf.app.flags.DEFINE_bool(
|
||||
'quantize', False, 'whether to use quantized graph or not.')
|
||||
|
||||
tf.app.flags.DEFINE_bool('use_grayscale', False,
|
||||
'Whether to convert input images to grayscale.')
|
||||
|
||||
FLAGS = tf.app.flags.FLAGS
|
||||
|
||||
|
||||
def main(_):
|
||||
if not FLAGS.dataset_dir:
|
||||
raise ValueError('You must supply the dataset directory with --dataset_dir')
|
||||
|
||||
tf.logging.set_verbosity(tf.logging.INFO)
|
||||
with tf.Graph().as_default():
|
||||
tf_global_step = slim.get_or_create_global_step()
|
||||
|
||||
######################
|
||||
# Select the dataset #
|
||||
######################
|
||||
dataset = dataset_factory.get_dataset(
|
||||
FLAGS.dataset_name, FLAGS.dataset_split_name, FLAGS.dataset_dir)
|
||||
|
||||
####################
|
||||
# Select the model #
|
||||
####################
|
||||
network_fn = nets_factory.get_network_fn(
|
||||
FLAGS.model_name,
|
||||
num_classes=(dataset.num_classes - FLAGS.labels_offset),
|
||||
is_training=False)
|
||||
|
||||
from dataloader import data_provider
|
||||
preprocessing_name = FLAGS.preprocessing_name or FLAGS.model_name
|
||||
|
||||
iterator, _ = data_provider.get_data(dataset, FLAGS.batch_size,
|
||||
dataset.num_classes, FLAGS.labels_offset, is_training=False,
|
||||
preprocessing_name=preprocessing_name,
|
||||
use_grayscale=FLAGS.use_grayscale)
|
||||
#tf.logging.info('iterator %s' % iterator)
|
||||
images, labels = iterator.get_next() # label: [100, 1001]
|
||||
images = tf.reshape(images, [FLAGS.batch_size, 224, 224, 3]) # (100, 224, 224, 3), float32
|
||||
labels = tf.argmax(labels, axis=1) # [100]
|
||||
logits, _ = network_fn(images)
|
||||
|
||||
if FLAGS.quantize:
|
||||
contrib_quantize.create_eval_graph()
|
||||
|
||||
predictions = tf.argmax(logits, 1)
|
||||
labels = tf.squeeze(labels)
|
||||
eval_accuracy, metric_update_op = tf.metrics.accuracy(labels, predictions)
|
||||
#hwlog.remark_print(key=hwlog.EVAL_ACCURACY, value="".format(eval_accuracy))
|
||||
|
||||
# tf.summary.scalar('top1_acc', top1_accu)
|
||||
# summaries_op = tf.summary.merge_all()
|
||||
|
||||
# TODO(sguada) use num_epochs=1
|
||||
if FLAGS.max_num_batches:
|
||||
num_batches = FLAGS.max_num_batches
|
||||
else:
|
||||
# This ensures that we make a single pass over all of the data.
|
||||
num_batches = math.ceil(dataset.num_samples / float(FLAGS.batch_size)) - 1
|
||||
|
||||
if tf.gfile.IsDirectory(FLAGS.checkpoint_path):
|
||||
checkpoint_path = tf.train.latest_checkpoint(FLAGS.checkpoint_path)
|
||||
else:
|
||||
checkpoint_path = FLAGS.checkpoint_path
|
||||
# checkpoint_path = '/opt/npu/models/mobilenetv2_v0.1/ckpt/model.ckpt'
|
||||
print(dataset.num_samples)
|
||||
print(FLAGS.batch_size)
|
||||
hwlog.remark_print(key=hwlog.GLOBAL_BATCH_SIZE, value=FLAGS.batch_size)
|
||||
##### evaluate #####
|
||||
tf.logging.info('Evaluating %s' % checkpoint_path)
|
||||
saver = tf.train.Saver()
|
||||
from time import gmtime, strftime
|
||||
logdir = "ckpt/%s" % strftime("%m%d%H%M%S_evel", gmtime())
|
||||
# summary_writer = tf.summary.FileWriter(logdir=logdir, graph=tf.get_default_graph())
|
||||
with tf.Session() as sess:
|
||||
sess.run(iterator.initializer)
|
||||
sess.run(tf.global_variables_initializer())
|
||||
sess.run(tf.local_variables_initializer())
|
||||
saver.restore(sess, f'{checkpoint_path}')
|
||||
# saver.restore(sess, 'result/8p/2/results/model.ckpt-3750')
|
||||
tf.train.write_graph(sess.graph, logdir, 'graph.pbtxt')
|
||||
|
||||
for step in range(num_batches):
|
||||
_metric_update_op = sess.run([metric_update_op])
|
||||
print(f'{step}, _metric_update_op: {_metric_update_op}')
|
||||
hwlog.remark_print(key=hwlog.GLOBAL_STEP, value=f'{step}')
|
||||
hwlog.remark_print(key=hwlog.EVAL_ACCURACY, value=f'{_metric_update_op}')
|
||||
acc = sess.run([eval_accuracy])
|
||||
print(f'acc: {acc}')
|
||||
hwlog.remark_print(key=hwlog.EVAL_ACCURACY_TOP1, value=f'{acc}')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.app.run()
|
||||
+164
@@ -0,0 +1,164 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
r"""Saves out a GraphDef containing the architecture of the model.
|
||||
|
||||
To use it, run something like this, with a model name defined by slim:
|
||||
|
||||
bazel build tensorflow_models/research/slim:export_inference_graph
|
||||
bazel-bin/tensorflow_models/research/slim/export_inference_graph \
|
||||
--model_name=inception_v3 --output_file=/tmp/inception_v3_inf_graph.pb
|
||||
|
||||
If you then want to use the resulting model with your own or pretrained
|
||||
checkpoints as part of a mobile model, you can run freeze_graph to get a graph
|
||||
def with the variables inlined as constants using:
|
||||
|
||||
bazel build tensorflow/python/tools:freeze_graph
|
||||
bazel-bin/tensorflow/python/tools/freeze_graph \
|
||||
--input_graph=/tmp/inception_v3_inf_graph.pb \
|
||||
--input_checkpoint=/tmp/checkpoints/inception_v3.ckpt \
|
||||
--input_binary=true --output_graph=/tmp/frozen_inception_v3.pb \
|
||||
--output_node_names=InceptionV3/Predictions/Reshape_1
|
||||
|
||||
The output node names will vary depending on the model, but you can inspect and
|
||||
estimate them using the summarize_graph tool:
|
||||
|
||||
bazel build tensorflow/tools/graph_transforms:summarize_graph
|
||||
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph \
|
||||
--in_graph=/tmp/inception_v3_inf_graph.pb
|
||||
|
||||
To run the resulting graph in C++, you can look at the label_image sample code:
|
||||
|
||||
bazel build tensorflow/examples/label_image:label_image
|
||||
bazel-bin/tensorflow/examples/label_image/label_image \
|
||||
--image=${HOME}/Pictures/flowers.jpg \
|
||||
--input_layer=input \
|
||||
--output_layer=InceptionV3/Predictions/Reshape_1 \
|
||||
--graph=/tmp/frozen_inception_v3.pb \
|
||||
--labels=/tmp/imagenet_slim_labels.txt \
|
||||
--input_mean=0 \
|
||||
--input_std=255
|
||||
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
import os
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import quantize as contrib_quantize
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from tensorflow.python.platform import gfile
|
||||
from datasets import dataset_factory
|
||||
from nets import nets_factory
|
||||
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'model_name', 'inception_v3', 'The name of the architecture to save.')
|
||||
|
||||
tf.app.flags.DEFINE_boolean(
|
||||
'is_training', False,
|
||||
'Whether to save out a training-focused version of the model.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'image_size', None,
|
||||
'The image size to use, otherwise use the model default_image_size.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'batch_size', None,
|
||||
'Batch size for the exported model. Defaulted to "None" so batch size can '
|
||||
'be specified at model runtime.')
|
||||
|
||||
tf.app.flags.DEFINE_string('dataset_name', 'imagenet',
|
||||
'The name of the dataset to use with the model.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'labels_offset', 0,
|
||||
'An offset for the labels in the dataset. This flag is primarily used to '
|
||||
'evaluate the VGG and ResNet architectures which do not use a background '
|
||||
'class for the ImageNet dataset.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'output_file', '', 'Where to save the resulting file to.')
|
||||
|
||||
tf.app.flags.DEFINE_string(
|
||||
'dataset_dir', '', 'Directory to save intermediate dataset files to')
|
||||
|
||||
tf.app.flags.DEFINE_bool(
|
||||
'quantize', False, 'whether to use quantized graph or not.')
|
||||
|
||||
tf.app.flags.DEFINE_bool(
|
||||
'is_video_model', False, 'whether to use 5-D inputs for video model.')
|
||||
|
||||
tf.app.flags.DEFINE_integer(
|
||||
'num_frames', None,
|
||||
'The number of frames to use. Only used if is_video_model is True.')
|
||||
|
||||
tf.app.flags.DEFINE_bool('write_text_graphdef', False,
|
||||
'Whether to write a text version of graphdef.')
|
||||
|
||||
tf.app.flags.DEFINE_bool('use_grayscale', False,
|
||||
'Whether to convert input images to grayscale.')
|
||||
|
||||
FLAGS = tf.app.flags.FLAGS
|
||||
|
||||
|
||||
def main(_):
|
||||
if not FLAGS.output_file:
|
||||
raise ValueError('You must supply the path to save to with --output_file')
|
||||
if FLAGS.is_video_model and not FLAGS.num_frames:
|
||||
raise ValueError(
|
||||
'Number of frames must be specified for video models with --num_frames')
|
||||
tf.logging.set_verbosity(tf.logging.INFO)
|
||||
with tf.Graph().as_default() as graph:
|
||||
dataset = dataset_factory.get_dataset(FLAGS.dataset_name, 'train',
|
||||
FLAGS.dataset_dir)
|
||||
network_fn = nets_factory.get_network_fn(
|
||||
FLAGS.model_name,
|
||||
num_classes=(dataset.num_classes - FLAGS.labels_offset),
|
||||
is_training=FLAGS.is_training)
|
||||
image_size = FLAGS.image_size or network_fn.default_image_size
|
||||
num_channels = 1 if FLAGS.use_grayscale else 3
|
||||
if FLAGS.is_video_model:
|
||||
input_shape = [
|
||||
FLAGS.batch_size, FLAGS.num_frames, image_size, image_size,
|
||||
num_channels
|
||||
]
|
||||
else:
|
||||
input_shape = [FLAGS.batch_size, image_size, image_size, num_channels]
|
||||
placeholder = tf.placeholder(name='input', dtype=tf.float32,
|
||||
shape=input_shape)
|
||||
network_fn(placeholder)
|
||||
|
||||
if FLAGS.quantize:
|
||||
contrib_quantize.create_eval_graph()
|
||||
|
||||
graph_def = graph.as_graph_def()
|
||||
if FLAGS.write_text_graphdef:
|
||||
tf.io.write_graph(
|
||||
graph_def,
|
||||
os.path.dirname(FLAGS.output_file),
|
||||
os.path.basename(FLAGS.output_file),
|
||||
as_text=True)
|
||||
else:
|
||||
with gfile.GFile(FLAGS.output_file, 'wb') as f:
|
||||
f.write(graph_def.SerializeToString())
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.app.run()
|
||||
+44
@@ -0,0 +1,44 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
|
||||
"""Tests for export_inference_graph."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import os
|
||||
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from tensorflow.python.platform import gfile
|
||||
import export_inference_graph
|
||||
|
||||
|
||||
class ExportInferenceGraphTest(tf.test.TestCase):
|
||||
|
||||
def testExportInferenceGraph(self):
|
||||
tmpdir = self.get_temp_dir()
|
||||
output_file = os.path.join(tmpdir, 'inception_v3.pb')
|
||||
flags = tf.app.flags.FLAGS
|
||||
flags.output_file = output_file
|
||||
flags.model_name = 'inception_v3'
|
||||
flags.dataset_dir = tmpdir
|
||||
export_inference_graph.main(None)
|
||||
self.assertTrue(gfile.Exists(output_file))
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+44
@@ -0,0 +1,44 @@
|
||||
import tensorflow as tf
|
||||
import numpy as np
|
||||
|
||||
def float32_variable_storage_getter(getter, name, shape=None, dtype=None,
|
||||
initializer=None, regularizer=None,
|
||||
trainable=True,
|
||||
*args, **kwargs):
|
||||
"""Custom variable getter that forces trainable variables to be stored in
|
||||
float32 precision and then casts them to the training precision.
|
||||
"""
|
||||
storage_dtype = tf.float32 if trainable else dtype
|
||||
variable = getter(name, shape, dtype=storage_dtype,
|
||||
initializer=initializer, regularizer=regularizer,
|
||||
trainable=trainable,
|
||||
*args, **kwargs)
|
||||
if trainable and dtype != tf.float32:
|
||||
variable = tf.cast(variable, dtype)
|
||||
return variable
|
||||
|
||||
def get_custom_getter(compute_type):
|
||||
return float32_variable_storage_getter if compute_type == tf.float16 else None
|
||||
|
||||
|
||||
|
||||
def float32_variable_storage_getter_1(getter, name, shape=None, dtype=None,
|
||||
initializer=None, regularizer=None,
|
||||
trainable=True,
|
||||
*args, **kwargs):
|
||||
"""Custom variable getter that forces trainable variables to be stored in
|
||||
float32 precision and then casts them to the training precision.
|
||||
"""
|
||||
dtype = tf.float16
|
||||
storage_dtype = tf.float32 if trainable else dtype
|
||||
variable = getter(name, shape, dtype=storage_dtype,
|
||||
initializer=initializer, regularizer=regularizer,
|
||||
trainable=trainable,
|
||||
*args, **kwargs)
|
||||
if trainable and dtype != tf.float32:
|
||||
variable = tf.cast(variable, dtype)
|
||||
return variable
|
||||
|
||||
def get_custom_getter_1(compute_type):
|
||||
return float32_variable_storage_getter_1 if compute_type == tf.float16 else None
|
||||
|
||||
@@ -0,0 +1,86 @@
|
||||
from __future__ import print_function
|
||||
|
||||
import datetime
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
from benchmark_log import hwlog
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
|
||||
|
||||
class LogSessionRunHook(tf.train.SessionRunHook):
|
||||
def __init__(self, config, warmup_steps=5):
|
||||
self.global_batch_size = config['global_batch_size']
|
||||
self.iterations_per_loop = config['iterations_per_loop']
|
||||
self.warmup_steps = warmup_steps
|
||||
self.iter_times = []
|
||||
self.num_records = config['num_training_samples']
|
||||
self.display_every = config['display_every']
|
||||
self.logger = get_logger(config['log_name'], config['log_dir'])
|
||||
rank0log(self.logger, 'PY' + str(sys.version) + 'TF' + str(tf.__version__))
|
||||
|
||||
def after_create_session(self, session, coord):
|
||||
rank0log(self.logger, 'Step Epoch Speed Loss FinLoss LR')
|
||||
self.elapsed_secs = 0.
|
||||
self.count = 0
|
||||
|
||||
def before_run(self, run_context):
|
||||
self.t0 = time.time()
|
||||
return tf.train.SessionRunArgs(
|
||||
fetches=[tf.train.get_global_step(), 'loss:0', 'total_loss:0', 'learning_rate:0',
|
||||
'train_accuracy:0'])
|
||||
|
||||
def after_run(self, run_context, run_values):
|
||||
batch_time = time.time() - self.t0
|
||||
self.iter_times.append(batch_time)
|
||||
self.elapsed_secs += batch_time
|
||||
self.count += 1
|
||||
global_step, loss, total_loss, lr, train_accuracy = run_values.results
|
||||
if global_step == 1 or global_step % self.display_every == 0:
|
||||
dt = self.elapsed_secs / self.count
|
||||
img_per_sec = self.global_batch_size * self.iterations_per_loop / dt
|
||||
epoch = global_step * self.global_batch_size / self.num_records
|
||||
self.logger.info(f'step:{global_step} epoch:{epoch} ips:{img_per_sec} '
|
||||
f'loss:{loss} total_loss:{total_loss} lr:{lr}, '
|
||||
f'train_accuracy:{train_accuracy}')
|
||||
|
||||
hwlog.remark_print(key=hwlog.GLOBAL_STEP, value=f"{global_step}")
|
||||
hwlog.remark_print(key=hwlog.CURRENT_EPOCH, value=f"{epoch}")
|
||||
hwlog.remark_print(key=hwlog.TRAIN_ACCURACY, value=f"{train_accuracy}")
|
||||
hwlog.remark_print(key=hwlog.FPS, value=f"{img_per_sec}")
|
||||
self.elapsed_secs = 0.
|
||||
self.count = 0
|
||||
|
||||
def get_average_speed(self):
|
||||
avg_time = np.mean(self.iter_times[self.warmup_steps:])
|
||||
speed = self.global_batch_size / avg_time
|
||||
return speed
|
||||
|
||||
|
||||
def rank0log(logger, *args, **kwargs):
|
||||
if logger:
|
||||
logger.info(''.join([str(x) for x in list(args)]))
|
||||
else:
|
||||
print(*args, **kwargs)
|
||||
|
||||
|
||||
def get_logger(log_name, log_dir):
|
||||
logger = logging.getLogger(log_name)
|
||||
logger.setLevel(logging.INFO) # INFO, ERROR
|
||||
if not os.path.isdir(log_dir):
|
||||
try:
|
||||
os.makedirs(log_dir)
|
||||
except FileExistsError:
|
||||
pass
|
||||
ch = logging.StreamHandler()
|
||||
ch.setLevel(logging.INFO)
|
||||
formatter = logging.Formatter('%(message)s')
|
||||
ch.setFormatter(formatter)
|
||||
logger.addHandler(ch)
|
||||
fh = logging.FileHandler(os.path.join(log_dir, log_name))
|
||||
fh.setLevel(logging.DEBUG)
|
||||
fh.setFormatter(formatter)
|
||||
logger.addHandler(fh)
|
||||
return logger
|
||||
+1
@@ -0,0 +1 @@
|
||||
|
||||
+148
@@ -0,0 +1,148 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains a model definition for AlexNet.
|
||||
|
||||
This work was first described in:
|
||||
ImageNet Classification with Deep Convolutional Neural Networks
|
||||
Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton
|
||||
|
||||
and later refined in:
|
||||
One weird trick for parallelizing convolutional neural networks
|
||||
Alex Krizhevsky, 2014
|
||||
|
||||
Here we provide the implementation proposed in "One weird trick" and not
|
||||
"ImageNet Classification", as per the paper, the LRN layers have been removed.
|
||||
|
||||
Usage:
|
||||
with slim.arg_scope(alexnet.alexnet_v2_arg_scope()):
|
||||
outputs, end_points = alexnet.alexnet_v2(inputs)
|
||||
|
||||
@@alexnet_v2
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
# pylint: disable=g-long-lambda
|
||||
trunc_normal = lambda stddev: tf.compat.v1.truncated_normal_initializer(
|
||||
0.0, stddev)
|
||||
|
||||
|
||||
def alexnet_v2_arg_scope(weight_decay=0.0005):
|
||||
with slim.arg_scope([slim.conv2d, slim.fully_connected],
|
||||
activation_fn=tf.nn.relu,
|
||||
biases_initializer=tf.compat.v1.constant_initializer(0.1),
|
||||
weights_regularizer=slim.l2_regularizer(weight_decay)):
|
||||
with slim.arg_scope([slim.conv2d], padding='SAME'):
|
||||
with slim.arg_scope([slim.max_pool2d], padding='VALID') as arg_sc:
|
||||
return arg_sc
|
||||
|
||||
|
||||
def alexnet_v2(inputs,
|
||||
num_classes=1000,
|
||||
is_training=True,
|
||||
dropout_keep_prob=0.5,
|
||||
spatial_squeeze=True,
|
||||
scope='alexnet_v2',
|
||||
global_pool=False):
|
||||
"""AlexNet version 2.
|
||||
|
||||
Described in: http://arxiv.org/pdf/1404.5997v2.pdf
|
||||
Parameters from:
|
||||
github.com/akrizhevsky/cuda-convnet2/blob/master/layers/
|
||||
layers-imagenet-1gpu.cfg
|
||||
|
||||
Note: All the fully_connected layers have been transformed to conv2d layers.
|
||||
To use in classification mode, resize input to 224x224 or set
|
||||
global_pool=True. To use in fully convolutional mode, set
|
||||
spatial_squeeze to false.
|
||||
The LRN layers have been removed and change the initializers from
|
||||
random_normal_initializer to xavier_initializer.
|
||||
|
||||
Args:
|
||||
inputs: a tensor of size [batch_size, height, width, channels].
|
||||
num_classes: the number of predicted classes. If 0 or None, the logits layer
|
||||
is omitted and the input features to the logits layer are returned instead.
|
||||
is_training: whether or not the model is being trained.
|
||||
dropout_keep_prob: the probability that activations are kept in the dropout
|
||||
layers during training.
|
||||
spatial_squeeze: whether or not should squeeze the spatial dimensions of the
|
||||
logits. Useful to remove unnecessary dimensions for classification.
|
||||
scope: Optional scope for the variables.
|
||||
global_pool: Optional boolean flag. If True, the input to the classification
|
||||
layer is avgpooled to size 1x1, for any input size. (This is not part
|
||||
of the original AlexNet.)
|
||||
|
||||
Returns:
|
||||
net: the output of the logits layer (if num_classes is a non-zero integer),
|
||||
or the non-dropped-out input to the logits layer (if num_classes is 0
|
||||
or None).
|
||||
end_points: a dict of tensors with intermediate activations.
|
||||
"""
|
||||
with tf.compat.v1.variable_scope(scope, 'alexnet_v2', [inputs]) as sc:
|
||||
end_points_collection = sc.original_name_scope + '_end_points'
|
||||
# Collect outputs for conv2d, fully_connected and max_pool2d.
|
||||
with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
|
||||
outputs_collections=[end_points_collection]):
|
||||
net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID',
|
||||
scope='conv1')
|
||||
net = slim.max_pool2d(net, [3, 3], 2, scope='pool1')
|
||||
net = slim.conv2d(net, 192, [5, 5], scope='conv2')
|
||||
net = slim.max_pool2d(net, [3, 3], 2, scope='pool2')
|
||||
net = slim.conv2d(net, 384, [3, 3], scope='conv3')
|
||||
net = slim.conv2d(net, 384, [3, 3], scope='conv4')
|
||||
net = slim.conv2d(net, 256, [3, 3], scope='conv5')
|
||||
net = slim.max_pool2d(net, [3, 3], 2, scope='pool5')
|
||||
|
||||
# Use conv2d instead of fully_connected layers.
|
||||
with slim.arg_scope(
|
||||
[slim.conv2d],
|
||||
weights_initializer=trunc_normal(0.005),
|
||||
biases_initializer=tf.compat.v1.constant_initializer(0.1)):
|
||||
net = slim.conv2d(net, 4096, [5, 5], padding='VALID',
|
||||
scope='fc6')
|
||||
net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
|
||||
scope='dropout6')
|
||||
net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
|
||||
# Convert end_points_collection into a end_point dict.
|
||||
end_points = slim.utils.convert_collection_to_dict(
|
||||
end_points_collection)
|
||||
if global_pool:
|
||||
net = tf.reduce_mean(
|
||||
input_tensor=net, axis=[1, 2], keepdims=True, name='global_pool')
|
||||
end_points['global_pool'] = net
|
||||
if num_classes:
|
||||
net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
|
||||
scope='dropout7')
|
||||
net = slim.conv2d(
|
||||
net,
|
||||
num_classes, [1, 1],
|
||||
activation_fn=None,
|
||||
normalizer_fn=None,
|
||||
biases_initializer=tf.compat.v1.zeros_initializer(),
|
||||
scope='fc8')
|
||||
if spatial_squeeze:
|
||||
net = tf.squeeze(net, [1, 2], name='fc8/squeezed')
|
||||
end_points[sc.name + '/fc8'] = net
|
||||
return net, end_points
|
||||
|
||||
|
||||
alexnet_v2.default_image_size = 224
|
||||
+181
@@ -0,0 +1,181 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for slim.nets.alexnet."""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import alexnet
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
class AlexnetV2Test(tf.test.TestCase):
|
||||
|
||||
def testBuild(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = alexnet.alexnet_v2(inputs, num_classes)
|
||||
self.assertEquals(logits.op.name, 'alexnet_v2/fc8/squeezed')
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testFullyConvolutional(self):
|
||||
batch_size = 1
|
||||
height, width = 300, 400
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = alexnet.alexnet_v2(inputs, num_classes, spatial_squeeze=False)
|
||||
self.assertEquals(logits.op.name, 'alexnet_v2/fc8/BiasAdd')
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, 4, 7, num_classes])
|
||||
|
||||
def testGlobalPool(self):
|
||||
batch_size = 1
|
||||
height, width = 256, 256
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = alexnet.alexnet_v2(inputs, num_classes, spatial_squeeze=False,
|
||||
global_pool=True)
|
||||
self.assertEquals(logits.op.name, 'alexnet_v2/fc8/BiasAdd')
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, 1, 1, num_classes])
|
||||
|
||||
def testEndPoints(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = alexnet.alexnet_v2(inputs, num_classes)
|
||||
expected_names = ['alexnet_v2/conv1',
|
||||
'alexnet_v2/pool1',
|
||||
'alexnet_v2/conv2',
|
||||
'alexnet_v2/pool2',
|
||||
'alexnet_v2/conv3',
|
||||
'alexnet_v2/conv4',
|
||||
'alexnet_v2/conv5',
|
||||
'alexnet_v2/pool5',
|
||||
'alexnet_v2/fc6',
|
||||
'alexnet_v2/fc7',
|
||||
'alexnet_v2/fc8'
|
||||
]
|
||||
self.assertSetEqual(set(end_points.keys()), set(expected_names))
|
||||
|
||||
def testNoClasses(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = None
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, end_points = alexnet.alexnet_v2(inputs, num_classes)
|
||||
expected_names = ['alexnet_v2/conv1',
|
||||
'alexnet_v2/pool1',
|
||||
'alexnet_v2/conv2',
|
||||
'alexnet_v2/pool2',
|
||||
'alexnet_v2/conv3',
|
||||
'alexnet_v2/conv4',
|
||||
'alexnet_v2/conv5',
|
||||
'alexnet_v2/pool5',
|
||||
'alexnet_v2/fc6',
|
||||
'alexnet_v2/fc7'
|
||||
]
|
||||
self.assertSetEqual(set(end_points.keys()), set(expected_names))
|
||||
self.assertTrue(net.op.name.startswith('alexnet_v2/fc7'))
|
||||
self.assertListEqual(net.get_shape().as_list(),
|
||||
[batch_size, 1, 1, 4096])
|
||||
|
||||
def testModelVariables(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
alexnet.alexnet_v2(inputs, num_classes)
|
||||
expected_names = ['alexnet_v2/conv1/weights',
|
||||
'alexnet_v2/conv1/biases',
|
||||
'alexnet_v2/conv2/weights',
|
||||
'alexnet_v2/conv2/biases',
|
||||
'alexnet_v2/conv3/weights',
|
||||
'alexnet_v2/conv3/biases',
|
||||
'alexnet_v2/conv4/weights',
|
||||
'alexnet_v2/conv4/biases',
|
||||
'alexnet_v2/conv5/weights',
|
||||
'alexnet_v2/conv5/biases',
|
||||
'alexnet_v2/fc6/weights',
|
||||
'alexnet_v2/fc6/biases',
|
||||
'alexnet_v2/fc7/weights',
|
||||
'alexnet_v2/fc7/biases',
|
||||
'alexnet_v2/fc8/weights',
|
||||
'alexnet_v2/fc8/biases',
|
||||
]
|
||||
model_variables = [v.op.name for v in slim.get_model_variables()]
|
||||
self.assertSetEqual(set(model_variables), set(expected_names))
|
||||
|
||||
def testEvaluation(self):
|
||||
batch_size = 2
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
eval_inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = alexnet.alexnet_v2(eval_inputs, is_training=False)
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
self.assertListEqual(predictions.get_shape().as_list(), [batch_size])
|
||||
|
||||
def testTrainEvalWithReuse(self):
|
||||
train_batch_size = 2
|
||||
eval_batch_size = 1
|
||||
train_height, train_width = 224, 224
|
||||
eval_height, eval_width = 300, 400
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
train_inputs = tf.random.uniform(
|
||||
(train_batch_size, train_height, train_width, 3))
|
||||
logits, _ = alexnet.alexnet_v2(train_inputs)
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[train_batch_size, num_classes])
|
||||
tf.compat.v1.get_variable_scope().reuse_variables()
|
||||
eval_inputs = tf.random.uniform(
|
||||
(eval_batch_size, eval_height, eval_width, 3))
|
||||
logits, _ = alexnet.alexnet_v2(eval_inputs, is_training=False,
|
||||
spatial_squeeze=False)
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[eval_batch_size, 4, 7, num_classes])
|
||||
logits = tf.reduce_mean(input_tensor=logits, axis=[1, 2])
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
self.assertEquals(predictions.get_shape().as_list(), [eval_batch_size])
|
||||
|
||||
def testForward(self):
|
||||
batch_size = 1
|
||||
height, width = 224, 224
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = alexnet.alexnet_v2(inputs)
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(logits)
|
||||
self.assertTrue(output.any())
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+123
@@ -0,0 +1,123 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains a variant of the CIFAR-10 model definition."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
# pylint: disable=g-long-lambda
|
||||
trunc_normal = lambda stddev: tf.compat.v1.truncated_normal_initializer(
|
||||
stddev=stddev)
|
||||
|
||||
|
||||
def cifarnet(images, num_classes=10, is_training=False,
|
||||
dropout_keep_prob=0.5,
|
||||
prediction_fn=slim.softmax,
|
||||
scope='CifarNet'):
|
||||
"""Creates a variant of the CifarNet model.
|
||||
|
||||
Note that since the output is a set of 'logits', the values fall in the
|
||||
interval of (-infinity, infinity). Consequently, to convert the outputs to a
|
||||
probability distribution over the characters, one will need to convert them
|
||||
using the softmax function:
|
||||
|
||||
logits = cifarnet.cifarnet(images, is_training=False)
|
||||
probabilities = tf.nn.softmax(logits)
|
||||
predictions = tf.argmax(logits, 1)
|
||||
|
||||
Args:
|
||||
images: A batch of `Tensors` of size [batch_size, height, width, channels].
|
||||
num_classes: the number of classes in the dataset. If 0 or None, the logits
|
||||
layer is omitted and the input features to the logits layer are returned
|
||||
instead.
|
||||
is_training: specifies whether or not we're currently training the model.
|
||||
This variable will determine the behaviour of the dropout layer.
|
||||
dropout_keep_prob: the percentage of activation values that are retained.
|
||||
prediction_fn: a function to get predictions out of logits.
|
||||
scope: Optional variable_scope.
|
||||
|
||||
Returns:
|
||||
net: a 2D Tensor with the logits (pre-softmax activations) if num_classes
|
||||
is a non-zero integer, or the input to the logits layer if num_classes
|
||||
is 0 or None.
|
||||
end_points: a dictionary from components of the network to the corresponding
|
||||
activation.
|
||||
"""
|
||||
end_points = {}
|
||||
|
||||
with tf.compat.v1.variable_scope(scope, 'CifarNet', [images]):
|
||||
net = slim.conv2d(images, 64, [5, 5], scope='conv1')
|
||||
end_points['conv1'] = net
|
||||
net = slim.max_pool2d(net, [2, 2], 2, scope='pool1')
|
||||
end_points['pool1'] = net
|
||||
net = tf.nn.lrn(net, 4, bias=1.0, alpha=0.001/9.0, beta=0.75, name='norm1')
|
||||
net = slim.conv2d(net, 64, [5, 5], scope='conv2')
|
||||
end_points['conv2'] = net
|
||||
net = tf.nn.lrn(net, 4, bias=1.0, alpha=0.001/9.0, beta=0.75, name='norm2')
|
||||
net = slim.max_pool2d(net, [2, 2], 2, scope='pool2')
|
||||
end_points['pool2'] = net
|
||||
net = slim.flatten(net)
|
||||
end_points['Flatten'] = net
|
||||
net = slim.fully_connected(net, 384, scope='fc3')
|
||||
end_points['fc3'] = net
|
||||
net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
|
||||
scope='dropout3')
|
||||
net = slim.fully_connected(net, 192, scope='fc4')
|
||||
end_points['fc4'] = net
|
||||
if not num_classes:
|
||||
return net, end_points
|
||||
logits = slim.fully_connected(
|
||||
net,
|
||||
num_classes,
|
||||
biases_initializer=tf.compat.v1.zeros_initializer(),
|
||||
weights_initializer=trunc_normal(1 / 192.0),
|
||||
weights_regularizer=None,
|
||||
activation_fn=None,
|
||||
scope='logits')
|
||||
|
||||
end_points['Logits'] = logits
|
||||
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
|
||||
|
||||
return logits, end_points
|
||||
cifarnet.default_image_size = 32
|
||||
|
||||
|
||||
def cifarnet_arg_scope(weight_decay=0.004):
|
||||
"""Defines the default cifarnet argument scope.
|
||||
|
||||
Args:
|
||||
weight_decay: The weight decay to use for regularizing the model.
|
||||
|
||||
Returns:
|
||||
An `arg_scope` to use for the inception v3 model.
|
||||
"""
|
||||
with slim.arg_scope(
|
||||
[slim.conv2d],
|
||||
weights_initializer=tf.compat.v1.truncated_normal_initializer(
|
||||
stddev=5e-2),
|
||||
activation_fn=tf.nn.relu):
|
||||
with slim.arg_scope(
|
||||
[slim.fully_connected],
|
||||
biases_initializer=tf.compat.v1.constant_initializer(0.1),
|
||||
weights_initializer=trunc_normal(0.04),
|
||||
weights_regularizer=slim.l2_regularizer(weight_decay),
|
||||
activation_fn=tf.nn.relu) as sc:
|
||||
return sc
|
||||
+280
@@ -0,0 +1,280 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Defines the CycleGAN generator and discriminator networks."""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import numpy as np
|
||||
from six.moves import xrange # pylint: disable=redefined-builtin
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import framework as contrib_framework
|
||||
from tensorflow.contrib import layers as contrib_layers
|
||||
from tensorflow.contrib import util as contrib_util
|
||||
|
||||
layers = contrib_layers
|
||||
|
||||
|
||||
def cyclegan_arg_scope(instance_norm_center=True,
|
||||
instance_norm_scale=True,
|
||||
instance_norm_epsilon=0.001,
|
||||
weights_init_stddev=0.02,
|
||||
weight_decay=0.0):
|
||||
"""Returns a default argument scope for all generators and discriminators.
|
||||
|
||||
Args:
|
||||
instance_norm_center: Whether instance normalization applies centering.
|
||||
instance_norm_scale: Whether instance normalization applies scaling.
|
||||
instance_norm_epsilon: Small float added to the variance in the instance
|
||||
normalization to avoid dividing by zero.
|
||||
weights_init_stddev: Standard deviation of the random values to initialize
|
||||
the convolution kernels with.
|
||||
weight_decay: Magnitude of weight decay applied to all convolution kernel
|
||||
variables of the generator.
|
||||
|
||||
Returns:
|
||||
An arg-scope.
|
||||
"""
|
||||
instance_norm_params = {
|
||||
'center': instance_norm_center,
|
||||
'scale': instance_norm_scale,
|
||||
'epsilon': instance_norm_epsilon,
|
||||
}
|
||||
|
||||
weights_regularizer = None
|
||||
if weight_decay and weight_decay > 0.0:
|
||||
weights_regularizer = layers.l2_regularizer(weight_decay)
|
||||
|
||||
with contrib_framework.arg_scope(
|
||||
[layers.conv2d],
|
||||
normalizer_fn=layers.instance_norm,
|
||||
normalizer_params=instance_norm_params,
|
||||
weights_initializer=tf.compat.v1.random_normal_initializer(
|
||||
0, weights_init_stddev),
|
||||
weights_regularizer=weights_regularizer) as sc:
|
||||
return sc
|
||||
|
||||
|
||||
def cyclegan_upsample(net, num_outputs, stride, method='conv2d_transpose',
|
||||
pad_mode='REFLECT', align_corners=False):
|
||||
"""Upsamples the given inputs.
|
||||
|
||||
Args:
|
||||
net: A Tensor of size [batch_size, height, width, filters].
|
||||
num_outputs: The number of output filters.
|
||||
stride: A list of 2 scalars or a 1x2 Tensor indicating the scale,
|
||||
relative to the inputs, of the output dimensions. For example, if kernel
|
||||
size is [2, 3], then the output height and width will be twice and three
|
||||
times the input size.
|
||||
method: The upsampling method: 'nn_upsample_conv', 'bilinear_upsample_conv',
|
||||
or 'conv2d_transpose'.
|
||||
pad_mode: mode for tf.pad, one of "CONSTANT", "REFLECT", or "SYMMETRIC".
|
||||
align_corners: option for method, 'bilinear_upsample_conv'. If true, the
|
||||
centers of the 4 corner pixels of the input and output tensors are
|
||||
aligned, preserving the values at the corner pixels.
|
||||
|
||||
Returns:
|
||||
A Tensor which was upsampled using the specified method.
|
||||
|
||||
Raises:
|
||||
ValueError: if `method` is not recognized.
|
||||
"""
|
||||
with tf.compat.v1.variable_scope('upconv'):
|
||||
net_shape = tf.shape(input=net)
|
||||
height = net_shape[1]
|
||||
width = net_shape[2]
|
||||
|
||||
# Reflection pad by 1 in spatial dimensions (axes 1, 2 = h, w) to make a 3x3
|
||||
# 'valid' convolution produce an output with the same dimension as the
|
||||
# input.
|
||||
spatial_pad_1 = np.array([[0, 0], [1, 1], [1, 1], [0, 0]])
|
||||
|
||||
if method == 'nn_upsample_conv':
|
||||
net = tf.image.resize(
|
||||
net, [stride[0] * height, stride[1] * width],
|
||||
method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
|
||||
net = tf.pad(tensor=net, paddings=spatial_pad_1, mode=pad_mode)
|
||||
net = layers.conv2d(net, num_outputs, kernel_size=[3, 3], padding='valid')
|
||||
elif method == 'bilinear_upsample_conv':
|
||||
net = tf.compat.v1.image.resize_bilinear(
|
||||
net, [stride[0] * height, stride[1] * width],
|
||||
align_corners=align_corners)
|
||||
net = tf.pad(tensor=net, paddings=spatial_pad_1, mode=pad_mode)
|
||||
net = layers.conv2d(net, num_outputs, kernel_size=[3, 3], padding='valid')
|
||||
elif method == 'conv2d_transpose':
|
||||
# This corrects 1 pixel offset for images with even width and height.
|
||||
# conv2d is left aligned and conv2d_transpose is right aligned for even
|
||||
# sized images (while doing 'SAME' padding).
|
||||
# Note: This doesn't reflect actual model in paper.
|
||||
net = layers.conv2d_transpose(
|
||||
net, num_outputs, kernel_size=[3, 3], stride=stride, padding='valid')
|
||||
net = net[:, 1:, 1:, :]
|
||||
else:
|
||||
raise ValueError('Unknown method: [%s]' % method)
|
||||
|
||||
return net
|
||||
|
||||
|
||||
def _dynamic_or_static_shape(tensor):
|
||||
shape = tf.shape(input=tensor)
|
||||
static_shape = contrib_util.constant_value(shape)
|
||||
return static_shape if static_shape is not None else shape
|
||||
|
||||
|
||||
def cyclegan_generator_resnet(images,
|
||||
arg_scope_fn=cyclegan_arg_scope,
|
||||
num_resnet_blocks=6,
|
||||
num_filters=64,
|
||||
upsample_fn=cyclegan_upsample,
|
||||
kernel_size=3,
|
||||
tanh_linear_slope=0.0,
|
||||
is_training=False):
|
||||
"""Defines the cyclegan resnet network architecture.
|
||||
|
||||
As closely as possible following
|
||||
https://github.com/junyanz/CycleGAN/blob/master/models/architectures.lua#L232
|
||||
|
||||
FYI: This network requires input height and width to be divisible by 4 in
|
||||
order to generate an output with shape equal to input shape. Assertions will
|
||||
catch this if input dimensions are known at graph construction time, but
|
||||
there's no protection if unknown at graph construction time (you'll see an
|
||||
error).
|
||||
|
||||
Args:
|
||||
images: Input image tensor of shape [batch_size, h, w, 3].
|
||||
arg_scope_fn: Function to create the global arg_scope for the network.
|
||||
num_resnet_blocks: Number of ResNet blocks in the middle of the generator.
|
||||
num_filters: Number of filters of the first hidden layer.
|
||||
upsample_fn: Upsampling function for the decoder part of the generator.
|
||||
kernel_size: Size w or list/tuple [h, w] of the filter kernels for all inner
|
||||
layers.
|
||||
tanh_linear_slope: Slope of the linear function to add to the tanh over the
|
||||
logits.
|
||||
is_training: Whether the network is created in training mode or inference
|
||||
only mode. Not actually needed, just for compliance with other generator
|
||||
network functions.
|
||||
|
||||
Returns:
|
||||
A `Tensor` representing the model output and a dictionary of model end
|
||||
points.
|
||||
|
||||
Raises:
|
||||
ValueError: If the input height or width is known at graph construction time
|
||||
and not a multiple of 4.
|
||||
"""
|
||||
# Neither dropout nor batch norm -> dont need is_training
|
||||
del is_training
|
||||
|
||||
end_points = {}
|
||||
|
||||
input_size = images.shape.as_list()
|
||||
height, width = input_size[1], input_size[2]
|
||||
if height and height % 4 != 0:
|
||||
raise ValueError('The input height must be a multiple of 4.')
|
||||
if width and width % 4 != 0:
|
||||
raise ValueError('The input width must be a multiple of 4.')
|
||||
num_outputs = input_size[3]
|
||||
|
||||
if not isinstance(kernel_size, (list, tuple)):
|
||||
kernel_size = [kernel_size, kernel_size]
|
||||
|
||||
kernel_height = kernel_size[0]
|
||||
kernel_width = kernel_size[1]
|
||||
pad_top = (kernel_height - 1) // 2
|
||||
pad_bottom = kernel_height // 2
|
||||
pad_left = (kernel_width - 1) // 2
|
||||
pad_right = kernel_width // 2
|
||||
paddings = np.array(
|
||||
[[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]],
|
||||
dtype=np.int32)
|
||||
spatial_pad_3 = np.array([[0, 0], [3, 3], [3, 3], [0, 0]])
|
||||
|
||||
with contrib_framework.arg_scope(arg_scope_fn()):
|
||||
|
||||
###########
|
||||
# Encoder #
|
||||
###########
|
||||
with tf.compat.v1.variable_scope('input'):
|
||||
# 7x7 input stage
|
||||
net = tf.pad(tensor=images, paddings=spatial_pad_3, mode='REFLECT')
|
||||
net = layers.conv2d(net, num_filters, kernel_size=[7, 7], padding='VALID')
|
||||
end_points['encoder_0'] = net
|
||||
|
||||
with tf.compat.v1.variable_scope('encoder'):
|
||||
with contrib_framework.arg_scope([layers.conv2d],
|
||||
kernel_size=kernel_size,
|
||||
stride=2,
|
||||
activation_fn=tf.nn.relu,
|
||||
padding='VALID'):
|
||||
|
||||
net = tf.pad(tensor=net, paddings=paddings, mode='REFLECT')
|
||||
net = layers.conv2d(net, num_filters * 2)
|
||||
end_points['encoder_1'] = net
|
||||
net = tf.pad(tensor=net, paddings=paddings, mode='REFLECT')
|
||||
net = layers.conv2d(net, num_filters * 4)
|
||||
end_points['encoder_2'] = net
|
||||
|
||||
###################
|
||||
# Residual Blocks #
|
||||
###################
|
||||
with tf.compat.v1.variable_scope('residual_blocks'):
|
||||
with contrib_framework.arg_scope([layers.conv2d],
|
||||
kernel_size=kernel_size,
|
||||
stride=1,
|
||||
activation_fn=tf.nn.relu,
|
||||
padding='VALID'):
|
||||
for block_id in xrange(num_resnet_blocks):
|
||||
with tf.compat.v1.variable_scope('block_{}'.format(block_id)):
|
||||
res_net = tf.pad(tensor=net, paddings=paddings, mode='REFLECT')
|
||||
res_net = layers.conv2d(res_net, num_filters * 4)
|
||||
res_net = tf.pad(tensor=res_net, paddings=paddings, mode='REFLECT')
|
||||
res_net = layers.conv2d(res_net, num_filters * 4,
|
||||
activation_fn=None)
|
||||
net += res_net
|
||||
|
||||
end_points['resnet_block_%d' % block_id] = net
|
||||
|
||||
###########
|
||||
# Decoder #
|
||||
###########
|
||||
with tf.compat.v1.variable_scope('decoder'):
|
||||
|
||||
with contrib_framework.arg_scope([layers.conv2d],
|
||||
kernel_size=kernel_size,
|
||||
stride=1,
|
||||
activation_fn=tf.nn.relu):
|
||||
|
||||
with tf.compat.v1.variable_scope('decoder1'):
|
||||
net = upsample_fn(net, num_outputs=num_filters * 2, stride=[2, 2])
|
||||
end_points['decoder1'] = net
|
||||
|
||||
with tf.compat.v1.variable_scope('decoder2'):
|
||||
net = upsample_fn(net, num_outputs=num_filters, stride=[2, 2])
|
||||
end_points['decoder2'] = net
|
||||
|
||||
with tf.compat.v1.variable_scope('output'):
|
||||
net = tf.pad(tensor=net, paddings=spatial_pad_3, mode='REFLECT')
|
||||
logits = layers.conv2d(
|
||||
net,
|
||||
num_outputs, [7, 7],
|
||||
activation_fn=None,
|
||||
normalizer_fn=None,
|
||||
padding='valid')
|
||||
logits = tf.reshape(logits, _dynamic_or_static_shape(images))
|
||||
|
||||
end_points['logits'] = logits
|
||||
end_points['predictions'] = tf.tanh(logits) + logits * tanh_linear_slope
|
||||
|
||||
return end_points['predictions'], end_points
|
||||
+110
@@ -0,0 +1,110 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for tensorflow.contrib.slim.nets.cyclegan."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from nets import cyclegan
|
||||
|
||||
|
||||
# TODO(joelshor): Add a test to check generator endpoints.
|
||||
class CycleganTest(tf.test.TestCase):
|
||||
|
||||
def test_generator_inference(self):
|
||||
"""Check one inference step."""
|
||||
img_batch = tf.zeros([2, 32, 32, 3])
|
||||
model_output, _ = cyclegan.cyclegan_generator_resnet(img_batch)
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
sess.run(model_output)
|
||||
|
||||
def _test_generator_graph_helper(self, shape):
|
||||
"""Check that generator can take small and non-square inputs."""
|
||||
output_imgs, _ = cyclegan.cyclegan_generator_resnet(tf.ones(shape))
|
||||
self.assertAllEqual(shape, output_imgs.shape.as_list())
|
||||
|
||||
def test_generator_graph_small(self):
|
||||
self._test_generator_graph_helper([4, 32, 32, 3])
|
||||
|
||||
def test_generator_graph_medium(self):
|
||||
self._test_generator_graph_helper([3, 128, 128, 3])
|
||||
|
||||
def test_generator_graph_nonsquare(self):
|
||||
self._test_generator_graph_helper([2, 80, 400, 3])
|
||||
|
||||
def test_generator_unknown_batch_dim(self):
|
||||
"""Check that generator can take unknown batch dimension inputs."""
|
||||
img = tf.compat.v1.placeholder(tf.float32, shape=[None, 32, None, 3])
|
||||
output_imgs, _ = cyclegan.cyclegan_generator_resnet(img)
|
||||
|
||||
self.assertAllEqual([None, 32, None, 3], output_imgs.shape.as_list())
|
||||
|
||||
def _input_and_output_same_shape_helper(self, kernel_size):
|
||||
img_batch = tf.compat.v1.placeholder(tf.float32, shape=[None, 32, 32, 3])
|
||||
output_img_batch, _ = cyclegan.cyclegan_generator_resnet(
|
||||
img_batch, kernel_size=kernel_size)
|
||||
|
||||
self.assertAllEqual(img_batch.shape.as_list(),
|
||||
output_img_batch.shape.as_list())
|
||||
|
||||
def input_and_output_same_shape_kernel3(self):
|
||||
self._input_and_output_same_shape_helper(3)
|
||||
|
||||
def input_and_output_same_shape_kernel4(self):
|
||||
self._input_and_output_same_shape_helper(4)
|
||||
|
||||
def input_and_output_same_shape_kernel5(self):
|
||||
self._input_and_output_same_shape_helper(5)
|
||||
|
||||
def input_and_output_same_shape_kernel6(self):
|
||||
self._input_and_output_same_shape_helper(6)
|
||||
|
||||
def _error_if_height_not_multiple_of_four_helper(self, height):
|
||||
self.assertRaisesRegexp(
|
||||
ValueError, 'The input height must be a multiple of 4.',
|
||||
cyclegan.cyclegan_generator_resnet,
|
||||
tf.compat.v1.placeholder(tf.float32, shape=[None, height, 32, 3]))
|
||||
|
||||
def test_error_if_height_not_multiple_of_four_height29(self):
|
||||
self._error_if_height_not_multiple_of_four_helper(29)
|
||||
|
||||
def test_error_if_height_not_multiple_of_four_height30(self):
|
||||
self._error_if_height_not_multiple_of_four_helper(30)
|
||||
|
||||
def test_error_if_height_not_multiple_of_four_height31(self):
|
||||
self._error_if_height_not_multiple_of_four_helper(31)
|
||||
|
||||
def _error_if_width_not_multiple_of_four_helper(self, width):
|
||||
self.assertRaisesRegexp(
|
||||
ValueError, 'The input width must be a multiple of 4.',
|
||||
cyclegan.cyclegan_generator_resnet,
|
||||
tf.compat.v1.placeholder(tf.float32, shape=[None, 32, width, 3]))
|
||||
|
||||
def test_error_if_width_not_multiple_of_four_width29(self):
|
||||
self._error_if_width_not_multiple_of_four_helper(29)
|
||||
|
||||
def test_error_if_width_not_multiple_of_four_width30(self):
|
||||
self._error_if_width_not_multiple_of_four_helper(30)
|
||||
|
||||
def test_error_if_width_not_multiple_of_four_width31(self):
|
||||
self._error_if_width_not_multiple_of_four_helper(31)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+205
@@ -0,0 +1,205 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""DCGAN generator and discriminator from https://arxiv.org/abs/1511.06434."""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
from math import log
|
||||
|
||||
from six.moves import xrange # pylint: disable=redefined-builtin
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
def _validate_image_inputs(inputs):
|
||||
inputs.get_shape().assert_has_rank(4)
|
||||
inputs.get_shape()[1:3].assert_is_fully_defined()
|
||||
if inputs.get_shape()[1] != inputs.get_shape()[2]:
|
||||
raise ValueError('Input tensor does not have equal width and height: ',
|
||||
inputs.get_shape()[1:3])
|
||||
width = inputs.get_shape().as_list()[1]
|
||||
if log(width, 2) != int(log(width, 2)):
|
||||
raise ValueError('Input tensor `width` is not a power of 2: ', width)
|
||||
|
||||
|
||||
# TODO(joelshor): Use fused batch norm by default. Investigate why some GAN
|
||||
# setups need the gradient of gradient FusedBatchNormGrad.
|
||||
def discriminator(inputs,
|
||||
depth=64,
|
||||
is_training=True,
|
||||
reuse=None,
|
||||
scope='Discriminator',
|
||||
fused_batch_norm=False):
|
||||
"""Discriminator network for DCGAN.
|
||||
|
||||
Construct discriminator network from inputs to the final endpoint.
|
||||
|
||||
Args:
|
||||
inputs: A tensor of size [batch_size, height, width, channels]. Must be
|
||||
floating point.
|
||||
depth: Number of channels in first convolution layer.
|
||||
is_training: Whether the network is for training or not.
|
||||
reuse: Whether or not the network variables should be reused. `scope`
|
||||
must be given to be reused.
|
||||
scope: Optional variable_scope.
|
||||
fused_batch_norm: If `True`, use a faster, fused implementation of
|
||||
batch norm.
|
||||
|
||||
Returns:
|
||||
logits: The pre-softmax activations, a tensor of size [batch_size, 1]
|
||||
end_points: a dictionary from components of the network to their activation.
|
||||
|
||||
Raises:
|
||||
ValueError: If the input image shape is not 4-dimensional, if the spatial
|
||||
dimensions aren't defined at graph construction time, if the spatial
|
||||
dimensions aren't square, or if the spatial dimensions aren't a power of
|
||||
two.
|
||||
"""
|
||||
|
||||
normalizer_fn = slim.batch_norm
|
||||
normalizer_fn_args = {
|
||||
'is_training': is_training,
|
||||
'zero_debias_moving_mean': True,
|
||||
'fused': fused_batch_norm,
|
||||
}
|
||||
|
||||
_validate_image_inputs(inputs)
|
||||
inp_shape = inputs.get_shape().as_list()[1]
|
||||
|
||||
end_points = {}
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, values=[inputs], reuse=reuse) as scope:
|
||||
with slim.arg_scope([normalizer_fn], **normalizer_fn_args):
|
||||
with slim.arg_scope([slim.conv2d],
|
||||
stride=2,
|
||||
kernel_size=4,
|
||||
activation_fn=tf.nn.leaky_relu):
|
||||
net = inputs
|
||||
for i in xrange(int(log(inp_shape, 2))):
|
||||
scope = 'conv%i' % (i + 1)
|
||||
current_depth = depth * 2**i
|
||||
normalizer_fn_ = None if i == 0 else normalizer_fn
|
||||
net = slim.conv2d(
|
||||
net, current_depth, normalizer_fn=normalizer_fn_, scope=scope)
|
||||
end_points[scope] = net
|
||||
|
||||
logits = slim.conv2d(net, 1, kernel_size=1, stride=1, padding='VALID',
|
||||
normalizer_fn=None, activation_fn=None)
|
||||
logits = tf.reshape(logits, [-1, 1])
|
||||
end_points['logits'] = logits
|
||||
|
||||
return logits, end_points
|
||||
|
||||
|
||||
# TODO(joelshor): Use fused batch norm by default. Investigate why some GAN
|
||||
# setups need the gradient of gradient FusedBatchNormGrad.
|
||||
def generator(inputs,
|
||||
depth=64,
|
||||
final_size=32,
|
||||
num_outputs=3,
|
||||
is_training=True,
|
||||
reuse=None,
|
||||
scope='Generator',
|
||||
fused_batch_norm=False):
|
||||
"""Generator network for DCGAN.
|
||||
|
||||
Construct generator network from inputs to the final endpoint.
|
||||
|
||||
Args:
|
||||
inputs: A tensor with any size N. [batch_size, N]
|
||||
depth: Number of channels in last deconvolution layer.
|
||||
final_size: The shape of the final output.
|
||||
num_outputs: Number of output features. For images, this is the number of
|
||||
channels.
|
||||
is_training: whether is training or not.
|
||||
reuse: Whether or not the network has its variables should be reused. scope
|
||||
must be given to be reused.
|
||||
scope: Optional variable_scope.
|
||||
fused_batch_norm: If `True`, use a faster, fused implementation of
|
||||
batch norm.
|
||||
|
||||
Returns:
|
||||
logits: the pre-softmax activations, a tensor of size
|
||||
[batch_size, 32, 32, channels]
|
||||
end_points: a dictionary from components of the network to their activation.
|
||||
|
||||
Raises:
|
||||
ValueError: If `inputs` is not 2-dimensional.
|
||||
ValueError: If `final_size` isn't a power of 2 or is less than 8.
|
||||
"""
|
||||
normalizer_fn = slim.batch_norm
|
||||
normalizer_fn_args = {
|
||||
'is_training': is_training,
|
||||
'zero_debias_moving_mean': True,
|
||||
'fused': fused_batch_norm,
|
||||
}
|
||||
|
||||
inputs.get_shape().assert_has_rank(2)
|
||||
if log(final_size, 2) != int(log(final_size, 2)):
|
||||
raise ValueError('`final_size` (%i) must be a power of 2.' % final_size)
|
||||
if final_size < 8:
|
||||
raise ValueError('`final_size` (%i) must be greater than 8.' % final_size)
|
||||
|
||||
end_points = {}
|
||||
num_layers = int(log(final_size, 2)) - 1
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, values=[inputs], reuse=reuse) as scope:
|
||||
with slim.arg_scope([normalizer_fn], **normalizer_fn_args):
|
||||
with slim.arg_scope([slim.conv2d_transpose],
|
||||
normalizer_fn=normalizer_fn,
|
||||
stride=2,
|
||||
kernel_size=4):
|
||||
net = tf.expand_dims(tf.expand_dims(inputs, 1), 1)
|
||||
|
||||
# First upscaling is different because it takes the input vector.
|
||||
current_depth = depth * 2 ** (num_layers - 1)
|
||||
scope = 'deconv1'
|
||||
net = slim.conv2d_transpose(
|
||||
net, current_depth, stride=1, padding='VALID', scope=scope)
|
||||
end_points[scope] = net
|
||||
|
||||
for i in xrange(2, num_layers):
|
||||
scope = 'deconv%i' % (i)
|
||||
current_depth = depth * 2 ** (num_layers - i)
|
||||
net = slim.conv2d_transpose(net, current_depth, scope=scope)
|
||||
end_points[scope] = net
|
||||
|
||||
# Last layer has different normalizer and activation.
|
||||
scope = 'deconv%i' % (num_layers)
|
||||
net = slim.conv2d_transpose(
|
||||
net, depth, normalizer_fn=None, activation_fn=None, scope=scope)
|
||||
end_points[scope] = net
|
||||
|
||||
# Convert to proper channels.
|
||||
scope = 'logits'
|
||||
logits = slim.conv2d(
|
||||
net,
|
||||
num_outputs,
|
||||
normalizer_fn=None,
|
||||
activation_fn=None,
|
||||
kernel_size=1,
|
||||
stride=1,
|
||||
padding='VALID',
|
||||
scope=scope)
|
||||
end_points[scope] = logits
|
||||
|
||||
logits.get_shape().assert_has_rank(4)
|
||||
logits.get_shape().assert_is_compatible_with(
|
||||
[None, final_size, final_size, num_outputs])
|
||||
|
||||
return logits, end_points
|
||||
+121
@@ -0,0 +1,121 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for dcgan."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
from six.moves import xrange # pylint: disable=redefined-builtin
|
||||
import tensorflow as tf
|
||||
|
||||
from nets import dcgan
|
||||
|
||||
|
||||
class DCGANTest(tf.test.TestCase):
|
||||
|
||||
def test_generator_run(self):
|
||||
tf.compat.v1.set_random_seed(1234)
|
||||
noise = tf.random.normal([100, 64])
|
||||
image, _ = dcgan.generator(noise)
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
image.eval()
|
||||
|
||||
def test_generator_graph(self):
|
||||
tf.compat.v1.set_random_seed(1234)
|
||||
# Check graph construction for a number of image size/depths and batch
|
||||
# sizes.
|
||||
for i, batch_size in zip(xrange(3, 7), xrange(3, 8)):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
final_size = 2 ** i
|
||||
noise = tf.random.normal([batch_size, 64])
|
||||
image, end_points = dcgan.generator(
|
||||
noise,
|
||||
depth=32,
|
||||
final_size=final_size)
|
||||
|
||||
self.assertAllEqual([batch_size, final_size, final_size, 3],
|
||||
image.shape.as_list())
|
||||
|
||||
expected_names = ['deconv%i' % j for j in xrange(1, i)] + ['logits']
|
||||
self.assertSetEqual(set(expected_names), set(end_points.keys()))
|
||||
|
||||
# Check layer depths.
|
||||
for j in range(1, i):
|
||||
layer = end_points['deconv%i' % j]
|
||||
self.assertEqual(32 * 2**(i-j-1), layer.get_shape().as_list()[-1])
|
||||
|
||||
def test_generator_invalid_input(self):
|
||||
wrong_dim_input = tf.zeros([5, 32, 32])
|
||||
with self.assertRaises(ValueError):
|
||||
dcgan.generator(wrong_dim_input)
|
||||
|
||||
correct_input = tf.zeros([3, 2])
|
||||
with self.assertRaisesRegexp(ValueError, 'must be a power of 2'):
|
||||
dcgan.generator(correct_input, final_size=30)
|
||||
|
||||
with self.assertRaisesRegexp(ValueError, 'must be greater than 8'):
|
||||
dcgan.generator(correct_input, final_size=4)
|
||||
|
||||
def test_discriminator_run(self):
|
||||
image = tf.random.uniform([5, 32, 32, 3], -1, 1)
|
||||
output, _ = dcgan.discriminator(image)
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output.eval()
|
||||
|
||||
def test_discriminator_graph(self):
|
||||
# Check graph construction for a number of image size/depths and batch
|
||||
# sizes.
|
||||
for i, batch_size in zip(xrange(1, 6), xrange(3, 8)):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
img_w = 2 ** i
|
||||
image = tf.random.uniform([batch_size, img_w, img_w, 3], -1, 1)
|
||||
output, end_points = dcgan.discriminator(
|
||||
image,
|
||||
depth=32)
|
||||
|
||||
self.assertAllEqual([batch_size, 1], output.get_shape().as_list())
|
||||
|
||||
expected_names = ['conv%i' % j for j in xrange(1, i+1)] + ['logits']
|
||||
self.assertSetEqual(set(expected_names), set(end_points.keys()))
|
||||
|
||||
# Check layer depths.
|
||||
for j in range(1, i+1):
|
||||
layer = end_points['conv%i' % j]
|
||||
self.assertEqual(32 * 2**(j-1), layer.get_shape().as_list()[-1])
|
||||
|
||||
def test_discriminator_invalid_input(self):
|
||||
wrong_dim_img = tf.zeros([5, 32, 32])
|
||||
with self.assertRaises(ValueError):
|
||||
dcgan.discriminator(wrong_dim_img)
|
||||
|
||||
spatially_undefined_shape = tf.compat.v1.placeholder(
|
||||
tf.float32, [5, 32, None, 3])
|
||||
with self.assertRaises(ValueError):
|
||||
dcgan.discriminator(spatially_undefined_shape)
|
||||
|
||||
not_square = tf.zeros([5, 32, 16, 3])
|
||||
with self.assertRaisesRegexp(ValueError, 'not have equal width and height'):
|
||||
dcgan.discriminator(not_square)
|
||||
|
||||
not_power_2 = tf.zeros([5, 30, 30, 3])
|
||||
with self.assertRaisesRegexp(ValueError, 'not a power of 2'):
|
||||
dcgan.discriminator(not_power_2)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+181
@@ -0,0 +1,181 @@
|
||||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains the definition for Inflated 3D Inception V1 (I3D).
|
||||
|
||||
The network architecture is proposed by:
|
||||
Joao Carreira and Andrew Zisserman,
|
||||
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset.
|
||||
https://arxiv.org/abs/1705.07750
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import i3d_utils
|
||||
from nets import s3dg
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
# pylint: disable=g-long-lambda
|
||||
trunc_normal = lambda stddev: tf.compat.v1.truncated_normal_initializer(
|
||||
0.0, stddev)
|
||||
conv3d_spatiotemporal = i3d_utils.conv3d_spatiotemporal
|
||||
|
||||
|
||||
def i3d_arg_scope(weight_decay=1e-7,
|
||||
batch_norm_decay=0.999,
|
||||
batch_norm_epsilon=0.001,
|
||||
use_renorm=False,
|
||||
separable_conv3d=False):
|
||||
"""Defines default arg_scope for I3D.
|
||||
|
||||
Args:
|
||||
weight_decay: The weight decay to use for regularizing the model.
|
||||
batch_norm_decay: Decay for batch norm moving average.
|
||||
batch_norm_epsilon: Small float added to variance to avoid dividing by zero
|
||||
in batch norm.
|
||||
use_renorm: Whether to use batch renormalization or not.
|
||||
separable_conv3d: Whether to use separable 3d Convs.
|
||||
|
||||
Returns:
|
||||
sc: An arg_scope to use for the models.
|
||||
"""
|
||||
batch_norm_params = {
|
||||
# Decay for the moving averages.
|
||||
'decay': batch_norm_decay,
|
||||
# epsilon to prevent 0s in variance.
|
||||
'epsilon': batch_norm_epsilon,
|
||||
# Turns off fused batch norm.
|
||||
'fused': False,
|
||||
'renorm': use_renorm,
|
||||
# collection containing the moving mean and moving variance.
|
||||
'variables_collections': {
|
||||
'beta': None,
|
||||
'gamma': None,
|
||||
'moving_mean': ['moving_vars'],
|
||||
'moving_variance': ['moving_vars'],
|
||||
}
|
||||
}
|
||||
|
||||
with slim.arg_scope(
|
||||
[slim.conv3d, conv3d_spatiotemporal],
|
||||
weights_regularizer=slim.l2_regularizer(weight_decay),
|
||||
activation_fn=tf.nn.relu,
|
||||
normalizer_fn=slim.batch_norm,
|
||||
normalizer_params=batch_norm_params):
|
||||
with slim.arg_scope(
|
||||
[conv3d_spatiotemporal], separable=separable_conv3d) as sc:
|
||||
return sc
|
||||
|
||||
|
||||
def i3d_base(inputs, final_endpoint='Mixed_5c',
|
||||
scope='InceptionV1'):
|
||||
"""Defines the I3D base architecture.
|
||||
|
||||
Note that we use the names as defined in Inception V1 to facilitate checkpoint
|
||||
conversion from an image-trained Inception V1 checkpoint to I3D checkpoint.
|
||||
|
||||
Args:
|
||||
inputs: A 5-D float tensor of size [batch_size, num_frames, height, width,
|
||||
channels].
|
||||
final_endpoint: Specifies the endpoint to construct the network up to. It
|
||||
can be one of ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
|
||||
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c',
|
||||
'MaxPool_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e',
|
||||
'Mixed_4f', 'MaxPool_5a_2x2', 'Mixed_5b', 'Mixed_5c']
|
||||
scope: Optional variable_scope.
|
||||
|
||||
Returns:
|
||||
A dictionary from components of the network to the corresponding activation.
|
||||
|
||||
Raises:
|
||||
ValueError: if final_endpoint is not set to one of the predefined values.
|
||||
"""
|
||||
|
||||
return s3dg.s3dg_base(
|
||||
inputs,
|
||||
first_temporal_kernel_size=7,
|
||||
temporal_conv_startat='Conv2d_2c_3x3',
|
||||
gating_startat=None,
|
||||
final_endpoint=final_endpoint,
|
||||
min_depth=16,
|
||||
depth_multiplier=1.0,
|
||||
data_format='NDHWC',
|
||||
scope=scope)
|
||||
|
||||
|
||||
def i3d(inputs,
|
||||
num_classes=1000,
|
||||
dropout_keep_prob=0.8,
|
||||
is_training=True,
|
||||
prediction_fn=slim.softmax,
|
||||
spatial_squeeze=True,
|
||||
reuse=None,
|
||||
scope='InceptionV1'):
|
||||
"""Defines the I3D architecture.
|
||||
|
||||
The default image size used to train this network is 224x224.
|
||||
|
||||
Args:
|
||||
inputs: A 5-D float tensor of size [batch_size, num_frames, height, width,
|
||||
channels].
|
||||
num_classes: number of predicted classes.
|
||||
dropout_keep_prob: the percentage of activation values that are retained.
|
||||
is_training: whether is training or not.
|
||||
prediction_fn: a function to get predictions out of logits.
|
||||
spatial_squeeze: if True, logits is of shape is [B, C], if false logits is
|
||||
of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
|
||||
reuse: whether or not the network and its variables should be reused. To be
|
||||
able to reuse 'scope' must be given.
|
||||
scope: Optional variable_scope.
|
||||
|
||||
Returns:
|
||||
logits: the pre-softmax activations, a tensor of size
|
||||
[batch_size, num_classes]
|
||||
end_points: a dictionary from components of the network to the corresponding
|
||||
activation.
|
||||
"""
|
||||
# Final pooling and prediction
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'InceptionV1', [inputs, num_classes], reuse=reuse) as scope:
|
||||
with slim.arg_scope(
|
||||
[slim.batch_norm, slim.dropout], is_training=is_training):
|
||||
net, end_points = i3d_base(inputs, scope=scope)
|
||||
with tf.compat.v1.variable_scope('Logits'):
|
||||
kernel_size = i3d_utils.reduced_kernel_size_3d(net, [2, 7, 7])
|
||||
net = slim.avg_pool3d(
|
||||
net, kernel_size, stride=1, scope='AvgPool_0a_7x7')
|
||||
net = slim.dropout(net, dropout_keep_prob, scope='Dropout_0b')
|
||||
logits = slim.conv3d(
|
||||
net,
|
||||
num_classes, [1, 1, 1],
|
||||
activation_fn=None,
|
||||
normalizer_fn=None,
|
||||
scope='Conv2d_0c_1x1')
|
||||
# Temporal average pooling.
|
||||
logits = tf.reduce_mean(input_tensor=logits, axis=1)
|
||||
if spatial_squeeze:
|
||||
logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
|
||||
|
||||
end_points['Logits'] = logits
|
||||
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
|
||||
return logits, end_points
|
||||
|
||||
|
||||
i3d.default_image_size = 224
|
||||
+149
@@ -0,0 +1,149 @@
|
||||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for networks.i3d."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from nets import i3d
|
||||
|
||||
|
||||
class I3DTest(tf.test.TestCase):
|
||||
|
||||
def testBuildClassificationNetwork(self):
|
||||
batch_size = 5
|
||||
num_frames = 64
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, num_frames, height, width, 3))
|
||||
logits, end_points = i3d.i3d(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV1/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertTrue('Predictions' in end_points)
|
||||
self.assertListEqual(end_points['Predictions'].get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildBaseNetwork(self):
|
||||
batch_size = 5
|
||||
num_frames = 64
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, num_frames, height, width, 3))
|
||||
mixed_6c, end_points = i3d.i3d_base(inputs)
|
||||
self.assertTrue(mixed_6c.op.name.startswith('InceptionV1/Mixed_5c'))
|
||||
self.assertListEqual(mixed_6c.get_shape().as_list(),
|
||||
[batch_size, 8, 7, 7, 1024])
|
||||
expected_endpoints = ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
|
||||
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b',
|
||||
'Mixed_3c', 'MaxPool_4a_3x3', 'Mixed_4b', 'Mixed_4c',
|
||||
'Mixed_4d', 'Mixed_4e', 'Mixed_4f', 'MaxPool_5a_2x2',
|
||||
'Mixed_5b', 'Mixed_5c']
|
||||
self.assertItemsEqual(end_points.keys(), expected_endpoints)
|
||||
|
||||
def testBuildOnlyUptoFinalEndpoint(self):
|
||||
batch_size = 5
|
||||
num_frames = 64
|
||||
height, width = 224, 224
|
||||
endpoints = ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
|
||||
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c',
|
||||
'MaxPool_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d',
|
||||
'Mixed_4e', 'Mixed_4f', 'MaxPool_5a_2x2', 'Mixed_5b',
|
||||
'Mixed_5c']
|
||||
for index, endpoint in enumerate(endpoints):
|
||||
with tf.Graph().as_default():
|
||||
inputs = tf.random.uniform((batch_size, num_frames, height, width, 3))
|
||||
out_tensor, end_points = i3d.i3d_base(
|
||||
inputs, final_endpoint=endpoint)
|
||||
self.assertTrue(out_tensor.op.name.startswith(
|
||||
'InceptionV1/' + endpoint))
|
||||
self.assertItemsEqual(endpoints[:index+1], end_points)
|
||||
|
||||
def testBuildAndCheckAllEndPointsUptoMixed5c(self):
|
||||
batch_size = 5
|
||||
num_frames = 64
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, num_frames, height, width, 3))
|
||||
_, end_points = i3d.i3d_base(inputs,
|
||||
final_endpoint='Mixed_5c')
|
||||
endpoints_shapes = {'Conv2d_1a_7x7': [5, 32, 112, 112, 64],
|
||||
'MaxPool_2a_3x3': [5, 32, 56, 56, 64],
|
||||
'Conv2d_2b_1x1': [5, 32, 56, 56, 64],
|
||||
'Conv2d_2c_3x3': [5, 32, 56, 56, 192],
|
||||
'MaxPool_3a_3x3': [5, 32, 28, 28, 192],
|
||||
'Mixed_3b': [5, 32, 28, 28, 256],
|
||||
'Mixed_3c': [5, 32, 28, 28, 480],
|
||||
'MaxPool_4a_3x3': [5, 16, 14, 14, 480],
|
||||
'Mixed_4b': [5, 16, 14, 14, 512],
|
||||
'Mixed_4c': [5, 16, 14, 14, 512],
|
||||
'Mixed_4d': [5, 16, 14, 14, 512],
|
||||
'Mixed_4e': [5, 16, 14, 14, 528],
|
||||
'Mixed_4f': [5, 16, 14, 14, 832],
|
||||
'MaxPool_5a_2x2': [5, 8, 7, 7, 832],
|
||||
'Mixed_5b': [5, 8, 7, 7, 832],
|
||||
'Mixed_5c': [5, 8, 7, 7, 1024]}
|
||||
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name, expected_shape in endpoints_shapes.iteritems():
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testHalfSizeImages(self):
|
||||
batch_size = 5
|
||||
num_frames = 64
|
||||
height, width = 112, 112
|
||||
|
||||
inputs = tf.random.uniform((batch_size, num_frames, height, width, 3))
|
||||
mixed_5c, _ = i3d.i3d_base(inputs)
|
||||
self.assertTrue(mixed_5c.op.name.startswith('InceptionV1/Mixed_5c'))
|
||||
self.assertListEqual(mixed_5c.get_shape().as_list(),
|
||||
[batch_size, 8, 4, 4, 1024])
|
||||
|
||||
def testTenFrames(self):
|
||||
batch_size = 5
|
||||
num_frames = 10
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, num_frames, height, width, 3))
|
||||
mixed_5c, _ = i3d.i3d_base(inputs)
|
||||
self.assertTrue(mixed_5c.op.name.startswith('InceptionV1/Mixed_5c'))
|
||||
self.assertListEqual(mixed_5c.get_shape().as_list(),
|
||||
[batch_size, 2, 7, 7, 1024])
|
||||
|
||||
def testEvaluation(self):
|
||||
batch_size = 2
|
||||
num_frames = 64
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
eval_inputs = tf.random.uniform((batch_size, num_frames, height, width, 3))
|
||||
logits, _ = i3d.i3d(eval_inputs, num_classes,
|
||||
is_training=False)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (batch_size,))
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+289
@@ -0,0 +1,289 @@
|
||||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Utilities for building I3D network models."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import framework as contrib_framework
|
||||
from tensorflow.contrib import layers as contrib_layers
|
||||
|
||||
|
||||
# Orignaly, add_arg_scope = slim.add_arg_scope and layers = slim, now switch to
|
||||
# more update-to-date tf.contrib.* API.
|
||||
add_arg_scope = contrib_framework.add_arg_scope
|
||||
layers = contrib_layers
|
||||
|
||||
|
||||
def center_initializer():
|
||||
"""Centering Initializer for I3D.
|
||||
|
||||
This initializer allows identity mapping for temporal convolution at the
|
||||
initialization, which is critical for a desired convergence behavior
|
||||
for training a seprable I3D model.
|
||||
|
||||
The centering behavior of this initializer requires an odd-sized kernel,
|
||||
typically set to 3.
|
||||
|
||||
Returns:
|
||||
A weight initializer op used in temporal convolutional layers.
|
||||
|
||||
Raises:
|
||||
ValueError: Input tensor data type has to be tf.float32.
|
||||
ValueError: If input tensor is not a 5-D tensor.
|
||||
ValueError: If input and output channel dimensions are different.
|
||||
ValueError: If spatial kernel sizes are not 1.
|
||||
ValueError: If temporal kernel size is even.
|
||||
"""
|
||||
|
||||
def _initializer(shape, dtype=tf.float32, partition_info=None): # pylint: disable=unused-argument
|
||||
"""Initializer op."""
|
||||
|
||||
if dtype != tf.float32 and dtype != tf.bfloat16:
|
||||
raise ValueError(
|
||||
'Input tensor data type has to be tf.float32 or tf.bfloat16.')
|
||||
if len(shape) != 5:
|
||||
raise ValueError('Input tensor has to be 5-D.')
|
||||
if shape[3] != shape[4]:
|
||||
raise ValueError('Input and output channel dimensions must be the same.')
|
||||
if shape[1] != 1 or shape[2] != 1:
|
||||
raise ValueError('Spatial kernel sizes must be 1 (pointwise conv).')
|
||||
if shape[0] % 2 == 0:
|
||||
raise ValueError('Temporal kernel size has to be odd.')
|
||||
|
||||
center_pos = int(shape[0] / 2)
|
||||
init_mat = np.zeros(
|
||||
[shape[0], shape[1], shape[2], shape[3], shape[4]], dtype=np.float32)
|
||||
for i in range(0, shape[3]):
|
||||
init_mat[center_pos, 0, 0, i, i] = 1.0
|
||||
|
||||
init_op = tf.constant(init_mat, dtype=dtype)
|
||||
return init_op
|
||||
|
||||
return _initializer
|
||||
|
||||
|
||||
@add_arg_scope
|
||||
def conv3d_spatiotemporal(inputs,
|
||||
num_outputs,
|
||||
kernel_size,
|
||||
stride=1,
|
||||
padding='SAME',
|
||||
activation_fn=None,
|
||||
normalizer_fn=None,
|
||||
normalizer_params=None,
|
||||
weights_regularizer=None,
|
||||
separable=False,
|
||||
data_format='NDHWC',
|
||||
scope=''):
|
||||
"""A wrapper for conv3d to model spatiotemporal representations.
|
||||
|
||||
This allows switching between original 3D convolution and separable 3D
|
||||
convolutions for spatial and temporal features respectively. On Kinetics,
|
||||
seprable 3D convolutions yields better classification performance.
|
||||
|
||||
Args:
|
||||
inputs: a 5-D tensor `[batch_size, depth, height, width, channels]`.
|
||||
num_outputs: integer, the number of output filters.
|
||||
kernel_size: a list of length 3
|
||||
`[kernel_depth, kernel_height, kernel_width]` of the filters. Can be an
|
||||
int if all values are the same.
|
||||
stride: a list of length 3 `[stride_depth, stride_height, stride_width]`.
|
||||
Can be an int if all strides are the same.
|
||||
padding: one of `VALID` or `SAME`.
|
||||
activation_fn: activation function.
|
||||
normalizer_fn: normalization function to use instead of `biases`.
|
||||
normalizer_params: dictionary of normalization function parameters.
|
||||
weights_regularizer: Optional regularizer for the weights.
|
||||
separable: If `True`, use separable spatiotemporal convolutions.
|
||||
data_format: An optional string from: "NDHWC", "NCDHW". Defaults to "NDHWC".
|
||||
The data format of the input and output data. With the default format
|
||||
"NDHWC", the data is stored in the order of: [batch, in_depth, in_height,
|
||||
in_width, in_channels]. Alternatively, the format could be "NCDHW", the
|
||||
data storage order is:
|
||||
[batch, in_channels, in_depth, in_height, in_width].
|
||||
scope: scope for `variable_scope`.
|
||||
|
||||
Returns:
|
||||
A tensor representing the output of the (separable) conv3d operation.
|
||||
|
||||
"""
|
||||
assert len(kernel_size) == 3
|
||||
if separable and kernel_size[0] != 1:
|
||||
spatial_kernel_size = [1, kernel_size[1], kernel_size[2]]
|
||||
temporal_kernel_size = [kernel_size[0], 1, 1]
|
||||
if isinstance(stride, list) and len(stride) == 3:
|
||||
spatial_stride = [1, stride[1], stride[2]]
|
||||
temporal_stride = [stride[0], 1, 1]
|
||||
else:
|
||||
spatial_stride = [1, stride, stride]
|
||||
temporal_stride = [stride, 1, 1]
|
||||
net = layers.conv3d(
|
||||
inputs,
|
||||
num_outputs,
|
||||
spatial_kernel_size,
|
||||
stride=spatial_stride,
|
||||
padding=padding,
|
||||
activation_fn=activation_fn,
|
||||
normalizer_fn=normalizer_fn,
|
||||
normalizer_params=normalizer_params,
|
||||
weights_regularizer=weights_regularizer,
|
||||
data_format=data_format,
|
||||
scope=scope)
|
||||
net = layers.conv3d(
|
||||
net,
|
||||
num_outputs,
|
||||
temporal_kernel_size,
|
||||
stride=temporal_stride,
|
||||
padding=padding,
|
||||
scope=scope + '/temporal',
|
||||
activation_fn=activation_fn,
|
||||
normalizer_fn=None,
|
||||
data_format=data_format,
|
||||
weights_initializer=center_initializer())
|
||||
return net
|
||||
else:
|
||||
return layers.conv3d(
|
||||
inputs,
|
||||
num_outputs,
|
||||
kernel_size,
|
||||
stride=stride,
|
||||
padding=padding,
|
||||
activation_fn=activation_fn,
|
||||
normalizer_fn=normalizer_fn,
|
||||
normalizer_params=normalizer_params,
|
||||
weights_regularizer=weights_regularizer,
|
||||
data_format=data_format,
|
||||
scope=scope)
|
||||
|
||||
|
||||
@add_arg_scope
|
||||
def inception_block_v1_3d(inputs,
|
||||
num_outputs_0_0a,
|
||||
num_outputs_1_0a,
|
||||
num_outputs_1_0b,
|
||||
num_outputs_2_0a,
|
||||
num_outputs_2_0b,
|
||||
num_outputs_3_0b,
|
||||
temporal_kernel_size=3,
|
||||
self_gating_fn=None,
|
||||
data_format='NDHWC',
|
||||
scope=''):
|
||||
"""A 3D Inception v1 block.
|
||||
|
||||
This allows use of separable 3D convolutions and self-gating, as
|
||||
described in:
|
||||
Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu and Kevin Murphy,
|
||||
Rethinking Spatiotemporal Feature Learning For Video Understanding.
|
||||
https://arxiv.org/abs/1712.04851.
|
||||
|
||||
Args:
|
||||
inputs: a 5-D tensor `[batch_size, depth, height, width, channels]`.
|
||||
num_outputs_0_0a: integer, the number of output filters for Branch 0,
|
||||
operation Conv2d_0a_1x1.
|
||||
num_outputs_1_0a: integer, the number of output filters for Branch 1,
|
||||
operation Conv2d_0a_1x1.
|
||||
num_outputs_1_0b: integer, the number of output filters for Branch 1,
|
||||
operation Conv2d_0b_3x3.
|
||||
num_outputs_2_0a: integer, the number of output filters for Branch 2,
|
||||
operation Conv2d_0a_1x1.
|
||||
num_outputs_2_0b: integer, the number of output filters for Branch 2,
|
||||
operation Conv2d_0b_3x3.
|
||||
num_outputs_3_0b: integer, the number of output filters for Branch 3,
|
||||
operation Conv2d_0b_1x1.
|
||||
temporal_kernel_size: integer, the size of the temporal convolutional
|
||||
filters in the conv3d_spatiotemporal blocks.
|
||||
self_gating_fn: function which optionally performs self-gating.
|
||||
Must have two arguments, `inputs` and `scope`, and return one output
|
||||
tensor the same size as `inputs`. If `None`, no self-gating is
|
||||
applied.
|
||||
data_format: An optional string from: "NDHWC", "NCDHW". Defaults to "NDHWC".
|
||||
The data format of the input and output data. With the default format
|
||||
"NDHWC", the data is stored in the order of: [batch, in_depth, in_height,
|
||||
in_width, in_channels]. Alternatively, the format could be "NCDHW", the
|
||||
data storage order is:
|
||||
[batch, in_channels, in_depth, in_height, in_width].
|
||||
scope: scope for `variable_scope`.
|
||||
|
||||
Returns:
|
||||
A 5-D tensor `[batch_size, depth, height, width, out_channels]`, where
|
||||
`out_channels = num_outputs_0_0a + num_outputs_1_0b + num_outputs_2_0b
|
||||
+ num_outputs_3_0b`.
|
||||
|
||||
"""
|
||||
use_gating = self_gating_fn is not None
|
||||
|
||||
with tf.compat.v1.variable_scope(scope):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = layers.conv3d(
|
||||
inputs, num_outputs_0_0a, [1, 1, 1], scope='Conv2d_0a_1x1')
|
||||
if use_gating:
|
||||
branch_0 = self_gating_fn(branch_0, scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = layers.conv3d(
|
||||
inputs, num_outputs_1_0a, [1, 1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = conv3d_spatiotemporal(
|
||||
branch_1, num_outputs_1_0b, [temporal_kernel_size, 3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
if use_gating:
|
||||
branch_1 = self_gating_fn(branch_1, scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = layers.conv3d(
|
||||
inputs, num_outputs_2_0a, [1, 1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = conv3d_spatiotemporal(
|
||||
branch_2, num_outputs_2_0b, [temporal_kernel_size, 3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
if use_gating:
|
||||
branch_2 = self_gating_fn(branch_2, scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = layers.max_pool3d(inputs, [3, 3, 3], scope='MaxPool_0a_3x3')
|
||||
branch_3 = layers.conv3d(
|
||||
branch_3, num_outputs_3_0b, [1, 1, 1], scope='Conv2d_0b_1x1')
|
||||
if use_gating:
|
||||
branch_3 = self_gating_fn(branch_3, scope='Conv2d_0b_1x1')
|
||||
index_c = data_format.index('C')
|
||||
assert 1 <= index_c <= 4, 'Cannot identify channel dimension.'
|
||||
output = tf.concat([branch_0, branch_1, branch_2, branch_3], index_c)
|
||||
return output
|
||||
|
||||
|
||||
def reduced_kernel_size_3d(input_tensor, kernel_size):
|
||||
"""Define kernel size which is automatically reduced for small input.
|
||||
|
||||
If the shape of the input images is unknown at graph construction time this
|
||||
function assumes that the input images are large enough.
|
||||
|
||||
Args:
|
||||
input_tensor: input tensor of size
|
||||
[batch_size, time, height, width, channels].
|
||||
kernel_size: desired kernel size of length 3, corresponding to time,
|
||||
height and width.
|
||||
|
||||
Returns:
|
||||
a tensor with the kernel size.
|
||||
"""
|
||||
assert len(kernel_size) == 3
|
||||
shape = input_tensor.get_shape().as_list()
|
||||
assert len(shape) == 5
|
||||
if None in shape[1:4]:
|
||||
kernel_size_out = kernel_size
|
||||
else:
|
||||
kernel_size_out = [min(shape[1], kernel_size[0]),
|
||||
min(shape[2], kernel_size[1]),
|
||||
min(shape[3], kernel_size[2])]
|
||||
return kernel_size_out
|
||||
+37
@@ -0,0 +1,37 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Brings all inception models under one namespace."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
# pylint: disable=unused-import
|
||||
from nets.inception_resnet_v2 import inception_resnet_v2
|
||||
from nets.inception_resnet_v2 import inception_resnet_v2_arg_scope
|
||||
from nets.inception_resnet_v2 import inception_resnet_v2_base
|
||||
from nets.inception_v1 import inception_v1
|
||||
from nets.inception_v1 import inception_v1_arg_scope
|
||||
from nets.inception_v1 import inception_v1_base
|
||||
from nets.inception_v2 import inception_v2
|
||||
from nets.inception_v2 import inception_v2_arg_scope
|
||||
from nets.inception_v2 import inception_v2_base
|
||||
from nets.inception_v3 import inception_v3
|
||||
from nets.inception_v3 import inception_v3_arg_scope
|
||||
from nets.inception_v3 import inception_v3_base
|
||||
from nets.inception_v4 import inception_v4
|
||||
from nets.inception_v4 import inception_v4_arg_scope
|
||||
from nets.inception_v4 import inception_v4_base
|
||||
# pylint: enable=unused-import
|
||||
+408
@@ -0,0 +1,408 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains the definition of the Inception Resnet V2 architecture.
|
||||
|
||||
As described in http://arxiv.org/abs/1602.07261.
|
||||
|
||||
Inception-v4, Inception-ResNet and the Impact of Residual Connections
|
||||
on Learning
|
||||
Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
def block35(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
|
||||
"""Builds the 35x35 resnet block."""
|
||||
with tf.compat.v1.variable_scope(scope, 'Block35', [net], reuse=reuse):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
tower_conv = slim.conv2d(net, 32, 1, scope='Conv2d_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
tower_conv1_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1')
|
||||
tower_conv1_1 = slim.conv2d(tower_conv1_0, 32, 3, scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
tower_conv2_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1')
|
||||
tower_conv2_1 = slim.conv2d(tower_conv2_0, 48, 3, scope='Conv2d_0b_3x3')
|
||||
tower_conv2_2 = slim.conv2d(tower_conv2_1, 64, 3, scope='Conv2d_0c_3x3')
|
||||
mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_1, tower_conv2_2])
|
||||
up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
|
||||
activation_fn=None, scope='Conv2d_1x1')
|
||||
scaled_up = up * scale
|
||||
if activation_fn == tf.nn.relu6:
|
||||
# Use clip_by_value to simulate bandpass activation.
|
||||
scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0)
|
||||
|
||||
net += scaled_up
|
||||
if activation_fn:
|
||||
net = activation_fn(net)
|
||||
return net
|
||||
|
||||
|
||||
def block17(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
|
||||
"""Builds the 17x17 resnet block."""
|
||||
with tf.compat.v1.variable_scope(scope, 'Block17', [net], reuse=reuse):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
tower_conv1_0 = slim.conv2d(net, 128, 1, scope='Conv2d_0a_1x1')
|
||||
tower_conv1_1 = slim.conv2d(tower_conv1_0, 160, [1, 7],
|
||||
scope='Conv2d_0b_1x7')
|
||||
tower_conv1_2 = slim.conv2d(tower_conv1_1, 192, [7, 1],
|
||||
scope='Conv2d_0c_7x1')
|
||||
mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2])
|
||||
up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
|
||||
activation_fn=None, scope='Conv2d_1x1')
|
||||
|
||||
scaled_up = up * scale
|
||||
if activation_fn == tf.nn.relu6:
|
||||
# Use clip_by_value to simulate bandpass activation.
|
||||
scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0)
|
||||
|
||||
net += scaled_up
|
||||
if activation_fn:
|
||||
net = activation_fn(net)
|
||||
return net
|
||||
|
||||
|
||||
def block8(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
|
||||
"""Builds the 8x8 resnet block."""
|
||||
with tf.compat.v1.variable_scope(scope, 'Block8', [net], reuse=reuse):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
tower_conv1_0 = slim.conv2d(net, 192, 1, scope='Conv2d_0a_1x1')
|
||||
tower_conv1_1 = slim.conv2d(tower_conv1_0, 224, [1, 3],
|
||||
scope='Conv2d_0b_1x3')
|
||||
tower_conv1_2 = slim.conv2d(tower_conv1_1, 256, [3, 1],
|
||||
scope='Conv2d_0c_3x1')
|
||||
mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2])
|
||||
up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
|
||||
activation_fn=None, scope='Conv2d_1x1')
|
||||
|
||||
scaled_up = up * scale
|
||||
if activation_fn == tf.nn.relu6:
|
||||
# Use clip_by_value to simulate bandpass activation.
|
||||
scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0)
|
||||
|
||||
net += scaled_up
|
||||
if activation_fn:
|
||||
net = activation_fn(net)
|
||||
return net
|
||||
|
||||
|
||||
def inception_resnet_v2_base(inputs,
|
||||
final_endpoint='Conv2d_7b_1x1',
|
||||
output_stride=16,
|
||||
align_feature_maps=False,
|
||||
scope=None,
|
||||
activation_fn=tf.nn.relu):
|
||||
"""Inception model from http://arxiv.org/abs/1602.07261.
|
||||
|
||||
Constructs an Inception Resnet v2 network from inputs to the given final
|
||||
endpoint. This method can construct the network up to the final inception
|
||||
block Conv2d_7b_1x1.
|
||||
|
||||
Args:
|
||||
inputs: a tensor of size [batch_size, height, width, channels].
|
||||
final_endpoint: specifies the endpoint to construct the network up to. It
|
||||
can be one of ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
|
||||
'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3', 'MaxPool_5a_3x3',
|
||||
'Mixed_5b', 'Mixed_6a', 'PreAuxLogits', 'Mixed_7a', 'Conv2d_7b_1x1']
|
||||
output_stride: A scalar that specifies the requested ratio of input to
|
||||
output spatial resolution. Only supports 8 and 16.
|
||||
align_feature_maps: When true, changes all the VALID paddings in the network
|
||||
to SAME padding so that the feature maps are aligned.
|
||||
scope: Optional variable_scope.
|
||||
activation_fn: Activation function for block scopes.
|
||||
|
||||
Returns:
|
||||
tensor_out: output tensor corresponding to the final_endpoint.
|
||||
end_points: a set of activations for external use, for example summaries or
|
||||
losses.
|
||||
|
||||
Raises:
|
||||
ValueError: if final_endpoint is not set to one of the predefined values,
|
||||
or if the output_stride is not 8 or 16, or if the output_stride is 8 and
|
||||
we request an end point after 'PreAuxLogits'.
|
||||
"""
|
||||
if output_stride != 8 and output_stride != 16:
|
||||
raise ValueError('output_stride must be 8 or 16.')
|
||||
|
||||
padding = 'SAME' if align_feature_maps else 'VALID'
|
||||
|
||||
end_points = {}
|
||||
|
||||
def add_and_check_final(name, net):
|
||||
end_points[name] = net
|
||||
return name == final_endpoint
|
||||
|
||||
with tf.compat.v1.variable_scope(scope, 'InceptionResnetV2', [inputs]):
|
||||
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
|
||||
stride=1, padding='SAME'):
|
||||
# 149 x 149 x 32
|
||||
net = slim.conv2d(inputs, 32, 3, stride=2, padding=padding,
|
||||
scope='Conv2d_1a_3x3')
|
||||
if add_and_check_final('Conv2d_1a_3x3', net): return net, end_points
|
||||
|
||||
# 147 x 147 x 32
|
||||
net = slim.conv2d(net, 32, 3, padding=padding,
|
||||
scope='Conv2d_2a_3x3')
|
||||
if add_and_check_final('Conv2d_2a_3x3', net): return net, end_points
|
||||
# 147 x 147 x 64
|
||||
net = slim.conv2d(net, 64, 3, scope='Conv2d_2b_3x3')
|
||||
if add_and_check_final('Conv2d_2b_3x3', net): return net, end_points
|
||||
# 73 x 73 x 64
|
||||
net = slim.max_pool2d(net, 3, stride=2, padding=padding,
|
||||
scope='MaxPool_3a_3x3')
|
||||
if add_and_check_final('MaxPool_3a_3x3', net): return net, end_points
|
||||
# 73 x 73 x 80
|
||||
net = slim.conv2d(net, 80, 1, padding=padding,
|
||||
scope='Conv2d_3b_1x1')
|
||||
if add_and_check_final('Conv2d_3b_1x1', net): return net, end_points
|
||||
# 71 x 71 x 192
|
||||
net = slim.conv2d(net, 192, 3, padding=padding,
|
||||
scope='Conv2d_4a_3x3')
|
||||
if add_and_check_final('Conv2d_4a_3x3', net): return net, end_points
|
||||
# 35 x 35 x 192
|
||||
net = slim.max_pool2d(net, 3, stride=2, padding=padding,
|
||||
scope='MaxPool_5a_3x3')
|
||||
if add_and_check_final('MaxPool_5a_3x3', net): return net, end_points
|
||||
|
||||
# 35 x 35 x 320
|
||||
with tf.compat.v1.variable_scope('Mixed_5b'):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
tower_conv = slim.conv2d(net, 96, 1, scope='Conv2d_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
tower_conv1_0 = slim.conv2d(net, 48, 1, scope='Conv2d_0a_1x1')
|
||||
tower_conv1_1 = slim.conv2d(tower_conv1_0, 64, 5,
|
||||
scope='Conv2d_0b_5x5')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
tower_conv2_0 = slim.conv2d(net, 64, 1, scope='Conv2d_0a_1x1')
|
||||
tower_conv2_1 = slim.conv2d(tower_conv2_0, 96, 3,
|
||||
scope='Conv2d_0b_3x3')
|
||||
tower_conv2_2 = slim.conv2d(tower_conv2_1, 96, 3,
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
tower_pool = slim.avg_pool2d(net, 3, stride=1, padding='SAME',
|
||||
scope='AvgPool_0a_3x3')
|
||||
tower_pool_1 = slim.conv2d(tower_pool, 64, 1,
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
[tower_conv, tower_conv1_1, tower_conv2_2, tower_pool_1], 3)
|
||||
|
||||
if add_and_check_final('Mixed_5b', net): return net, end_points
|
||||
# TODO(alemi): Register intermediate endpoints
|
||||
net = slim.repeat(net, 10, block35, scale=0.17,
|
||||
activation_fn=activation_fn)
|
||||
|
||||
# 17 x 17 x 1088 if output_stride == 8,
|
||||
# 33 x 33 x 1088 if output_stride == 16
|
||||
use_atrous = output_stride == 8
|
||||
|
||||
with tf.compat.v1.variable_scope('Mixed_6a'):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
tower_conv = slim.conv2d(net, 384, 3, stride=1 if use_atrous else 2,
|
||||
padding=padding,
|
||||
scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
tower_conv1_0 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
|
||||
tower_conv1_1 = slim.conv2d(tower_conv1_0, 256, 3,
|
||||
scope='Conv2d_0b_3x3')
|
||||
tower_conv1_2 = slim.conv2d(tower_conv1_1, 384, 3,
|
||||
stride=1 if use_atrous else 2,
|
||||
padding=padding,
|
||||
scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
tower_pool = slim.max_pool2d(net, 3, stride=1 if use_atrous else 2,
|
||||
padding=padding,
|
||||
scope='MaxPool_1a_3x3')
|
||||
net = tf.concat([tower_conv, tower_conv1_2, tower_pool], 3)
|
||||
|
||||
if add_and_check_final('Mixed_6a', net): return net, end_points
|
||||
|
||||
# TODO(alemi): register intermediate endpoints
|
||||
with slim.arg_scope([slim.conv2d], rate=2 if use_atrous else 1):
|
||||
net = slim.repeat(net, 20, block17, scale=0.10,
|
||||
activation_fn=activation_fn)
|
||||
if add_and_check_final('PreAuxLogits', net): return net, end_points
|
||||
|
||||
if output_stride == 8:
|
||||
# TODO(gpapan): Properly support output_stride for the rest of the net.
|
||||
raise ValueError('output_stride==8 is only supported up to the '
|
||||
'PreAuxlogits end_point for now.')
|
||||
|
||||
# 8 x 8 x 2080
|
||||
with tf.compat.v1.variable_scope('Mixed_7a'):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
tower_conv = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
|
||||
tower_conv_1 = slim.conv2d(tower_conv, 384, 3, stride=2,
|
||||
padding=padding,
|
||||
scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
tower_conv1 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
|
||||
tower_conv1_1 = slim.conv2d(tower_conv1, 288, 3, stride=2,
|
||||
padding=padding,
|
||||
scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
tower_conv2 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
|
||||
tower_conv2_1 = slim.conv2d(tower_conv2, 288, 3,
|
||||
scope='Conv2d_0b_3x3')
|
||||
tower_conv2_2 = slim.conv2d(tower_conv2_1, 320, 3, stride=2,
|
||||
padding=padding,
|
||||
scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
tower_pool = slim.max_pool2d(net, 3, stride=2,
|
||||
padding=padding,
|
||||
scope='MaxPool_1a_3x3')
|
||||
net = tf.concat(
|
||||
[tower_conv_1, tower_conv1_1, tower_conv2_2, tower_pool], 3)
|
||||
|
||||
if add_and_check_final('Mixed_7a', net): return net, end_points
|
||||
|
||||
# TODO(alemi): register intermediate endpoints
|
||||
net = slim.repeat(net, 9, block8, scale=0.20, activation_fn=activation_fn)
|
||||
net = block8(net, activation_fn=None)
|
||||
|
||||
# 8 x 8 x 1536
|
||||
net = slim.conv2d(net, 1536, 1, scope='Conv2d_7b_1x1')
|
||||
if add_and_check_final('Conv2d_7b_1x1', net): return net, end_points
|
||||
|
||||
raise ValueError('final_endpoint (%s) not recognized', final_endpoint)
|
||||
|
||||
|
||||
def inception_resnet_v2(inputs, num_classes=1001, is_training=True,
|
||||
dropout_keep_prob=0.8,
|
||||
reuse=None,
|
||||
scope='InceptionResnetV2',
|
||||
create_aux_logits=True,
|
||||
activation_fn=tf.nn.relu):
|
||||
"""Creates the Inception Resnet V2 model.
|
||||
|
||||
Args:
|
||||
inputs: a 4-D tensor of size [batch_size, height, width, 3].
|
||||
Dimension batch_size may be undefined. If create_aux_logits is false,
|
||||
also height and width may be undefined.
|
||||
num_classes: number of predicted classes. If 0 or None, the logits layer
|
||||
is omitted and the input features to the logits layer (before dropout)
|
||||
are returned instead.
|
||||
is_training: whether is training or not.
|
||||
dropout_keep_prob: float, the fraction to keep before final layer.
|
||||
reuse: whether or not the network and its variables should be reused. To be
|
||||
able to reuse 'scope' must be given.
|
||||
scope: Optional variable_scope.
|
||||
create_aux_logits: Whether to include the auxilliary logits.
|
||||
activation_fn: Activation function for conv2d.
|
||||
|
||||
Returns:
|
||||
net: the output of the logits layer (if num_classes is a non-zero integer),
|
||||
or the non-dropped-out input to the logits layer (if num_classes is 0 or
|
||||
None).
|
||||
end_points: the set of end_points from the inception model.
|
||||
"""
|
||||
end_points = {}
|
||||
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'InceptionResnetV2', [inputs], reuse=reuse) as scope:
|
||||
with slim.arg_scope([slim.batch_norm, slim.dropout],
|
||||
is_training=is_training):
|
||||
|
||||
net, end_points = inception_resnet_v2_base(inputs, scope=scope,
|
||||
activation_fn=activation_fn)
|
||||
|
||||
if create_aux_logits and num_classes:
|
||||
with tf.compat.v1.variable_scope('AuxLogits'):
|
||||
aux = end_points['PreAuxLogits']
|
||||
aux = slim.avg_pool2d(aux, 5, stride=3, padding='VALID',
|
||||
scope='Conv2d_1a_3x3')
|
||||
aux = slim.conv2d(aux, 128, 1, scope='Conv2d_1b_1x1')
|
||||
aux = slim.conv2d(aux, 768, aux.get_shape()[1:3],
|
||||
padding='VALID', scope='Conv2d_2a_5x5')
|
||||
aux = slim.flatten(aux)
|
||||
aux = slim.fully_connected(aux, num_classes, activation_fn=None,
|
||||
scope='Logits')
|
||||
end_points['AuxLogits'] = aux
|
||||
|
||||
with tf.compat.v1.variable_scope('Logits'):
|
||||
# TODO(sguada,arnoegw): Consider adding a parameter global_pool which
|
||||
# can be set to False to disable pooling here (as in resnet_*()).
|
||||
kernel_size = net.get_shape()[1:3]
|
||||
if kernel_size.is_fully_defined():
|
||||
net = slim.avg_pool2d(net, kernel_size, padding='VALID',
|
||||
scope='AvgPool_1a_8x8')
|
||||
else:
|
||||
net = tf.reduce_mean(
|
||||
input_tensor=net, axis=[1, 2], keepdims=True, name='global_pool')
|
||||
end_points['global_pool'] = net
|
||||
if not num_classes:
|
||||
return net, end_points
|
||||
net = slim.flatten(net)
|
||||
net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
|
||||
scope='Dropout')
|
||||
end_points['PreLogitsFlatten'] = net
|
||||
logits = slim.fully_connected(net, num_classes, activation_fn=None,
|
||||
scope='Logits')
|
||||
end_points['Logits'] = logits
|
||||
end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions')
|
||||
|
||||
return logits, end_points
|
||||
inception_resnet_v2.default_image_size = 299
|
||||
|
||||
|
||||
def inception_resnet_v2_arg_scope(
|
||||
weight_decay=0.00004,
|
||||
batch_norm_decay=0.9997,
|
||||
batch_norm_epsilon=0.001,
|
||||
activation_fn=tf.nn.relu,
|
||||
batch_norm_updates_collections=tf.compat.v1.GraphKeys.UPDATE_OPS,
|
||||
batch_norm_scale=False):
|
||||
"""Returns the scope with the default parameters for inception_resnet_v2.
|
||||
|
||||
Args:
|
||||
weight_decay: the weight decay for weights variables.
|
||||
batch_norm_decay: decay for the moving average of batch_norm momentums.
|
||||
batch_norm_epsilon: small float added to variance to avoid dividing by zero.
|
||||
activation_fn: Activation function for conv2d.
|
||||
batch_norm_updates_collections: Collection for the update ops for
|
||||
batch norm.
|
||||
batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the
|
||||
activations in the batch normalization layer.
|
||||
|
||||
Returns:
|
||||
a arg_scope with the parameters needed for inception_resnet_v2.
|
||||
"""
|
||||
# Set weight_decay for weights in conv2d and fully_connected layers.
|
||||
with slim.arg_scope([slim.conv2d, slim.fully_connected],
|
||||
weights_regularizer=slim.l2_regularizer(weight_decay),
|
||||
biases_regularizer=slim.l2_regularizer(weight_decay)):
|
||||
|
||||
batch_norm_params = {
|
||||
'decay': batch_norm_decay,
|
||||
'epsilon': batch_norm_epsilon,
|
||||
'updates_collections': batch_norm_updates_collections,
|
||||
'fused': None, # Use fused batch norm if possible.
|
||||
'scale': batch_norm_scale,
|
||||
}
|
||||
# Set activation_fn and parameters for batch_norm.
|
||||
with slim.arg_scope([slim.conv2d], activation_fn=activation_fn,
|
||||
normalizer_fn=slim.batch_norm,
|
||||
normalizer_params=batch_norm_params) as scope:
|
||||
return scope
|
||||
+338
@@ -0,0 +1,338 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for slim.inception_resnet_v2."""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import inception
|
||||
|
||||
|
||||
class InceptionTest(tf.test.TestCase):
|
||||
|
||||
def testBuildLogits(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, endpoints = inception.inception_resnet_v2(inputs, num_classes)
|
||||
self.assertTrue('AuxLogits' in endpoints)
|
||||
auxlogits = endpoints['AuxLogits']
|
||||
self.assertTrue(
|
||||
auxlogits.op.name.startswith('InceptionResnetV2/AuxLogits'))
|
||||
self.assertListEqual(auxlogits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildWithoutAuxLogits(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, endpoints = inception.inception_resnet_v2(inputs, num_classes,
|
||||
create_aux_logits=False)
|
||||
self.assertTrue('AuxLogits' not in endpoints)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildNoClasses(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = None
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, endpoints = inception.inception_resnet_v2(inputs, num_classes)
|
||||
self.assertTrue('AuxLogits' not in endpoints)
|
||||
self.assertTrue('Logits' not in endpoints)
|
||||
self.assertTrue(
|
||||
net.op.name.startswith('InceptionResnetV2/Logits/AvgPool'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 1, 1, 1536])
|
||||
|
||||
def testBuildEndPoints(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_resnet_v2(inputs, num_classes)
|
||||
self.assertTrue('Logits' in end_points)
|
||||
logits = end_points['Logits']
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertTrue('AuxLogits' in end_points)
|
||||
aux_logits = end_points['AuxLogits']
|
||||
self.assertListEqual(aux_logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Conv2d_7b_1x1']
|
||||
self.assertListEqual(pre_pool.get_shape().as_list(),
|
||||
[batch_size, 8, 8, 1536])
|
||||
|
||||
def testBuildBaseNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, end_points = inception.inception_resnet_v2_base(inputs)
|
||||
self.assertTrue(net.op.name.startswith('InceptionResnetV2/Conv2d_7b_1x1'))
|
||||
self.assertListEqual(net.get_shape().as_list(),
|
||||
[batch_size, 8, 8, 1536])
|
||||
expected_endpoints = ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
|
||||
'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3',
|
||||
'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_6a',
|
||||
'PreAuxLogits', 'Mixed_7a', 'Conv2d_7b_1x1']
|
||||
self.assertItemsEqual(end_points.keys(), expected_endpoints)
|
||||
|
||||
def testBuildOnlyUptoFinalEndpoint(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
endpoints = ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
|
||||
'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3',
|
||||
'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_6a',
|
||||
'PreAuxLogits', 'Mixed_7a', 'Conv2d_7b_1x1']
|
||||
for index, endpoint in enumerate(endpoints):
|
||||
with tf.Graph().as_default():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
out_tensor, end_points = inception.inception_resnet_v2_base(
|
||||
inputs, final_endpoint=endpoint)
|
||||
if endpoint != 'PreAuxLogits':
|
||||
self.assertTrue(out_tensor.op.name.startswith(
|
||||
'InceptionResnetV2/' + endpoint))
|
||||
self.assertItemsEqual(endpoints[:index+1], end_points.keys())
|
||||
|
||||
def testBuildAndCheckAllEndPointsUptoPreAuxLogits(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_resnet_v2_base(
|
||||
inputs, final_endpoint='PreAuxLogits')
|
||||
endpoints_shapes = {'Conv2d_1a_3x3': [5, 149, 149, 32],
|
||||
'Conv2d_2a_3x3': [5, 147, 147, 32],
|
||||
'Conv2d_2b_3x3': [5, 147, 147, 64],
|
||||
'MaxPool_3a_3x3': [5, 73, 73, 64],
|
||||
'Conv2d_3b_1x1': [5, 73, 73, 80],
|
||||
'Conv2d_4a_3x3': [5, 71, 71, 192],
|
||||
'MaxPool_5a_3x3': [5, 35, 35, 192],
|
||||
'Mixed_5b': [5, 35, 35, 320],
|
||||
'Mixed_6a': [5, 17, 17, 1088],
|
||||
'PreAuxLogits': [5, 17, 17, 1088]
|
||||
}
|
||||
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testBuildAndCheckAllEndPointsUptoPreAuxLogitsWithAlignedFeatureMaps(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_resnet_v2_base(
|
||||
inputs, final_endpoint='PreAuxLogits', align_feature_maps=True)
|
||||
endpoints_shapes = {'Conv2d_1a_3x3': [5, 150, 150, 32],
|
||||
'Conv2d_2a_3x3': [5, 150, 150, 32],
|
||||
'Conv2d_2b_3x3': [5, 150, 150, 64],
|
||||
'MaxPool_3a_3x3': [5, 75, 75, 64],
|
||||
'Conv2d_3b_1x1': [5, 75, 75, 80],
|
||||
'Conv2d_4a_3x3': [5, 75, 75, 192],
|
||||
'MaxPool_5a_3x3': [5, 38, 38, 192],
|
||||
'Mixed_5b': [5, 38, 38, 320],
|
||||
'Mixed_6a': [5, 19, 19, 1088],
|
||||
'PreAuxLogits': [5, 19, 19, 1088]
|
||||
}
|
||||
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testBuildAndCheckAllEndPointsUptoPreAuxLogitsWithOutputStrideEight(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_resnet_v2_base(
|
||||
inputs, final_endpoint='PreAuxLogits', output_stride=8)
|
||||
endpoints_shapes = {'Conv2d_1a_3x3': [5, 149, 149, 32],
|
||||
'Conv2d_2a_3x3': [5, 147, 147, 32],
|
||||
'Conv2d_2b_3x3': [5, 147, 147, 64],
|
||||
'MaxPool_3a_3x3': [5, 73, 73, 64],
|
||||
'Conv2d_3b_1x1': [5, 73, 73, 80],
|
||||
'Conv2d_4a_3x3': [5, 71, 71, 192],
|
||||
'MaxPool_5a_3x3': [5, 35, 35, 192],
|
||||
'Mixed_5b': [5, 35, 35, 320],
|
||||
'Mixed_6a': [5, 33, 33, 1088],
|
||||
'PreAuxLogits': [5, 33, 33, 1088]
|
||||
}
|
||||
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testVariablesSetDevice(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
# Force all Variables to reside on the device.
|
||||
with tf.compat.v1.variable_scope('on_cpu'), tf.device('/cpu:0'):
|
||||
inception.inception_resnet_v2(inputs, num_classes)
|
||||
with tf.compat.v1.variable_scope('on_gpu'), tf.device('/gpu:0'):
|
||||
inception.inception_resnet_v2(inputs, num_classes)
|
||||
for v in tf.compat.v1.get_collection(
|
||||
tf.compat.v1.GraphKeys.GLOBAL_VARIABLES, scope='on_cpu'):
|
||||
self.assertDeviceEqual(v.device, '/cpu:0')
|
||||
for v in tf.compat.v1.get_collection(
|
||||
tf.compat.v1.GraphKeys.GLOBAL_VARIABLES, scope='on_gpu'):
|
||||
self.assertDeviceEqual(v.device, '/gpu:0')
|
||||
|
||||
def testHalfSizeImages(self):
|
||||
batch_size = 5
|
||||
height, width = 150, 150
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = inception.inception_resnet_v2(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Conv2d_7b_1x1']
|
||||
self.assertListEqual(pre_pool.get_shape().as_list(),
|
||||
[batch_size, 3, 3, 1536])
|
||||
|
||||
def testGlobalPool(self):
|
||||
batch_size = 1
|
||||
height, width = 330, 400
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = inception.inception_resnet_v2(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Conv2d_7b_1x1']
|
||||
self.assertListEqual(pre_pool.get_shape().as_list(),
|
||||
[batch_size, 8, 11, 1536])
|
||||
|
||||
def testGlobalPoolUnknownImageShape(self):
|
||||
batch_size = 1
|
||||
height, width = 330, 400
|
||||
num_classes = 1000
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (batch_size, None, None, 3))
|
||||
logits, end_points = inception.inception_resnet_v2(
|
||||
inputs, num_classes, create_aux_logits=False)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Conv2d_7b_1x1']
|
||||
images = tf.random.uniform((batch_size, height, width, 3))
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
logits_out, pre_pool_out = sess.run([logits, pre_pool],
|
||||
{inputs: images.eval()})
|
||||
self.assertTupleEqual(logits_out.shape, (batch_size, num_classes))
|
||||
self.assertTupleEqual(pre_pool_out.shape, (batch_size, 8, 11, 1536))
|
||||
|
||||
def testUnknownBatchSize(self):
|
||||
batch_size = 1
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (None, height, width, 3))
|
||||
logits, _ = inception.inception_resnet_v2(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[None, num_classes])
|
||||
images = tf.random.uniform((batch_size, height, width, 3))
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(logits, {inputs: images.eval()})
|
||||
self.assertEquals(output.shape, (batch_size, num_classes))
|
||||
|
||||
def testEvaluation(self):
|
||||
batch_size = 2
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
with self.test_session() as sess:
|
||||
eval_inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = inception.inception_resnet_v2(eval_inputs,
|
||||
num_classes,
|
||||
is_training=False)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (batch_size,))
|
||||
|
||||
def testTrainEvalWithReuse(self):
|
||||
train_batch_size = 5
|
||||
eval_batch_size = 2
|
||||
height, width = 150, 150
|
||||
num_classes = 1000
|
||||
with self.test_session() as sess:
|
||||
train_inputs = tf.random.uniform((train_batch_size, height, width, 3))
|
||||
inception.inception_resnet_v2(train_inputs, num_classes)
|
||||
eval_inputs = tf.random.uniform((eval_batch_size, height, width, 3))
|
||||
logits, _ = inception.inception_resnet_v2(eval_inputs,
|
||||
num_classes,
|
||||
is_training=False,
|
||||
reuse=True)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (eval_batch_size,))
|
||||
|
||||
def testNoBatchNormScaleByDefault(self):
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (1, height, width, 3))
|
||||
with contrib_slim.arg_scope(inception.inception_resnet_v2_arg_scope()):
|
||||
inception.inception_resnet_v2(inputs, num_classes, is_training=False)
|
||||
|
||||
self.assertEqual(tf.compat.v1.global_variables('.*/BatchNorm/gamma:0$'), [])
|
||||
|
||||
def testBatchNormScale(self):
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (1, height, width, 3))
|
||||
with contrib_slim.arg_scope(
|
||||
inception.inception_resnet_v2_arg_scope(batch_norm_scale=True)):
|
||||
inception.inception_resnet_v2(inputs, num_classes, is_training=False)
|
||||
|
||||
gamma_names = set(
|
||||
v.op.name
|
||||
for v in tf.compat.v1.global_variables('.*/BatchNorm/gamma:0$'))
|
||||
self.assertGreater(len(gamma_names), 0)
|
||||
for v in tf.compat.v1.global_variables('.*/BatchNorm/moving_mean:0$'):
|
||||
self.assertIn(v.op.name[:-len('moving_mean')] + 'gamma', gamma_names)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+84
@@ -0,0 +1,84 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains common code shared by all inception models.
|
||||
|
||||
Usage of arg scope:
|
||||
with slim.arg_scope(inception_arg_scope()):
|
||||
logits, end_points = inception.inception_v3(images, num_classes,
|
||||
is_training=is_training)
|
||||
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
def inception_arg_scope(
|
||||
weight_decay=0.00004,
|
||||
use_batch_norm=True,
|
||||
batch_norm_decay=0.9997,
|
||||
batch_norm_epsilon=0.001,
|
||||
activation_fn=tf.nn.relu,
|
||||
batch_norm_updates_collections=tf.compat.v1.GraphKeys.UPDATE_OPS,
|
||||
batch_norm_scale=False):
|
||||
"""Defines the default arg scope for inception models.
|
||||
|
||||
Args:
|
||||
weight_decay: The weight decay to use for regularizing the model.
|
||||
use_batch_norm: "If `True`, batch_norm is applied after each convolution.
|
||||
batch_norm_decay: Decay for batch norm moving average.
|
||||
batch_norm_epsilon: Small float added to variance to avoid dividing by zero
|
||||
in batch norm.
|
||||
activation_fn: Activation function for conv2d.
|
||||
batch_norm_updates_collections: Collection for the update ops for
|
||||
batch norm.
|
||||
batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the
|
||||
activations in the batch normalization layer.
|
||||
|
||||
Returns:
|
||||
An `arg_scope` to use for the inception models.
|
||||
"""
|
||||
batch_norm_params = {
|
||||
# Decay for the moving averages.
|
||||
'decay': batch_norm_decay,
|
||||
# epsilon to prevent 0s in variance.
|
||||
'epsilon': batch_norm_epsilon,
|
||||
# collection containing update_ops.
|
||||
'updates_collections': batch_norm_updates_collections,
|
||||
# use fused batch norm if possible.
|
||||
'fused': None,
|
||||
'scale': batch_norm_scale,
|
||||
}
|
||||
if use_batch_norm:
|
||||
normalizer_fn = slim.batch_norm
|
||||
normalizer_params = batch_norm_params
|
||||
else:
|
||||
normalizer_fn = None
|
||||
normalizer_params = {}
|
||||
# Set weight_decay for weights in Conv and FC layers.
|
||||
with slim.arg_scope([slim.conv2d, slim.fully_connected],
|
||||
weights_regularizer=slim.l2_regularizer(weight_decay)):
|
||||
with slim.arg_scope(
|
||||
[slim.conv2d],
|
||||
weights_initializer=slim.variance_scaling_initializer(),
|
||||
activation_fn=activation_fn,
|
||||
normalizer_fn=normalizer_fn,
|
||||
normalizer_params=normalizer_params) as sc:
|
||||
return sc
|
||||
+347
@@ -0,0 +1,347 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains the definition for inception v1 classification network."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import inception_utils
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
# pylint: disable=g-long-lambda
|
||||
trunc_normal = lambda stddev: tf.compat.v1.truncated_normal_initializer(
|
||||
0.0, stddev)
|
||||
|
||||
|
||||
def inception_v1_base(inputs,
|
||||
final_endpoint='Mixed_5c',
|
||||
include_root_block=True,
|
||||
scope='InceptionV1'):
|
||||
"""Defines the Inception V1 base architecture.
|
||||
|
||||
This architecture is defined in:
|
||||
Going deeper with convolutions
|
||||
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
|
||||
Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich.
|
||||
http://arxiv.org/pdf/1409.4842v1.pdf.
|
||||
|
||||
Args:
|
||||
inputs: a tensor of size [batch_size, height, width, channels].
|
||||
final_endpoint: specifies the endpoint to construct the network up to. It
|
||||
can be one of ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
|
||||
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c',
|
||||
'MaxPool_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e',
|
||||
'Mixed_4f', 'MaxPool_5a_2x2', 'Mixed_5b', 'Mixed_5c']. If
|
||||
include_root_block is False, ['Conv2d_1a_7x7', 'MaxPool_2a_3x3',
|
||||
'Conv2d_2b_1x1', 'Conv2d_2c_3x3', 'MaxPool_3a_3x3'] will not be available.
|
||||
include_root_block: If True, include the convolution and max-pooling layers
|
||||
before the inception modules. If False, excludes those layers.
|
||||
scope: Optional variable_scope.
|
||||
|
||||
Returns:
|
||||
A dictionary from components of the network to the corresponding activation.
|
||||
|
||||
Raises:
|
||||
ValueError: if final_endpoint is not set to one of the predefined values.
|
||||
"""
|
||||
end_points = {}
|
||||
with tf.compat.v1.variable_scope(scope, 'InceptionV1', [inputs]):
|
||||
with slim.arg_scope(
|
||||
[slim.conv2d, slim.fully_connected],
|
||||
weights_initializer=trunc_normal(0.01)):
|
||||
with slim.arg_scope([slim.conv2d, slim.max_pool2d],
|
||||
stride=1, padding='SAME'):
|
||||
net = inputs
|
||||
if include_root_block:
|
||||
end_point = 'Conv2d_1a_7x7'
|
||||
net = slim.conv2d(inputs, 64, [7, 7], stride=2, scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point:
|
||||
return net, end_points
|
||||
end_point = 'MaxPool_2a_3x3'
|
||||
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point:
|
||||
return net, end_points
|
||||
end_point = 'Conv2d_2b_1x1'
|
||||
net = slim.conv2d(net, 64, [1, 1], scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point:
|
||||
return net, end_points
|
||||
end_point = 'Conv2d_2c_3x3'
|
||||
net = slim.conv2d(net, 192, [3, 3], scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point:
|
||||
return net, end_points
|
||||
end_point = 'MaxPool_3a_3x3'
|
||||
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point:
|
||||
return net, end_points
|
||||
|
||||
end_point = 'Mixed_3b'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, 96, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 128, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, 16, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 32, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 32, [1, 1], scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point: return net, end_points
|
||||
|
||||
end_point = 'Mixed_3c'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 192, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, 32, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point: return net, end_points
|
||||
|
||||
end_point = 'MaxPool_4a_3x3'
|
||||
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point: return net, end_points
|
||||
|
||||
end_point = 'Mixed_4b'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, 96, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 208, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, 16, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 48, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point: return net, end_points
|
||||
|
||||
end_point = 'Mixed_4c'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, 112, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 224, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, 24, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 64, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point: return net, end_points
|
||||
|
||||
end_point = 'Mixed_4d'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 256, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, 24, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 64, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point: return net, end_points
|
||||
|
||||
end_point = 'Mixed_4e'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, 112, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, 144, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 288, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, 32, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 64, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point: return net, end_points
|
||||
|
||||
end_point = 'Mixed_4f'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, 256, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 320, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, 32, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 128, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point: return net, end_points
|
||||
|
||||
end_point = 'MaxPool_5a_2x2'
|
||||
net = slim.max_pool2d(net, [2, 2], stride=2, scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point: return net, end_points
|
||||
|
||||
end_point = 'Mixed_5b'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, 256, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 320, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, 32, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 128, [3, 3], scope='Conv2d_0a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point: return net, end_points
|
||||
|
||||
end_point = 'Mixed_5c'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, 384, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 384, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 128, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if final_endpoint == end_point: return net, end_points
|
||||
raise ValueError('Unknown final endpoint %s' % final_endpoint)
|
||||
|
||||
|
||||
def inception_v1(inputs,
|
||||
num_classes=1000,
|
||||
is_training=True,
|
||||
dropout_keep_prob=0.8,
|
||||
prediction_fn=slim.softmax,
|
||||
spatial_squeeze=True,
|
||||
reuse=None,
|
||||
scope='InceptionV1',
|
||||
global_pool=False):
|
||||
"""Defines the Inception V1 architecture.
|
||||
|
||||
This architecture is defined in:
|
||||
|
||||
Going deeper with convolutions
|
||||
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
|
||||
Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich.
|
||||
http://arxiv.org/pdf/1409.4842v1.pdf.
|
||||
|
||||
The default image size used to train this network is 224x224.
|
||||
|
||||
Args:
|
||||
inputs: a tensor of size [batch_size, height, width, channels].
|
||||
num_classes: number of predicted classes. If 0 or None, the logits layer
|
||||
is omitted and the input features to the logits layer (before dropout)
|
||||
are returned instead.
|
||||
is_training: whether is training or not.
|
||||
dropout_keep_prob: the percentage of activation values that are retained.
|
||||
prediction_fn: a function to get predictions out of logits.
|
||||
spatial_squeeze: if True, logits is of shape [B, C], if false logits is of
|
||||
shape [B, 1, 1, C], where B is batch_size and C is number of classes.
|
||||
reuse: whether or not the network and its variables should be reused. To be
|
||||
able to reuse 'scope' must be given.
|
||||
scope: Optional variable_scope.
|
||||
global_pool: Optional boolean flag to control the avgpooling before the
|
||||
logits layer. If false or unset, pooling is done with a fixed window
|
||||
that reduces default-sized inputs to 1x1, while larger inputs lead to
|
||||
larger outputs. If true, any input size is pooled down to 1x1.
|
||||
|
||||
Returns:
|
||||
net: a Tensor with the logits (pre-softmax activations) if num_classes
|
||||
is a non-zero integer, or the non-dropped-out input to the logits layer
|
||||
if num_classes is 0 or None.
|
||||
end_points: a dictionary from components of the network to the corresponding
|
||||
activation.
|
||||
"""
|
||||
# Final pooling and prediction
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'InceptionV1', [inputs], reuse=reuse) as scope:
|
||||
with slim.arg_scope([slim.batch_norm, slim.dropout],
|
||||
is_training=is_training):
|
||||
net, end_points = inception_v1_base(inputs, scope=scope)
|
||||
with tf.compat.v1.variable_scope('Logits'):
|
||||
if global_pool:
|
||||
# Global average pooling.
|
||||
net = tf.reduce_mean(
|
||||
input_tensor=net, axis=[1, 2], keepdims=True, name='global_pool')
|
||||
end_points['global_pool'] = net
|
||||
else:
|
||||
# Pooling with a fixed kernel size.
|
||||
net = slim.avg_pool2d(net, [7, 7], stride=1, scope='AvgPool_0a_7x7')
|
||||
end_points['AvgPool_0a_7x7'] = net
|
||||
if not num_classes:
|
||||
return net, end_points
|
||||
net = slim.dropout(net, dropout_keep_prob, scope='Dropout_0b')
|
||||
logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
|
||||
normalizer_fn=None, scope='Conv2d_0c_1x1')
|
||||
if spatial_squeeze:
|
||||
logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
|
||||
|
||||
end_points['Logits'] = logits
|
||||
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
|
||||
return logits, end_points
|
||||
inception_v1.default_image_size = 224
|
||||
|
||||
inception_v1_arg_scope = inception_utils.inception_arg_scope
|
||||
+300
@@ -0,0 +1,300 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for nets.inception_v1."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import inception
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
class InceptionV1Test(tf.test.TestCase):
|
||||
|
||||
def testBuildClassificationNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = inception.inception_v1(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith(
|
||||
'InceptionV1/Logits/SpatialSqueeze'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertTrue('Predictions' in end_points)
|
||||
self.assertListEqual(end_points['Predictions'].get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildPreLogitsNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = None
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, end_points = inception.inception_v1(inputs, num_classes)
|
||||
self.assertTrue(net.op.name.startswith('InceptionV1/Logits/AvgPool'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 1, 1, 1024])
|
||||
self.assertFalse('Logits' in end_points)
|
||||
self.assertFalse('Predictions' in end_points)
|
||||
|
||||
def testBuildBaseNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
mixed_6c, end_points = inception.inception_v1_base(inputs)
|
||||
self.assertTrue(mixed_6c.op.name.startswith('InceptionV1/Mixed_5c'))
|
||||
self.assertListEqual(mixed_6c.get_shape().as_list(),
|
||||
[batch_size, 7, 7, 1024])
|
||||
expected_endpoints = ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
|
||||
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b',
|
||||
'Mixed_3c', 'MaxPool_4a_3x3', 'Mixed_4b', 'Mixed_4c',
|
||||
'Mixed_4d', 'Mixed_4e', 'Mixed_4f', 'MaxPool_5a_2x2',
|
||||
'Mixed_5b', 'Mixed_5c']
|
||||
self.assertItemsEqual(end_points.keys(), expected_endpoints)
|
||||
|
||||
def testBuildOnlyUptoFinalEndpoint(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
endpoints = ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
|
||||
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c',
|
||||
'MaxPool_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d',
|
||||
'Mixed_4e', 'Mixed_4f', 'MaxPool_5a_2x2', 'Mixed_5b',
|
||||
'Mixed_5c']
|
||||
for index, endpoint in enumerate(endpoints):
|
||||
with tf.Graph().as_default():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
out_tensor, end_points = inception.inception_v1_base(
|
||||
inputs, final_endpoint=endpoint)
|
||||
self.assertTrue(out_tensor.op.name.startswith(
|
||||
'InceptionV1/' + endpoint))
|
||||
self.assertItemsEqual(endpoints[:index+1], end_points.keys())
|
||||
|
||||
def testBuildAndCheckAllEndPointsUptoMixed5c(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_v1_base(inputs,
|
||||
final_endpoint='Mixed_5c')
|
||||
endpoints_shapes = {
|
||||
'Conv2d_1a_7x7': [5, 112, 112, 64],
|
||||
'MaxPool_2a_3x3': [5, 56, 56, 64],
|
||||
'Conv2d_2b_1x1': [5, 56, 56, 64],
|
||||
'Conv2d_2c_3x3': [5, 56, 56, 192],
|
||||
'MaxPool_3a_3x3': [5, 28, 28, 192],
|
||||
'Mixed_3b': [5, 28, 28, 256],
|
||||
'Mixed_3c': [5, 28, 28, 480],
|
||||
'MaxPool_4a_3x3': [5, 14, 14, 480],
|
||||
'Mixed_4b': [5, 14, 14, 512],
|
||||
'Mixed_4c': [5, 14, 14, 512],
|
||||
'Mixed_4d': [5, 14, 14, 512],
|
||||
'Mixed_4e': [5, 14, 14, 528],
|
||||
'Mixed_4f': [5, 14, 14, 832],
|
||||
'MaxPool_5a_2x2': [5, 7, 7, 832],
|
||||
'Mixed_5b': [5, 7, 7, 832],
|
||||
'Mixed_5c': [5, 7, 7, 1024]
|
||||
}
|
||||
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testModelHasExpectedNumberOfParameters(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with slim.arg_scope(inception.inception_v1_arg_scope()):
|
||||
inception.inception_v1_base(inputs)
|
||||
total_params, _ = slim.model_analyzer.analyze_vars(
|
||||
slim.get_model_variables())
|
||||
self.assertAlmostEqual(5607184, total_params)
|
||||
|
||||
def testHalfSizeImages(self):
|
||||
batch_size = 5
|
||||
height, width = 112, 112
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
mixed_5c, _ = inception.inception_v1_base(inputs)
|
||||
self.assertTrue(mixed_5c.op.name.startswith('InceptionV1/Mixed_5c'))
|
||||
self.assertListEqual(mixed_5c.get_shape().as_list(),
|
||||
[batch_size, 4, 4, 1024])
|
||||
|
||||
def testBuildBaseNetworkWithoutRootBlock(self):
|
||||
batch_size = 5
|
||||
height, width = 28, 28
|
||||
channels = 192
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, channels))
|
||||
_, end_points = inception.inception_v1_base(
|
||||
inputs, include_root_block=False)
|
||||
endpoints_shapes = {
|
||||
'Mixed_3b': [5, 28, 28, 256],
|
||||
'Mixed_3c': [5, 28, 28, 480],
|
||||
'MaxPool_4a_3x3': [5, 14, 14, 480],
|
||||
'Mixed_4b': [5, 14, 14, 512],
|
||||
'Mixed_4c': [5, 14, 14, 512],
|
||||
'Mixed_4d': [5, 14, 14, 512],
|
||||
'Mixed_4e': [5, 14, 14, 528],
|
||||
'Mixed_4f': [5, 14, 14, 832],
|
||||
'MaxPool_5a_2x2': [5, 7, 7, 832],
|
||||
'Mixed_5b': [5, 7, 7, 832],
|
||||
'Mixed_5c': [5, 7, 7, 1024]
|
||||
}
|
||||
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testUnknownImageShape(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
batch_size = 2
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(
|
||||
tf.float32, shape=(batch_size, None, None, 3))
|
||||
logits, end_points = inception.inception_v1(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV1/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Mixed_5c']
|
||||
feed_dict = {inputs: input_np}
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
|
||||
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 7, 7, 1024])
|
||||
|
||||
def testGlobalPoolUnknownImageShape(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
batch_size = 1
|
||||
height, width = 250, 300
|
||||
num_classes = 1000
|
||||
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(
|
||||
tf.float32, shape=(batch_size, None, None, 3))
|
||||
logits, end_points = inception.inception_v1(inputs, num_classes,
|
||||
global_pool=True)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV1/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Mixed_5c']
|
||||
feed_dict = {inputs: input_np}
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
|
||||
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 8, 10, 1024])
|
||||
|
||||
def testUnknowBatchSize(self):
|
||||
batch_size = 1
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (None, height, width, 3))
|
||||
logits, _ = inception.inception_v1(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV1/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[None, num_classes])
|
||||
images = tf.random.uniform((batch_size, height, width, 3))
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(logits, {inputs: images.eval()})
|
||||
self.assertEquals(output.shape, (batch_size, num_classes))
|
||||
|
||||
def testEvaluation(self):
|
||||
batch_size = 2
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
eval_inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = inception.inception_v1(eval_inputs, num_classes,
|
||||
is_training=False)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (batch_size,))
|
||||
|
||||
def testTrainEvalWithReuse(self):
|
||||
train_batch_size = 5
|
||||
eval_batch_size = 2
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
train_inputs = tf.random.uniform((train_batch_size, height, width, 3))
|
||||
inception.inception_v1(train_inputs, num_classes)
|
||||
eval_inputs = tf.random.uniform((eval_batch_size, height, width, 3))
|
||||
logits, _ = inception.inception_v1(eval_inputs, num_classes, reuse=True)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (eval_batch_size,))
|
||||
|
||||
def testLogitsNotSqueezed(self):
|
||||
num_classes = 25
|
||||
images = tf.random.uniform([1, 224, 224, 3])
|
||||
logits, _ = inception.inception_v1(images,
|
||||
num_classes=num_classes,
|
||||
spatial_squeeze=False)
|
||||
|
||||
with self.test_session() as sess:
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
logits_out = sess.run(logits)
|
||||
self.assertListEqual(list(logits_out.shape), [1, 1, 1, num_classes])
|
||||
|
||||
def testNoBatchNormScaleByDefault(self):
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (1, height, width, 3))
|
||||
with slim.arg_scope(inception.inception_v1_arg_scope()):
|
||||
inception.inception_v1(inputs, num_classes, is_training=False)
|
||||
|
||||
self.assertEqual(tf.compat.v1.global_variables('.*/BatchNorm/gamma:0$'), [])
|
||||
|
||||
def testBatchNormScale(self):
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (1, height, width, 3))
|
||||
with slim.arg_scope(
|
||||
inception.inception_v1_arg_scope(batch_norm_scale=True)):
|
||||
inception.inception_v1(inputs, num_classes, is_training=False)
|
||||
|
||||
gamma_names = set(
|
||||
v.op.name
|
||||
for v in tf.compat.v1.global_variables('.*/BatchNorm/gamma:0$'))
|
||||
self.assertGreater(len(gamma_names), 0)
|
||||
for v in tf.compat.v1.global_variables('.*/BatchNorm/moving_mean:0$'):
|
||||
self.assertIn(v.op.name[:-len('moving_mean')] + 'gamma', gamma_names)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+596
@@ -0,0 +1,596 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains the definition for inception v2 classification network."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import inception_utils
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
# pylint: disable=g-long-lambda
|
||||
trunc_normal = lambda stddev: tf.compat.v1.truncated_normal_initializer(
|
||||
0.0, stddev)
|
||||
|
||||
|
||||
def inception_v2_base(inputs,
|
||||
final_endpoint='Mixed_5c',
|
||||
min_depth=16,
|
||||
depth_multiplier=1.0,
|
||||
use_separable_conv=True,
|
||||
data_format='NHWC',
|
||||
include_root_block=True,
|
||||
scope=None):
|
||||
"""Inception v2 (6a2).
|
||||
|
||||
Constructs an Inception v2 network from inputs to the given final endpoint.
|
||||
This method can construct the network up to the layer inception(5b) as
|
||||
described in http://arxiv.org/abs/1502.03167.
|
||||
|
||||
Args:
|
||||
inputs: a tensor of shape [batch_size, height, width, channels].
|
||||
final_endpoint: specifies the endpoint to construct the network up to. It
|
||||
can be one of ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
|
||||
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c', 'Mixed_4a',
|
||||
'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e', 'Mixed_5a', 'Mixed_5b',
|
||||
'Mixed_5c']. If include_root_block is False, ['Conv2d_1a_7x7',
|
||||
'MaxPool_2a_3x3', 'Conv2d_2b_1x1', 'Conv2d_2c_3x3', 'MaxPool_3a_3x3'] will
|
||||
not be available.
|
||||
min_depth: Minimum depth value (number of channels) for all convolution ops.
|
||||
Enforced when depth_multiplier < 1, and not an active constraint when
|
||||
depth_multiplier >= 1.
|
||||
depth_multiplier: Float multiplier for the depth (number of channels)
|
||||
for all convolution ops. The value must be greater than zero. Typical
|
||||
usage will be to set this value in (0, 1) to reduce the number of
|
||||
parameters or computation cost of the model.
|
||||
use_separable_conv: Use a separable convolution for the first layer
|
||||
Conv2d_1a_7x7. If this is False, use a normal convolution instead.
|
||||
data_format: Data format of the activations ('NHWC' or 'NCHW').
|
||||
include_root_block: If True, include the convolution and max-pooling layers
|
||||
before the inception modules. If False, excludes those layers.
|
||||
scope: Optional variable_scope.
|
||||
|
||||
Returns:
|
||||
tensor_out: output tensor corresponding to the final_endpoint.
|
||||
end_points: a set of activations for external use, for example summaries or
|
||||
losses.
|
||||
|
||||
Raises:
|
||||
ValueError: if final_endpoint is not set to one of the predefined values,
|
||||
or depth_multiplier <= 0
|
||||
"""
|
||||
|
||||
# end_points will collect relevant activations for external use, for example
|
||||
# summaries or losses.
|
||||
end_points = {}
|
||||
|
||||
# Used to find thinned depths for each layer.
|
||||
if depth_multiplier <= 0:
|
||||
raise ValueError('depth_multiplier is not greater than zero.')
|
||||
depth = lambda d: max(int(d * depth_multiplier), min_depth)
|
||||
|
||||
if data_format != 'NHWC' and data_format != 'NCHW':
|
||||
raise ValueError('data_format must be either NHWC or NCHW.')
|
||||
if data_format == 'NCHW' and use_separable_conv:
|
||||
raise ValueError(
|
||||
'separable convolution only supports NHWC layout. NCHW data format can'
|
||||
' only be used when use_separable_conv is False.'
|
||||
)
|
||||
|
||||
concat_dim = 3 if data_format == 'NHWC' else 1
|
||||
with tf.compat.v1.variable_scope(scope, 'InceptionV2', [inputs]):
|
||||
with slim.arg_scope(
|
||||
[slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
|
||||
stride=1,
|
||||
padding='SAME',
|
||||
data_format=data_format):
|
||||
|
||||
net = inputs
|
||||
if include_root_block:
|
||||
# Note that sizes in the comments below assume an input spatial size of
|
||||
# 224x224, however, the inputs can be of any size greater 32x32.
|
||||
|
||||
# 224 x 224 x 3
|
||||
end_point = 'Conv2d_1a_7x7'
|
||||
|
||||
if use_separable_conv:
|
||||
# depthwise_multiplier here is different from depth_multiplier.
|
||||
# depthwise_multiplier determines the output channels of the initial
|
||||
# depthwise conv (see docs for tf.nn.separable_conv2d), while
|
||||
# depth_multiplier controls the # channels of the subsequent 1x1
|
||||
# convolution. Must have
|
||||
# in_channels * depthwise_multipler <= out_channels
|
||||
# so that the separable convolution is not overparameterized.
|
||||
depthwise_multiplier = min(int(depth(64) / 3), 8)
|
||||
net = slim.separable_conv2d(
|
||||
inputs,
|
||||
depth(64), [7, 7],
|
||||
depth_multiplier=depthwise_multiplier,
|
||||
stride=2,
|
||||
padding='SAME',
|
||||
weights_initializer=trunc_normal(1.0),
|
||||
scope=end_point)
|
||||
else:
|
||||
# Use a normal convolution instead of a separable convolution.
|
||||
net = slim.conv2d(
|
||||
inputs,
|
||||
depth(64), [7, 7],
|
||||
stride=2,
|
||||
weights_initializer=trunc_normal(1.0),
|
||||
scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint:
|
||||
return net, end_points
|
||||
# 112 x 112 x 64
|
||||
end_point = 'MaxPool_2a_3x3'
|
||||
net = slim.max_pool2d(net, [3, 3], scope=end_point, stride=2)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint:
|
||||
return net, end_points
|
||||
# 56 x 56 x 64
|
||||
end_point = 'Conv2d_2b_1x1'
|
||||
net = slim.conv2d(
|
||||
net,
|
||||
depth(64), [1, 1],
|
||||
scope=end_point,
|
||||
weights_initializer=trunc_normal(0.1))
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint:
|
||||
return net, end_points
|
||||
# 56 x 56 x 64
|
||||
end_point = 'Conv2d_2c_3x3'
|
||||
net = slim.conv2d(net, depth(192), [3, 3], scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint:
|
||||
return net, end_points
|
||||
# 56 x 56 x 192
|
||||
end_point = 'MaxPool_3a_3x3'
|
||||
net = slim.max_pool2d(net, [3, 3], scope=end_point, stride=2)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint:
|
||||
return net, end_points
|
||||
|
||||
# 28 x 28 x 192
|
||||
# Inception module.
|
||||
end_point = 'Mixed_3b'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(
|
||||
net, depth(64), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(64), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(
|
||||
net, depth(64), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(
|
||||
branch_3, depth(32), [1, 1],
|
||||
weights_initializer=trunc_normal(0.1),
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=concat_dim, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 28 x 28 x 256
|
||||
end_point = 'Mixed_3c'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(
|
||||
net, depth(64), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(96), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(
|
||||
net, depth(64), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(
|
||||
branch_3, depth(64), [1, 1],
|
||||
weights_initializer=trunc_normal(0.1),
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=concat_dim, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 28 x 28 x 320
|
||||
end_point = 'Mixed_4a'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(
|
||||
net, depth(128), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_0 = slim.conv2d(branch_0, depth(160), [3, 3], stride=2,
|
||||
scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(
|
||||
net, depth(64), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(
|
||||
branch_1, depth(96), [3, 3], scope='Conv2d_0b_3x3')
|
||||
branch_1 = slim.conv2d(
|
||||
branch_1, depth(96), [3, 3], stride=2, scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.max_pool2d(
|
||||
net, [3, 3], stride=2, scope='MaxPool_1a_3x3')
|
||||
net = tf.concat(axis=concat_dim, values=[branch_0, branch_1, branch_2])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 14 x 14 x 576
|
||||
end_point = 'Mixed_4b'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(224), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(
|
||||
net, depth(64), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(
|
||||
branch_1, depth(96), [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(
|
||||
net, depth(96), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(128), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, depth(128), [3, 3],
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(
|
||||
branch_3, depth(128), [1, 1],
|
||||
weights_initializer=trunc_normal(0.1),
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=concat_dim, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 14 x 14 x 576
|
||||
end_point = 'Mixed_4c'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(
|
||||
net, depth(96), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(128), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(
|
||||
net, depth(96), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(128), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, depth(128), [3, 3],
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(
|
||||
branch_3, depth(128), [1, 1],
|
||||
weights_initializer=trunc_normal(0.1),
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=concat_dim, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 14 x 14 x 576
|
||||
end_point = 'Mixed_4d'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(
|
||||
net, depth(128), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(160), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(
|
||||
net, depth(128), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(160), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, depth(160), [3, 3],
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(
|
||||
branch_3, depth(96), [1, 1],
|
||||
weights_initializer=trunc_normal(0.1),
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=concat_dim, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 14 x 14 x 576
|
||||
end_point = 'Mixed_4e'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(96), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(
|
||||
net, depth(128), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(192), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(
|
||||
net, depth(160), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(192), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, depth(192), [3, 3],
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(
|
||||
branch_3, depth(96), [1, 1],
|
||||
weights_initializer=trunc_normal(0.1),
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=concat_dim, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 14 x 14 x 576
|
||||
end_point = 'Mixed_5a'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(
|
||||
net, depth(128), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_0 = slim.conv2d(branch_0, depth(192), [3, 3], stride=2,
|
||||
scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(
|
||||
net, depth(192), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(256), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_1 = slim.conv2d(branch_1, depth(256), [3, 3], stride=2,
|
||||
scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.max_pool2d(net, [3, 3], stride=2,
|
||||
scope='MaxPool_1a_3x3')
|
||||
net = tf.concat(
|
||||
axis=concat_dim, values=[branch_0, branch_1, branch_2])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 7 x 7 x 1024
|
||||
end_point = 'Mixed_5b'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(352), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(
|
||||
net, depth(192), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(320), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(
|
||||
net, depth(160), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(224), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, depth(224), [3, 3],
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(
|
||||
branch_3, depth(128), [1, 1],
|
||||
weights_initializer=trunc_normal(0.1),
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=concat_dim, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 7 x 7 x 1024
|
||||
end_point = 'Mixed_5c'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(352), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(
|
||||
net, depth(192), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(320), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(
|
||||
net, depth(192), [1, 1],
|
||||
weights_initializer=trunc_normal(0.09),
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(224), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, depth(224), [3, 3],
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(
|
||||
branch_3, depth(128), [1, 1],
|
||||
weights_initializer=trunc_normal(0.1),
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(
|
||||
axis=concat_dim, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
raise ValueError('Unknown final endpoint %s' % final_endpoint)
|
||||
|
||||
|
||||
def inception_v2(inputs,
|
||||
num_classes=1000,
|
||||
is_training=True,
|
||||
dropout_keep_prob=0.8,
|
||||
min_depth=16,
|
||||
depth_multiplier=1.0,
|
||||
prediction_fn=slim.softmax,
|
||||
spatial_squeeze=True,
|
||||
reuse=None,
|
||||
scope='InceptionV2',
|
||||
global_pool=False):
|
||||
"""Inception v2 model for classification.
|
||||
|
||||
Constructs an Inception v2 network for classification as described in
|
||||
http://arxiv.org/abs/1502.03167.
|
||||
|
||||
The default image size used to train this network is 224x224.
|
||||
|
||||
Args:
|
||||
inputs: a tensor of shape [batch_size, height, width, channels].
|
||||
num_classes: number of predicted classes. If 0 or None, the logits layer
|
||||
is omitted and the input features to the logits layer (before dropout)
|
||||
are returned instead.
|
||||
is_training: whether is training or not.
|
||||
dropout_keep_prob: the percentage of activation values that are retained.
|
||||
min_depth: Minimum depth value (number of channels) for all convolution ops.
|
||||
Enforced when depth_multiplier < 1, and not an active constraint when
|
||||
depth_multiplier >= 1.
|
||||
depth_multiplier: Float multiplier for the depth (number of channels)
|
||||
for all convolution ops. The value must be greater than zero. Typical
|
||||
usage will be to set this value in (0, 1) to reduce the number of
|
||||
parameters or computation cost of the model.
|
||||
prediction_fn: a function to get predictions out of logits.
|
||||
spatial_squeeze: if True, logits is of shape [B, C], if false logits is of
|
||||
shape [B, 1, 1, C], where B is batch_size and C is number of classes.
|
||||
reuse: whether or not the network and its variables should be reused. To be
|
||||
able to reuse 'scope' must be given.
|
||||
scope: Optional variable_scope.
|
||||
global_pool: Optional boolean flag to control the avgpooling before the
|
||||
logits layer. If false or unset, pooling is done with a fixed window
|
||||
that reduces default-sized inputs to 1x1, while larger inputs lead to
|
||||
larger outputs. If true, any input size is pooled down to 1x1.
|
||||
|
||||
Returns:
|
||||
net: a Tensor with the logits (pre-softmax activations) if num_classes
|
||||
is a non-zero integer, or the non-dropped-out input to the logits layer
|
||||
if num_classes is 0 or None.
|
||||
end_points: a dictionary from components of the network to the corresponding
|
||||
activation.
|
||||
|
||||
Raises:
|
||||
ValueError: if final_endpoint is not set to one of the predefined values,
|
||||
or depth_multiplier <= 0
|
||||
"""
|
||||
if depth_multiplier <= 0:
|
||||
raise ValueError('depth_multiplier is not greater than zero.')
|
||||
|
||||
# Final pooling and prediction
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'InceptionV2', [inputs], reuse=reuse) as scope:
|
||||
with slim.arg_scope([slim.batch_norm, slim.dropout],
|
||||
is_training=is_training):
|
||||
net, end_points = inception_v2_base(
|
||||
inputs, scope=scope, min_depth=min_depth,
|
||||
depth_multiplier=depth_multiplier)
|
||||
with tf.compat.v1.variable_scope('Logits'):
|
||||
if global_pool:
|
||||
# Global average pooling.
|
||||
net = tf.reduce_mean(
|
||||
input_tensor=net, axis=[1, 2], keepdims=True, name='global_pool')
|
||||
end_points['global_pool'] = net
|
||||
else:
|
||||
# Pooling with a fixed kernel size.
|
||||
kernel_size = _reduced_kernel_size_for_small_input(net, [7, 7])
|
||||
net = slim.avg_pool2d(net, kernel_size, padding='VALID',
|
||||
scope='AvgPool_1a_{}x{}'.format(*kernel_size))
|
||||
end_points['AvgPool_1a'] = net
|
||||
if not num_classes:
|
||||
return net, end_points
|
||||
# 1 x 1 x 1024
|
||||
net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
|
||||
end_points['PreLogits'] = net
|
||||
logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
|
||||
normalizer_fn=None, scope='Conv2d_1c_1x1')
|
||||
if spatial_squeeze:
|
||||
logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
|
||||
end_points['Logits'] = logits
|
||||
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
|
||||
return logits, end_points
|
||||
inception_v2.default_image_size = 224
|
||||
|
||||
|
||||
def _reduced_kernel_size_for_small_input(input_tensor, kernel_size):
|
||||
"""Define kernel size which is automatically reduced for small input.
|
||||
|
||||
If the shape of the input images is unknown at graph construction time this
|
||||
function assumes that the input images are is large enough.
|
||||
|
||||
Args:
|
||||
input_tensor: input tensor of size [batch_size, height, width, channels].
|
||||
kernel_size: desired kernel size of length 2: [kernel_height, kernel_width]
|
||||
|
||||
Returns:
|
||||
a tensor with the kernel size.
|
||||
|
||||
TODO(jrru): Make this function work with unknown shapes. Theoretically, this
|
||||
can be done with the code below. Problems are two-fold: (1) If the shape was
|
||||
known, it will be lost. (2) inception.slim.ops._two_element_tuple cannot
|
||||
handle tensors that define the kernel size.
|
||||
shape = tf.shape(input_tensor)
|
||||
return = tf.stack([tf.minimum(shape[1], kernel_size[0]),
|
||||
tf.minimum(shape[2], kernel_size[1])])
|
||||
|
||||
"""
|
||||
shape = input_tensor.get_shape().as_list()
|
||||
if shape[1] is None or shape[2] is None:
|
||||
kernel_size_out = kernel_size
|
||||
else:
|
||||
kernel_size_out = [min(shape[1], kernel_size[0]),
|
||||
min(shape[2], kernel_size[1])]
|
||||
return kernel_size_out
|
||||
|
||||
|
||||
inception_v2_arg_scope = inception_utils.inception_arg_scope
|
||||
+412
@@ -0,0 +1,412 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for nets.inception_v2."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import inception
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
class InceptionV2Test(tf.test.TestCase):
|
||||
|
||||
def testBuildClassificationNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = inception.inception_v2(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith(
|
||||
'InceptionV2/Logits/SpatialSqueeze'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertTrue('Predictions' in end_points)
|
||||
self.assertListEqual(end_points['Predictions'].get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildPreLogitsNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = None
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, end_points = inception.inception_v2(inputs, num_classes)
|
||||
self.assertTrue(net.op.name.startswith('InceptionV2/Logits/AvgPool'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 1, 1, 1024])
|
||||
self.assertFalse('Logits' in end_points)
|
||||
self.assertFalse('Predictions' in end_points)
|
||||
|
||||
def testBuildBaseNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
mixed_5c, end_points = inception.inception_v2_base(inputs)
|
||||
self.assertTrue(mixed_5c.op.name.startswith('InceptionV2/Mixed_5c'))
|
||||
self.assertListEqual(mixed_5c.get_shape().as_list(),
|
||||
[batch_size, 7, 7, 1024])
|
||||
expected_endpoints = ['Mixed_3b', 'Mixed_3c', 'Mixed_4a', 'Mixed_4b',
|
||||
'Mixed_4c', 'Mixed_4d', 'Mixed_4e', 'Mixed_5a',
|
||||
'Mixed_5b', 'Mixed_5c', 'Conv2d_1a_7x7',
|
||||
'MaxPool_2a_3x3', 'Conv2d_2b_1x1', 'Conv2d_2c_3x3',
|
||||
'MaxPool_3a_3x3']
|
||||
self.assertItemsEqual(list(end_points.keys()), expected_endpoints)
|
||||
|
||||
def testBuildOnlyUptoFinalEndpoint(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
endpoints = ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
|
||||
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c',
|
||||
'Mixed_4a', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e',
|
||||
'Mixed_5a', 'Mixed_5b', 'Mixed_5c']
|
||||
for index, endpoint in enumerate(endpoints):
|
||||
with tf.Graph().as_default():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
out_tensor, end_points = inception.inception_v2_base(
|
||||
inputs, final_endpoint=endpoint)
|
||||
self.assertTrue(out_tensor.op.name.startswith(
|
||||
'InceptionV2/' + endpoint))
|
||||
self.assertItemsEqual(endpoints[:index + 1], list(end_points.keys()))
|
||||
|
||||
def testBuildAndCheckAllEndPointsUptoMixed5c(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_v2_base(inputs,
|
||||
final_endpoint='Mixed_5c')
|
||||
endpoints_shapes = {'Mixed_3b': [batch_size, 28, 28, 256],
|
||||
'Mixed_3c': [batch_size, 28, 28, 320],
|
||||
'Mixed_4a': [batch_size, 14, 14, 576],
|
||||
'Mixed_4b': [batch_size, 14, 14, 576],
|
||||
'Mixed_4c': [batch_size, 14, 14, 576],
|
||||
'Mixed_4d': [batch_size, 14, 14, 576],
|
||||
'Mixed_4e': [batch_size, 14, 14, 576],
|
||||
'Mixed_5a': [batch_size, 7, 7, 1024],
|
||||
'Mixed_5b': [batch_size, 7, 7, 1024],
|
||||
'Mixed_5c': [batch_size, 7, 7, 1024],
|
||||
'Conv2d_1a_7x7': [batch_size, 112, 112, 64],
|
||||
'MaxPool_2a_3x3': [batch_size, 56, 56, 64],
|
||||
'Conv2d_2b_1x1': [batch_size, 56, 56, 64],
|
||||
'Conv2d_2c_3x3': [batch_size, 56, 56, 192],
|
||||
'MaxPool_3a_3x3': [batch_size, 28, 28, 192]}
|
||||
self.assertItemsEqual(
|
||||
list(endpoints_shapes.keys()), list(end_points.keys()))
|
||||
for endpoint_name in endpoints_shapes:
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testModelHasExpectedNumberOfParameters(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with slim.arg_scope(inception.inception_v2_arg_scope()):
|
||||
inception.inception_v2_base(inputs)
|
||||
total_params, _ = slim.model_analyzer.analyze_vars(
|
||||
slim.get_model_variables())
|
||||
self.assertAlmostEqual(10173112, total_params)
|
||||
|
||||
def testBuildEndPointsWithDepthMultiplierLessThanOne(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_v2(inputs, num_classes)
|
||||
|
||||
endpoint_keys = [key for key in end_points.keys()
|
||||
if key.startswith('Mixed') or key.startswith('Conv')]
|
||||
|
||||
_, end_points_with_multiplier = inception.inception_v2(
|
||||
inputs, num_classes, scope='depth_multiplied_net',
|
||||
depth_multiplier=0.5)
|
||||
|
||||
for key in endpoint_keys:
|
||||
original_depth = end_points[key].get_shape().as_list()[3]
|
||||
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
|
||||
self.assertEqual(0.5 * original_depth, new_depth)
|
||||
|
||||
def testBuildEndPointsWithDepthMultiplierGreaterThanOne(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_v2(inputs, num_classes)
|
||||
|
||||
endpoint_keys = [key for key in end_points.keys()
|
||||
if key.startswith('Mixed') or key.startswith('Conv')]
|
||||
|
||||
_, end_points_with_multiplier = inception.inception_v2(
|
||||
inputs, num_classes, scope='depth_multiplied_net',
|
||||
depth_multiplier=2.0)
|
||||
|
||||
for key in endpoint_keys:
|
||||
original_depth = end_points[key].get_shape().as_list()[3]
|
||||
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
|
||||
self.assertEqual(2.0 * original_depth, new_depth)
|
||||
|
||||
def testRaiseValueErrorWithInvalidDepthMultiplier(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with self.assertRaises(ValueError):
|
||||
_ = inception.inception_v2(inputs, num_classes, depth_multiplier=-0.1)
|
||||
with self.assertRaises(ValueError):
|
||||
_ = inception.inception_v2(inputs, num_classes, depth_multiplier=0.0)
|
||||
|
||||
def testBuildEndPointsWithUseSeparableConvolutionFalse(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_v2_base(inputs)
|
||||
|
||||
endpoint_keys = [
|
||||
key for key in end_points.keys()
|
||||
if key.startswith('Mixed') or key.startswith('Conv')
|
||||
]
|
||||
|
||||
_, end_points_with_replacement = inception.inception_v2_base(
|
||||
inputs, use_separable_conv=False)
|
||||
|
||||
# The endpoint shapes must be equal to the original shape even when the
|
||||
# separable convolution is replaced with a normal convolution.
|
||||
for key in endpoint_keys:
|
||||
original_shape = end_points[key].get_shape().as_list()
|
||||
self.assertTrue(key in end_points_with_replacement)
|
||||
new_shape = end_points_with_replacement[key].get_shape().as_list()
|
||||
self.assertListEqual(original_shape, new_shape)
|
||||
|
||||
def testBuildEndPointsNCHWDataFormat(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_v2_base(inputs)
|
||||
|
||||
endpoint_keys = [
|
||||
key for key in end_points.keys()
|
||||
if key.startswith('Mixed') or key.startswith('Conv')
|
||||
]
|
||||
|
||||
inputs_in_nchw = tf.random.uniform((batch_size, 3, height, width))
|
||||
_, end_points_with_replacement = inception.inception_v2_base(
|
||||
inputs_in_nchw, use_separable_conv=False, data_format='NCHW')
|
||||
|
||||
# With the 'NCHW' data format, all endpoint activations have a transposed
|
||||
# shape from the original shape with the 'NHWC' layout.
|
||||
for key in endpoint_keys:
|
||||
transposed_original_shape = tf.transpose(
|
||||
a=end_points[key], perm=[0, 3, 1, 2]).get_shape().as_list()
|
||||
self.assertTrue(key in end_points_with_replacement)
|
||||
new_shape = end_points_with_replacement[key].get_shape().as_list()
|
||||
self.assertListEqual(transposed_original_shape, new_shape)
|
||||
|
||||
def testBuildErrorsForDataFormats(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
|
||||
# 'NCWH' data format is not supported.
|
||||
with self.assertRaises(ValueError):
|
||||
_ = inception.inception_v2_base(inputs, data_format='NCWH')
|
||||
|
||||
# 'NCHW' data format is not supported for separable convolution.
|
||||
with self.assertRaises(ValueError):
|
||||
_ = inception.inception_v2_base(inputs, data_format='NCHW')
|
||||
|
||||
def testHalfSizeImages(self):
|
||||
batch_size = 5
|
||||
height, width = 112, 112
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = inception.inception_v2(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV2/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Mixed_5c']
|
||||
self.assertListEqual(pre_pool.get_shape().as_list(),
|
||||
[batch_size, 4, 4, 1024])
|
||||
|
||||
def testBuildBaseNetworkWithoutRootBlock(self):
|
||||
batch_size = 5
|
||||
height, width = 28, 28
|
||||
channels = 192
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, channels))
|
||||
_, end_points = inception.inception_v2_base(
|
||||
inputs, include_root_block=False)
|
||||
endpoints_shapes = {
|
||||
'Mixed_3b': [batch_size, 28, 28, 256],
|
||||
'Mixed_3c': [batch_size, 28, 28, 320],
|
||||
'Mixed_4a': [batch_size, 14, 14, 576],
|
||||
'Mixed_4b': [batch_size, 14, 14, 576],
|
||||
'Mixed_4c': [batch_size, 14, 14, 576],
|
||||
'Mixed_4d': [batch_size, 14, 14, 576],
|
||||
'Mixed_4e': [batch_size, 14, 14, 576],
|
||||
'Mixed_5a': [batch_size, 7, 7, 1024],
|
||||
'Mixed_5b': [batch_size, 7, 7, 1024],
|
||||
'Mixed_5c': [batch_size, 7, 7, 1024]
|
||||
}
|
||||
self.assertItemsEqual(
|
||||
list(endpoints_shapes.keys()), list(end_points.keys()))
|
||||
for endpoint_name in endpoints_shapes:
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testUnknownImageShape(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
batch_size = 2
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(
|
||||
tf.float32, shape=(batch_size, None, None, 3))
|
||||
logits, end_points = inception.inception_v2(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV2/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Mixed_5c']
|
||||
feed_dict = {inputs: input_np}
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
|
||||
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 7, 7, 1024])
|
||||
|
||||
def testGlobalPoolUnknownImageShape(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
batch_size = 1
|
||||
height, width = 250, 300
|
||||
num_classes = 1000
|
||||
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(
|
||||
tf.float32, shape=(batch_size, None, None, 3))
|
||||
logits, end_points = inception.inception_v2(inputs, num_classes,
|
||||
global_pool=True)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV2/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Mixed_5c']
|
||||
feed_dict = {inputs: input_np}
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
|
||||
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 8, 10, 1024])
|
||||
|
||||
def testUnknowBatchSize(self):
|
||||
batch_size = 1
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (None, height, width, 3))
|
||||
logits, _ = inception.inception_v2(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV2/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[None, num_classes])
|
||||
images = tf.random.uniform((batch_size, height, width, 3))
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(logits, {inputs: images.eval()})
|
||||
self.assertEquals(output.shape, (batch_size, num_classes))
|
||||
|
||||
def testEvaluation(self):
|
||||
batch_size = 2
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
eval_inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = inception.inception_v2(eval_inputs, num_classes,
|
||||
is_training=False)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (batch_size,))
|
||||
|
||||
def testTrainEvalWithReuse(self):
|
||||
train_batch_size = 5
|
||||
eval_batch_size = 2
|
||||
height, width = 150, 150
|
||||
num_classes = 1000
|
||||
|
||||
train_inputs = tf.random.uniform((train_batch_size, height, width, 3))
|
||||
inception.inception_v2(train_inputs, num_classes)
|
||||
eval_inputs = tf.random.uniform((eval_batch_size, height, width, 3))
|
||||
logits, _ = inception.inception_v2(eval_inputs, num_classes, reuse=True)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (eval_batch_size,))
|
||||
|
||||
def testLogitsNotSqueezed(self):
|
||||
num_classes = 25
|
||||
images = tf.random.uniform([1, 224, 224, 3])
|
||||
logits, _ = inception.inception_v2(images,
|
||||
num_classes=num_classes,
|
||||
spatial_squeeze=False)
|
||||
|
||||
with self.test_session() as sess:
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
logits_out = sess.run(logits)
|
||||
self.assertListEqual(list(logits_out.shape), [1, 1, 1, num_classes])
|
||||
|
||||
def testNoBatchNormScaleByDefault(self):
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (1, height, width, 3))
|
||||
with slim.arg_scope(inception.inception_v2_arg_scope()):
|
||||
inception.inception_v2(inputs, num_classes, is_training=False)
|
||||
|
||||
self.assertEqual(tf.compat.v1.global_variables('.*/BatchNorm/gamma:0$'), [])
|
||||
|
||||
def testBatchNormScale(self):
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (1, height, width, 3))
|
||||
with slim.arg_scope(
|
||||
inception.inception_v2_arg_scope(batch_norm_scale=True)):
|
||||
inception.inception_v2(inputs, num_classes, is_training=False)
|
||||
|
||||
gamma_names = set(
|
||||
v.op.name
|
||||
for v in tf.compat.v1.global_variables('.*/BatchNorm/gamma:0$'))
|
||||
self.assertGreater(len(gamma_names), 0)
|
||||
for v in tf.compat.v1.global_variables('.*/BatchNorm/moving_mean:0$'):
|
||||
self.assertIn(v.op.name[:-len('moving_mean')] + 'gamma', gamma_names)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+585
@@ -0,0 +1,585 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains the definition for inception v3 classification network."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import inception_utils
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
# pylint: disable=g-long-lambda
|
||||
trunc_normal = lambda stddev: tf.compat.v1.truncated_normal_initializer(
|
||||
0.0, stddev)
|
||||
|
||||
|
||||
def inception_v3_base(inputs,
|
||||
final_endpoint='Mixed_7c',
|
||||
min_depth=16,
|
||||
depth_multiplier=1.0,
|
||||
scope=None):
|
||||
"""Inception model from http://arxiv.org/abs/1512.00567.
|
||||
|
||||
Constructs an Inception v3 network from inputs to the given final endpoint.
|
||||
This method can construct the network up to the final inception block
|
||||
Mixed_7c.
|
||||
|
||||
Note that the names of the layers in the paper do not correspond to the names
|
||||
of the endpoints registered by this function although they build the same
|
||||
network.
|
||||
|
||||
Here is a mapping from the old_names to the new names:
|
||||
Old name | New name
|
||||
=======================================
|
||||
conv0 | Conv2d_1a_3x3
|
||||
conv1 | Conv2d_2a_3x3
|
||||
conv2 | Conv2d_2b_3x3
|
||||
pool1 | MaxPool_3a_3x3
|
||||
conv3 | Conv2d_3b_1x1
|
||||
conv4 | Conv2d_4a_3x3
|
||||
pool2 | MaxPool_5a_3x3
|
||||
mixed_35x35x256a | Mixed_5b
|
||||
mixed_35x35x288a | Mixed_5c
|
||||
mixed_35x35x288b | Mixed_5d
|
||||
mixed_17x17x768a | Mixed_6a
|
||||
mixed_17x17x768b | Mixed_6b
|
||||
mixed_17x17x768c | Mixed_6c
|
||||
mixed_17x17x768d | Mixed_6d
|
||||
mixed_17x17x768e | Mixed_6e
|
||||
mixed_8x8x1280a | Mixed_7a
|
||||
mixed_8x8x2048a | Mixed_7b
|
||||
mixed_8x8x2048b | Mixed_7c
|
||||
|
||||
Args:
|
||||
inputs: a tensor of size [batch_size, height, width, channels].
|
||||
final_endpoint: specifies the endpoint to construct the network up to. It
|
||||
can be one of ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
|
||||
'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3', 'MaxPool_5a_3x3',
|
||||
'Mixed_5b', 'Mixed_5c', 'Mixed_5d', 'Mixed_6a', 'Mixed_6b', 'Mixed_6c',
|
||||
'Mixed_6d', 'Mixed_6e', 'Mixed_7a', 'Mixed_7b', 'Mixed_7c'].
|
||||
min_depth: Minimum depth value (number of channels) for all convolution ops.
|
||||
Enforced when depth_multiplier < 1, and not an active constraint when
|
||||
depth_multiplier >= 1.
|
||||
depth_multiplier: Float multiplier for the depth (number of channels)
|
||||
for all convolution ops. The value must be greater than zero. Typical
|
||||
usage will be to set this value in (0, 1) to reduce the number of
|
||||
parameters or computation cost of the model.
|
||||
scope: Optional variable_scope.
|
||||
|
||||
Returns:
|
||||
tensor_out: output tensor corresponding to the final_endpoint.
|
||||
end_points: a set of activations for external use, for example summaries or
|
||||
losses.
|
||||
|
||||
Raises:
|
||||
ValueError: if final_endpoint is not set to one of the predefined values,
|
||||
or depth_multiplier <= 0
|
||||
"""
|
||||
# end_points will collect relevant activations for external use, for example
|
||||
# summaries or losses.
|
||||
end_points = {}
|
||||
|
||||
if depth_multiplier <= 0:
|
||||
raise ValueError('depth_multiplier is not greater than zero.')
|
||||
depth = lambda d: max(int(d * depth_multiplier), min_depth)
|
||||
|
||||
with tf.compat.v1.variable_scope(scope, 'InceptionV3', [inputs]):
|
||||
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
|
||||
stride=1, padding='VALID'):
|
||||
# 299 x 299 x 3
|
||||
end_point = 'Conv2d_1a_3x3'
|
||||
net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 149 x 149 x 32
|
||||
end_point = 'Conv2d_2a_3x3'
|
||||
net = slim.conv2d(net, depth(32), [3, 3], scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 147 x 147 x 32
|
||||
end_point = 'Conv2d_2b_3x3'
|
||||
net = slim.conv2d(net, depth(64), [3, 3], padding='SAME', scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 147 x 147 x 64
|
||||
end_point = 'MaxPool_3a_3x3'
|
||||
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 73 x 73 x 64
|
||||
end_point = 'Conv2d_3b_1x1'
|
||||
net = slim.conv2d(net, depth(80), [1, 1], scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 73 x 73 x 80.
|
||||
end_point = 'Conv2d_4a_3x3'
|
||||
net = slim.conv2d(net, depth(192), [3, 3], scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 71 x 71 x 192.
|
||||
end_point = 'MaxPool_5a_3x3'
|
||||
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# 35 x 35 x 192.
|
||||
|
||||
# Inception blocks
|
||||
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
|
||||
stride=1, padding='SAME'):
|
||||
# mixed: 35 x 35 x 256.
|
||||
end_point = 'Mixed_5b'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(64), [5, 5],
|
||||
scope='Conv2d_0b_5x5')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, depth(32), [1, 1],
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
|
||||
# mixed_1: 35 x 35 x 288.
|
||||
end_point = 'Mixed_5c'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0b_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(64), [5, 5],
|
||||
scope='Conv_1_0c_5x5')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, depth(64), [1, 1],
|
||||
scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, depth(64), [1, 1],
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
|
||||
# mixed_2: 35 x 35 x 288.
|
||||
end_point = 'Mixed_5d'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(64), [5, 5],
|
||||
scope='Conv2d_0b_5x5')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
|
||||
scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, depth(64), [1, 1],
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
|
||||
# mixed_3: 17 x 17 x 768.
|
||||
end_point = 'Mixed_6a'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(384), [3, 3], stride=2,
|
||||
padding='VALID', scope='Conv2d_1a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(96), [3, 3],
|
||||
scope='Conv2d_0b_3x3')
|
||||
branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], stride=2,
|
||||
padding='VALID', scope='Conv2d_1a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
|
||||
scope='MaxPool_1a_3x3')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
|
||||
# mixed4: 17 x 17 x 768.
|
||||
end_point = 'Mixed_6b'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(128), [1, 7],
|
||||
scope='Conv2d_0b_1x7')
|
||||
branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
|
||||
scope='Conv2d_0c_7x1')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(128), [7, 1],
|
||||
scope='Conv2d_0b_7x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(128), [1, 7],
|
||||
scope='Conv2d_0c_1x7')
|
||||
branch_2 = slim.conv2d(branch_2, depth(128), [7, 1],
|
||||
scope='Conv2d_0d_7x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
|
||||
scope='Conv2d_0e_1x7')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, depth(192), [1, 1],
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
|
||||
# mixed_5: 17 x 17 x 768.
|
||||
end_point = 'Mixed_6c'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(160), [1, 7],
|
||||
scope='Conv2d_0b_1x7')
|
||||
branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
|
||||
scope='Conv2d_0c_7x1')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(160), [7, 1],
|
||||
scope='Conv2d_0b_7x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(160), [1, 7],
|
||||
scope='Conv2d_0c_1x7')
|
||||
branch_2 = slim.conv2d(branch_2, depth(160), [7, 1],
|
||||
scope='Conv2d_0d_7x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
|
||||
scope='Conv2d_0e_1x7')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, depth(192), [1, 1],
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# mixed_6: 17 x 17 x 768.
|
||||
end_point = 'Mixed_6d'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(160), [1, 7],
|
||||
scope='Conv2d_0b_1x7')
|
||||
branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
|
||||
scope='Conv2d_0c_7x1')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(160), [7, 1],
|
||||
scope='Conv2d_0b_7x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(160), [1, 7],
|
||||
scope='Conv2d_0c_1x7')
|
||||
branch_2 = slim.conv2d(branch_2, depth(160), [7, 1],
|
||||
scope='Conv2d_0d_7x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
|
||||
scope='Conv2d_0e_1x7')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, depth(192), [1, 1],
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
|
||||
# mixed_7: 17 x 17 x 768.
|
||||
end_point = 'Mixed_6e'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(192), [1, 7],
|
||||
scope='Conv2d_0b_1x7')
|
||||
branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
|
||||
scope='Conv2d_0c_7x1')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(192), [7, 1],
|
||||
scope='Conv2d_0b_7x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
|
||||
scope='Conv2d_0c_1x7')
|
||||
branch_2 = slim.conv2d(branch_2, depth(192), [7, 1],
|
||||
scope='Conv2d_0d_7x1')
|
||||
branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
|
||||
scope='Conv2d_0e_1x7')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, depth(192), [1, 1],
|
||||
scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
|
||||
# mixed_8: 8 x 8 x 1280.
|
||||
end_point = 'Mixed_7a'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_0 = slim.conv2d(branch_0, depth(320), [3, 3], stride=2,
|
||||
padding='VALID', scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(192), [1, 7],
|
||||
scope='Conv2d_0b_1x7')
|
||||
branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
|
||||
scope='Conv2d_0c_7x1')
|
||||
branch_1 = slim.conv2d(branch_1, depth(192), [3, 3], stride=2,
|
||||
padding='VALID', scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
|
||||
scope='MaxPool_1a_3x3')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
# mixed_9: 8 x 8 x 2048.
|
||||
end_point = 'Mixed_7b'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = tf.concat(axis=3, values=[
|
||||
slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
|
||||
slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0b_3x1')])
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(
|
||||
branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
|
||||
branch_2 = tf.concat(axis=3, values=[
|
||||
slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
|
||||
slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')])
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(
|
||||
branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
|
||||
# mixed_10: 8 x 8 x 2048.
|
||||
end_point = 'Mixed_7c'
|
||||
with tf.compat.v1.variable_scope(end_point):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = tf.concat(axis=3, values=[
|
||||
slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
|
||||
slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0c_3x1')])
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(
|
||||
branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
|
||||
branch_2 = tf.concat(axis=3, values=[
|
||||
slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
|
||||
slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')])
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(
|
||||
branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint: return net, end_points
|
||||
raise ValueError('Unknown final endpoint %s' % final_endpoint)
|
||||
|
||||
|
||||
def inception_v3(inputs,
|
||||
num_classes=1000,
|
||||
is_training=True,
|
||||
dropout_keep_prob=0.8,
|
||||
min_depth=16,
|
||||
depth_multiplier=1.0,
|
||||
prediction_fn=slim.softmax,
|
||||
spatial_squeeze=True,
|
||||
reuse=None,
|
||||
create_aux_logits=True,
|
||||
scope='InceptionV3',
|
||||
global_pool=False):
|
||||
"""Inception model from http://arxiv.org/abs/1512.00567.
|
||||
|
||||
"Rethinking the Inception Architecture for Computer Vision"
|
||||
|
||||
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens,
|
||||
Zbigniew Wojna.
|
||||
|
||||
With the default arguments this method constructs the exact model defined in
|
||||
the paper. However, one can experiment with variations of the inception_v3
|
||||
network by changing arguments dropout_keep_prob, min_depth and
|
||||
depth_multiplier.
|
||||
|
||||
The default image size used to train this network is 299x299.
|
||||
|
||||
Args:
|
||||
inputs: a tensor of size [batch_size, height, width, channels].
|
||||
num_classes: number of predicted classes. If 0 or None, the logits layer
|
||||
is omitted and the input features to the logits layer (before dropout)
|
||||
are returned instead.
|
||||
is_training: whether is training or not.
|
||||
dropout_keep_prob: the percentage of activation values that are retained.
|
||||
min_depth: Minimum depth value (number of channels) for all convolution ops.
|
||||
Enforced when depth_multiplier < 1, and not an active constraint when
|
||||
depth_multiplier >= 1.
|
||||
depth_multiplier: Float multiplier for the depth (number of channels)
|
||||
for all convolution ops. The value must be greater than zero. Typical
|
||||
usage will be to set this value in (0, 1) to reduce the number of
|
||||
parameters or computation cost of the model.
|
||||
prediction_fn: a function to get predictions out of logits.
|
||||
spatial_squeeze: if True, logits is of shape [B, C], if false logits is of
|
||||
shape [B, 1, 1, C], where B is batch_size and C is number of classes.
|
||||
reuse: whether or not the network and its variables should be reused. To be
|
||||
able to reuse 'scope' must be given.
|
||||
create_aux_logits: Whether to create the auxiliary logits.
|
||||
scope: Optional variable_scope.
|
||||
global_pool: Optional boolean flag to control the avgpooling before the
|
||||
logits layer. If false or unset, pooling is done with a fixed window
|
||||
that reduces default-sized inputs to 1x1, while larger inputs lead to
|
||||
larger outputs. If true, any input size is pooled down to 1x1.
|
||||
|
||||
Returns:
|
||||
net: a Tensor with the logits (pre-softmax activations) if num_classes
|
||||
is a non-zero integer, or the non-dropped-out input to the logits layer
|
||||
if num_classes is 0 or None.
|
||||
end_points: a dictionary from components of the network to the corresponding
|
||||
activation.
|
||||
|
||||
Raises:
|
||||
ValueError: if 'depth_multiplier' is less than or equal to zero.
|
||||
"""
|
||||
if depth_multiplier <= 0:
|
||||
raise ValueError('depth_multiplier is not greater than zero.')
|
||||
depth = lambda d: max(int(d * depth_multiplier), min_depth)
|
||||
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'InceptionV3', [inputs], reuse=reuse) as scope:
|
||||
with slim.arg_scope([slim.batch_norm, slim.dropout],
|
||||
is_training=is_training):
|
||||
net, end_points = inception_v3_base(
|
||||
inputs, scope=scope, min_depth=min_depth,
|
||||
depth_multiplier=depth_multiplier)
|
||||
|
||||
# Auxiliary Head logits
|
||||
if create_aux_logits and num_classes:
|
||||
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
|
||||
stride=1, padding='SAME'):
|
||||
aux_logits = end_points['Mixed_6e']
|
||||
with tf.compat.v1.variable_scope('AuxLogits'):
|
||||
aux_logits = slim.avg_pool2d(
|
||||
aux_logits, [5, 5], stride=3, padding='VALID',
|
||||
scope='AvgPool_1a_5x5')
|
||||
aux_logits = slim.conv2d(aux_logits, depth(128), [1, 1],
|
||||
scope='Conv2d_1b_1x1')
|
||||
|
||||
# Shape of feature map before the final layer.
|
||||
kernel_size = _reduced_kernel_size_for_small_input(
|
||||
aux_logits, [5, 5])
|
||||
aux_logits = slim.conv2d(
|
||||
aux_logits, depth(768), kernel_size,
|
||||
weights_initializer=trunc_normal(0.01),
|
||||
padding='VALID', scope='Conv2d_2a_{}x{}'.format(*kernel_size))
|
||||
aux_logits = slim.conv2d(
|
||||
aux_logits, num_classes, [1, 1], activation_fn=None,
|
||||
normalizer_fn=None, weights_initializer=trunc_normal(0.001),
|
||||
scope='Conv2d_2b_1x1')
|
||||
if spatial_squeeze:
|
||||
aux_logits = tf.squeeze(aux_logits, [1, 2], name='SpatialSqueeze')
|
||||
end_points['AuxLogits'] = aux_logits
|
||||
|
||||
# Final pooling and prediction
|
||||
with tf.compat.v1.variable_scope('Logits'):
|
||||
if global_pool:
|
||||
# Global average pooling.
|
||||
net = tf.reduce_mean(
|
||||
input_tensor=net, axis=[1, 2], keepdims=True, name='GlobalPool')
|
||||
end_points['global_pool'] = net
|
||||
else:
|
||||
# Pooling with a fixed kernel size.
|
||||
kernel_size = _reduced_kernel_size_for_small_input(net, [8, 8])
|
||||
net = slim.avg_pool2d(net, kernel_size, padding='VALID',
|
||||
scope='AvgPool_1a_{}x{}'.format(*kernel_size))
|
||||
end_points['AvgPool_1a'] = net
|
||||
if not num_classes:
|
||||
return net, end_points
|
||||
# 1 x 1 x 2048
|
||||
net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
|
||||
end_points['PreLogits'] = net
|
||||
# 2048
|
||||
logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
|
||||
normalizer_fn=None, scope='Conv2d_1c_1x1')
|
||||
if spatial_squeeze:
|
||||
logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
|
||||
# 1000
|
||||
end_points['Logits'] = logits
|
||||
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
|
||||
return logits, end_points
|
||||
inception_v3.default_image_size = 299
|
||||
|
||||
|
||||
def _reduced_kernel_size_for_small_input(input_tensor, kernel_size):
|
||||
"""Define kernel size which is automatically reduced for small input.
|
||||
|
||||
If the shape of the input images is unknown at graph construction time this
|
||||
function assumes that the input images are is large enough.
|
||||
|
||||
Args:
|
||||
input_tensor: input tensor of size [batch_size, height, width, channels].
|
||||
kernel_size: desired kernel size of length 2: [kernel_height, kernel_width]
|
||||
|
||||
Returns:
|
||||
a tensor with the kernel size.
|
||||
|
||||
TODO(jrru): Make this function work with unknown shapes. Theoretically, this
|
||||
can be done with the code below. Problems are two-fold: (1) If the shape was
|
||||
known, it will be lost. (2) inception.slim.ops._two_element_tuple cannot
|
||||
handle tensors that define the kernel size.
|
||||
shape = tf.shape(input_tensor)
|
||||
return = tf.stack([tf.minimum(shape[1], kernel_size[0]),
|
||||
tf.minimum(shape[2], kernel_size[1])])
|
||||
|
||||
"""
|
||||
shape = input_tensor.get_shape().as_list()
|
||||
if shape[1] is None or shape[2] is None:
|
||||
kernel_size_out = kernel_size
|
||||
else:
|
||||
kernel_size_out = [min(shape[1], kernel_size[0]),
|
||||
min(shape[2], kernel_size[1])]
|
||||
return kernel_size_out
|
||||
|
||||
|
||||
inception_v3_arg_scope = inception_utils.inception_arg_scope
|
||||
+350
@@ -0,0 +1,350 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for nets.inception_v1."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import inception
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
class InceptionV3Test(tf.test.TestCase):
|
||||
|
||||
def testBuildClassificationNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = inception.inception_v3(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith(
|
||||
'InceptionV3/Logits/SpatialSqueeze'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertTrue('Predictions' in end_points)
|
||||
self.assertListEqual(end_points['Predictions'].get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildPreLogitsNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = None
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, end_points = inception.inception_v3(inputs, num_classes)
|
||||
self.assertTrue(net.op.name.startswith('InceptionV3/Logits/AvgPool'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 1, 1, 2048])
|
||||
self.assertFalse('Logits' in end_points)
|
||||
self.assertFalse('Predictions' in end_points)
|
||||
|
||||
def testBuildBaseNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
final_endpoint, end_points = inception.inception_v3_base(inputs)
|
||||
self.assertTrue(final_endpoint.op.name.startswith(
|
||||
'InceptionV3/Mixed_7c'))
|
||||
self.assertListEqual(final_endpoint.get_shape().as_list(),
|
||||
[batch_size, 8, 8, 2048])
|
||||
expected_endpoints = ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
|
||||
'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3',
|
||||
'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
|
||||
'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d',
|
||||
'Mixed_6e', 'Mixed_7a', 'Mixed_7b', 'Mixed_7c']
|
||||
self.assertItemsEqual(end_points.keys(), expected_endpoints)
|
||||
|
||||
def testBuildOnlyUptoFinalEndpoint(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
endpoints = ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
|
||||
'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3',
|
||||
'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
|
||||
'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d',
|
||||
'Mixed_6e', 'Mixed_7a', 'Mixed_7b', 'Mixed_7c']
|
||||
|
||||
for index, endpoint in enumerate(endpoints):
|
||||
with tf.Graph().as_default():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
out_tensor, end_points = inception.inception_v3_base(
|
||||
inputs, final_endpoint=endpoint)
|
||||
self.assertTrue(out_tensor.op.name.startswith(
|
||||
'InceptionV3/' + endpoint))
|
||||
self.assertItemsEqual(endpoints[:index+1], end_points.keys())
|
||||
|
||||
def testBuildAndCheckAllEndPointsUptoMixed7c(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_v3_base(
|
||||
inputs, final_endpoint='Mixed_7c')
|
||||
endpoints_shapes = {'Conv2d_1a_3x3': [batch_size, 149, 149, 32],
|
||||
'Conv2d_2a_3x3': [batch_size, 147, 147, 32],
|
||||
'Conv2d_2b_3x3': [batch_size, 147, 147, 64],
|
||||
'MaxPool_3a_3x3': [batch_size, 73, 73, 64],
|
||||
'Conv2d_3b_1x1': [batch_size, 73, 73, 80],
|
||||
'Conv2d_4a_3x3': [batch_size, 71, 71, 192],
|
||||
'MaxPool_5a_3x3': [batch_size, 35, 35, 192],
|
||||
'Mixed_5b': [batch_size, 35, 35, 256],
|
||||
'Mixed_5c': [batch_size, 35, 35, 288],
|
||||
'Mixed_5d': [batch_size, 35, 35, 288],
|
||||
'Mixed_6a': [batch_size, 17, 17, 768],
|
||||
'Mixed_6b': [batch_size, 17, 17, 768],
|
||||
'Mixed_6c': [batch_size, 17, 17, 768],
|
||||
'Mixed_6d': [batch_size, 17, 17, 768],
|
||||
'Mixed_6e': [batch_size, 17, 17, 768],
|
||||
'Mixed_7a': [batch_size, 8, 8, 1280],
|
||||
'Mixed_7b': [batch_size, 8, 8, 2048],
|
||||
'Mixed_7c': [batch_size, 8, 8, 2048]}
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testModelHasExpectedNumberOfParameters(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with slim.arg_scope(inception.inception_v3_arg_scope()):
|
||||
inception.inception_v3_base(inputs)
|
||||
total_params, _ = slim.model_analyzer.analyze_vars(
|
||||
slim.get_model_variables())
|
||||
self.assertAlmostEqual(21802784, total_params)
|
||||
|
||||
def testBuildEndPoints(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_v3(inputs, num_classes)
|
||||
self.assertTrue('Logits' in end_points)
|
||||
logits = end_points['Logits']
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertTrue('AuxLogits' in end_points)
|
||||
aux_logits = end_points['AuxLogits']
|
||||
self.assertListEqual(aux_logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertTrue('Mixed_7c' in end_points)
|
||||
pre_pool = end_points['Mixed_7c']
|
||||
self.assertListEqual(pre_pool.get_shape().as_list(),
|
||||
[batch_size, 8, 8, 2048])
|
||||
self.assertTrue('PreLogits' in end_points)
|
||||
pre_logits = end_points['PreLogits']
|
||||
self.assertListEqual(pre_logits.get_shape().as_list(),
|
||||
[batch_size, 1, 1, 2048])
|
||||
|
||||
def testBuildEndPointsWithDepthMultiplierLessThanOne(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_v3(inputs, num_classes)
|
||||
|
||||
endpoint_keys = [key for key in end_points.keys()
|
||||
if key.startswith('Mixed') or key.startswith('Conv')]
|
||||
|
||||
_, end_points_with_multiplier = inception.inception_v3(
|
||||
inputs, num_classes, scope='depth_multiplied_net',
|
||||
depth_multiplier=0.5)
|
||||
|
||||
for key in endpoint_keys:
|
||||
original_depth = end_points[key].get_shape().as_list()[3]
|
||||
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
|
||||
self.assertEqual(0.5 * original_depth, new_depth)
|
||||
|
||||
def testBuildEndPointsWithDepthMultiplierGreaterThanOne(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_v3(inputs, num_classes)
|
||||
|
||||
endpoint_keys = [key for key in end_points.keys()
|
||||
if key.startswith('Mixed') or key.startswith('Conv')]
|
||||
|
||||
_, end_points_with_multiplier = inception.inception_v3(
|
||||
inputs, num_classes, scope='depth_multiplied_net',
|
||||
depth_multiplier=2.0)
|
||||
|
||||
for key in endpoint_keys:
|
||||
original_depth = end_points[key].get_shape().as_list()[3]
|
||||
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
|
||||
self.assertEqual(2.0 * original_depth, new_depth)
|
||||
|
||||
def testRaiseValueErrorWithInvalidDepthMultiplier(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with self.assertRaises(ValueError):
|
||||
_ = inception.inception_v3(inputs, num_classes, depth_multiplier=-0.1)
|
||||
with self.assertRaises(ValueError):
|
||||
_ = inception.inception_v3(inputs, num_classes, depth_multiplier=0.0)
|
||||
|
||||
def testHalfSizeImages(self):
|
||||
batch_size = 5
|
||||
height, width = 150, 150
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = inception.inception_v3(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV3/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Mixed_7c']
|
||||
self.assertListEqual(pre_pool.get_shape().as_list(),
|
||||
[batch_size, 3, 3, 2048])
|
||||
|
||||
def testUnknownImageShape(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
batch_size = 2
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(
|
||||
tf.float32, shape=(batch_size, None, None, 3))
|
||||
logits, end_points = inception.inception_v3(inputs, num_classes)
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Mixed_7c']
|
||||
feed_dict = {inputs: input_np}
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
|
||||
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 8, 8, 2048])
|
||||
|
||||
def testGlobalPoolUnknownImageShape(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
batch_size = 1
|
||||
height, width = 330, 400
|
||||
num_classes = 1000
|
||||
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(
|
||||
tf.float32, shape=(batch_size, None, None, 3))
|
||||
logits, end_points = inception.inception_v3(inputs, num_classes,
|
||||
global_pool=True)
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Mixed_7c']
|
||||
feed_dict = {inputs: input_np}
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
|
||||
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 8, 11, 2048])
|
||||
|
||||
def testUnknowBatchSize(self):
|
||||
batch_size = 1
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (None, height, width, 3))
|
||||
logits, _ = inception.inception_v3(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV3/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[None, num_classes])
|
||||
images = tf.random.uniform((batch_size, height, width, 3))
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(logits, {inputs: images.eval()})
|
||||
self.assertEquals(output.shape, (batch_size, num_classes))
|
||||
|
||||
def testEvaluation(self):
|
||||
batch_size = 2
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
|
||||
eval_inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = inception.inception_v3(eval_inputs, num_classes,
|
||||
is_training=False)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (batch_size,))
|
||||
|
||||
def testTrainEvalWithReuse(self):
|
||||
train_batch_size = 5
|
||||
eval_batch_size = 2
|
||||
height, width = 150, 150
|
||||
num_classes = 1000
|
||||
|
||||
train_inputs = tf.random.uniform((train_batch_size, height, width, 3))
|
||||
inception.inception_v3(train_inputs, num_classes)
|
||||
eval_inputs = tf.random.uniform((eval_batch_size, height, width, 3))
|
||||
logits, _ = inception.inception_v3(eval_inputs, num_classes,
|
||||
is_training=False, reuse=True)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (eval_batch_size,))
|
||||
|
||||
def testLogitsNotSqueezed(self):
|
||||
num_classes = 25
|
||||
images = tf.random.uniform([1, 299, 299, 3])
|
||||
logits, _ = inception.inception_v3(images,
|
||||
num_classes=num_classes,
|
||||
spatial_squeeze=False)
|
||||
|
||||
with self.test_session() as sess:
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
logits_out = sess.run(logits)
|
||||
self.assertListEqual(list(logits_out.shape), [1, 1, 1, num_classes])
|
||||
|
||||
def testNoBatchNormScaleByDefault(self):
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (1, height, width, 3))
|
||||
with slim.arg_scope(inception.inception_v3_arg_scope()):
|
||||
inception.inception_v3(inputs, num_classes, is_training=False)
|
||||
|
||||
self.assertEqual(tf.compat.v1.global_variables('.*/BatchNorm/gamma:0$'), [])
|
||||
|
||||
def testBatchNormScale(self):
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (1, height, width, 3))
|
||||
with slim.arg_scope(
|
||||
inception.inception_v3_arg_scope(batch_norm_scale=True)):
|
||||
inception.inception_v3(inputs, num_classes, is_training=False)
|
||||
|
||||
gamma_names = set(
|
||||
v.op.name
|
||||
for v in tf.compat.v1.global_variables('.*/BatchNorm/gamma:0$'))
|
||||
self.assertGreater(len(gamma_names), 0)
|
||||
for v in tf.compat.v1.global_variables('.*/BatchNorm/moving_mean:0$'):
|
||||
self.assertIn(v.op.name[:-len('moving_mean')] + 'gamma', gamma_names)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+347
@@ -0,0 +1,347 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains the definition of the Inception V4 architecture.
|
||||
|
||||
As described in http://arxiv.org/abs/1602.07261.
|
||||
|
||||
Inception-v4, Inception-ResNet and the Impact of Residual Connections
|
||||
on Learning
|
||||
Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import inception_utils
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
def block_inception_a(inputs, scope=None, reuse=None):
|
||||
"""Builds Inception-A block for Inception v4 network."""
|
||||
# By default use stride=1 and SAME padding
|
||||
with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
|
||||
stride=1, padding='SAME'):
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'BlockInceptionA', [inputs], reuse=reuse):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(inputs, 96, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(inputs, 64, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 96, [3, 3], scope='Conv2d_0b_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(inputs, 64, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
|
||||
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 96, [1, 1], scope='Conv2d_0b_1x1')
|
||||
return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
|
||||
|
||||
def block_reduction_a(inputs, scope=None, reuse=None):
|
||||
"""Builds Reduction-A block for Inception v4 network."""
|
||||
# By default use stride=1 and SAME padding
|
||||
with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
|
||||
stride=1, padding='SAME'):
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'BlockReductionA', [inputs], reuse=reuse):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(inputs, 384, [3, 3], stride=2, padding='VALID',
|
||||
scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 224, [3, 3], scope='Conv2d_0b_3x3')
|
||||
branch_1 = slim.conv2d(branch_1, 256, [3, 3], stride=2,
|
||||
padding='VALID', scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.max_pool2d(inputs, [3, 3], stride=2, padding='VALID',
|
||||
scope='MaxPool_1a_3x3')
|
||||
return tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
|
||||
|
||||
|
||||
def block_inception_b(inputs, scope=None, reuse=None):
|
||||
"""Builds Inception-B block for Inception v4 network."""
|
||||
# By default use stride=1 and SAME padding
|
||||
with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
|
||||
stride=1, padding='SAME'):
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'BlockInceptionB', [inputs], reuse=reuse):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 224, [1, 7], scope='Conv2d_0b_1x7')
|
||||
branch_1 = slim.conv2d(branch_1, 256, [7, 1], scope='Conv2d_0c_7x1')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 192, [7, 1], scope='Conv2d_0b_7x1')
|
||||
branch_2 = slim.conv2d(branch_2, 224, [1, 7], scope='Conv2d_0c_1x7')
|
||||
branch_2 = slim.conv2d(branch_2, 224, [7, 1], scope='Conv2d_0d_7x1')
|
||||
branch_2 = slim.conv2d(branch_2, 256, [1, 7], scope='Conv2d_0e_1x7')
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1')
|
||||
return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
|
||||
|
||||
def block_reduction_b(inputs, scope=None, reuse=None):
|
||||
"""Builds Reduction-B block for Inception v4 network."""
|
||||
# By default use stride=1 and SAME padding
|
||||
with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
|
||||
stride=1, padding='SAME'):
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'BlockReductionB', [inputs], reuse=reuse):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_0 = slim.conv2d(branch_0, 192, [3, 3], stride=2,
|
||||
padding='VALID', scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(inputs, 256, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 256, [1, 7], scope='Conv2d_0b_1x7')
|
||||
branch_1 = slim.conv2d(branch_1, 320, [7, 1], scope='Conv2d_0c_7x1')
|
||||
branch_1 = slim.conv2d(branch_1, 320, [3, 3], stride=2,
|
||||
padding='VALID', scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.max_pool2d(inputs, [3, 3], stride=2, padding='VALID',
|
||||
scope='MaxPool_1a_3x3')
|
||||
return tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
|
||||
|
||||
|
||||
def block_inception_c(inputs, scope=None, reuse=None):
|
||||
"""Builds Inception-C block for Inception v4 network."""
|
||||
# By default use stride=1 and SAME padding
|
||||
with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
|
||||
stride=1, padding='SAME'):
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'BlockInceptionC', [inputs], reuse=reuse):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(inputs, 256, [1, 1], scope='Conv2d_0a_1x1')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = tf.concat(axis=3, values=[
|
||||
slim.conv2d(branch_1, 256, [1, 3], scope='Conv2d_0b_1x3'),
|
||||
slim.conv2d(branch_1, 256, [3, 1], scope='Conv2d_0c_3x1')])
|
||||
with tf.compat.v1.variable_scope('Branch_2'):
|
||||
branch_2 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_2 = slim.conv2d(branch_2, 448, [3, 1], scope='Conv2d_0b_3x1')
|
||||
branch_2 = slim.conv2d(branch_2, 512, [1, 3], scope='Conv2d_0c_1x3')
|
||||
branch_2 = tf.concat(axis=3, values=[
|
||||
slim.conv2d(branch_2, 256, [1, 3], scope='Conv2d_0d_1x3'),
|
||||
slim.conv2d(branch_2, 256, [3, 1], scope='Conv2d_0e_3x1')])
|
||||
with tf.compat.v1.variable_scope('Branch_3'):
|
||||
branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
|
||||
branch_3 = slim.conv2d(branch_3, 256, [1, 1], scope='Conv2d_0b_1x1')
|
||||
return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
|
||||
|
||||
|
||||
def inception_v4_base(inputs, final_endpoint='Mixed_7d', scope=None):
|
||||
"""Creates the Inception V4 network up to the given final endpoint.
|
||||
|
||||
Args:
|
||||
inputs: a 4-D tensor of size [batch_size, height, width, 3].
|
||||
final_endpoint: specifies the endpoint to construct the network up to.
|
||||
It can be one of [ 'Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
|
||||
'Mixed_3a', 'Mixed_4a', 'Mixed_5a', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
|
||||
'Mixed_5e', 'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d', 'Mixed_6e',
|
||||
'Mixed_6f', 'Mixed_6g', 'Mixed_6h', 'Mixed_7a', 'Mixed_7b', 'Mixed_7c',
|
||||
'Mixed_7d']
|
||||
scope: Optional variable_scope.
|
||||
|
||||
Returns:
|
||||
logits: the logits outputs of the model.
|
||||
end_points: the set of end_points from the inception model.
|
||||
|
||||
Raises:
|
||||
ValueError: if final_endpoint is not set to one of the predefined values,
|
||||
"""
|
||||
end_points = {}
|
||||
|
||||
def add_and_check_final(name, net):
|
||||
end_points[name] = net
|
||||
return name == final_endpoint
|
||||
|
||||
with tf.compat.v1.variable_scope(scope, 'InceptionV4', [inputs]):
|
||||
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
|
||||
stride=1, padding='SAME'):
|
||||
# 299 x 299 x 3
|
||||
net = slim.conv2d(inputs, 32, [3, 3], stride=2,
|
||||
padding='VALID', scope='Conv2d_1a_3x3')
|
||||
if add_and_check_final('Conv2d_1a_3x3', net): return net, end_points
|
||||
# 149 x 149 x 32
|
||||
net = slim.conv2d(net, 32, [3, 3], padding='VALID',
|
||||
scope='Conv2d_2a_3x3')
|
||||
if add_and_check_final('Conv2d_2a_3x3', net): return net, end_points
|
||||
# 147 x 147 x 32
|
||||
net = slim.conv2d(net, 64, [3, 3], scope='Conv2d_2b_3x3')
|
||||
if add_and_check_final('Conv2d_2b_3x3', net): return net, end_points
|
||||
# 147 x 147 x 64
|
||||
with tf.compat.v1.variable_scope('Mixed_3a'):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
|
||||
scope='MaxPool_0a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, 96, [3, 3], stride=2, padding='VALID',
|
||||
scope='Conv2d_0a_3x3')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1])
|
||||
if add_and_check_final('Mixed_3a', net): return net, end_points
|
||||
|
||||
# 73 x 73 x 160
|
||||
with tf.compat.v1.variable_scope('Mixed_4a'):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_0 = slim.conv2d(branch_0, 96, [3, 3], padding='VALID',
|
||||
scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
|
||||
branch_1 = slim.conv2d(branch_1, 64, [1, 7], scope='Conv2d_0b_1x7')
|
||||
branch_1 = slim.conv2d(branch_1, 64, [7, 1], scope='Conv2d_0c_7x1')
|
||||
branch_1 = slim.conv2d(branch_1, 96, [3, 3], padding='VALID',
|
||||
scope='Conv2d_1a_3x3')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1])
|
||||
if add_and_check_final('Mixed_4a', net): return net, end_points
|
||||
|
||||
# 71 x 71 x 192
|
||||
with tf.compat.v1.variable_scope('Mixed_5a'):
|
||||
with tf.compat.v1.variable_scope('Branch_0'):
|
||||
branch_0 = slim.conv2d(net, 192, [3, 3], stride=2, padding='VALID',
|
||||
scope='Conv2d_1a_3x3')
|
||||
with tf.compat.v1.variable_scope('Branch_1'):
|
||||
branch_1 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
|
||||
scope='MaxPool_1a_3x3')
|
||||
net = tf.concat(axis=3, values=[branch_0, branch_1])
|
||||
if add_and_check_final('Mixed_5a', net): return net, end_points
|
||||
|
||||
# 35 x 35 x 384
|
||||
# 4 x Inception-A blocks
|
||||
for idx in range(4):
|
||||
block_scope = 'Mixed_5' + chr(ord('b') + idx)
|
||||
net = block_inception_a(net, block_scope)
|
||||
if add_and_check_final(block_scope, net): return net, end_points
|
||||
|
||||
# 35 x 35 x 384
|
||||
# Reduction-A block
|
||||
net = block_reduction_a(net, 'Mixed_6a')
|
||||
if add_and_check_final('Mixed_6a', net): return net, end_points
|
||||
|
||||
# 17 x 17 x 1024
|
||||
# 7 x Inception-B blocks
|
||||
for idx in range(7):
|
||||
block_scope = 'Mixed_6' + chr(ord('b') + idx)
|
||||
net = block_inception_b(net, block_scope)
|
||||
if add_and_check_final(block_scope, net): return net, end_points
|
||||
|
||||
# 17 x 17 x 1024
|
||||
# Reduction-B block
|
||||
net = block_reduction_b(net, 'Mixed_7a')
|
||||
if add_and_check_final('Mixed_7a', net): return net, end_points
|
||||
|
||||
# 8 x 8 x 1536
|
||||
# 3 x Inception-C blocks
|
||||
for idx in range(3):
|
||||
block_scope = 'Mixed_7' + chr(ord('b') + idx)
|
||||
net = block_inception_c(net, block_scope)
|
||||
if add_and_check_final(block_scope, net): return net, end_points
|
||||
raise ValueError('Unknown final endpoint %s' % final_endpoint)
|
||||
|
||||
|
||||
def inception_v4(inputs, num_classes=1001, is_training=True,
|
||||
dropout_keep_prob=0.8,
|
||||
reuse=None,
|
||||
scope='InceptionV4',
|
||||
create_aux_logits=True):
|
||||
"""Creates the Inception V4 model.
|
||||
|
||||
Args:
|
||||
inputs: a 4-D tensor of size [batch_size, height, width, 3].
|
||||
num_classes: number of predicted classes. If 0 or None, the logits layer
|
||||
is omitted and the input features to the logits layer (before dropout)
|
||||
are returned instead.
|
||||
is_training: whether is training or not.
|
||||
dropout_keep_prob: float, the fraction to keep before final layer.
|
||||
reuse: whether or not the network and its variables should be reused. To be
|
||||
able to reuse 'scope' must be given.
|
||||
scope: Optional variable_scope.
|
||||
create_aux_logits: Whether to include the auxiliary logits.
|
||||
|
||||
Returns:
|
||||
net: a Tensor with the logits (pre-softmax activations) if num_classes
|
||||
is a non-zero integer, or the non-dropped input to the logits layer
|
||||
if num_classes is 0 or None.
|
||||
end_points: the set of end_points from the inception model.
|
||||
"""
|
||||
end_points = {}
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'InceptionV4', [inputs], reuse=reuse) as scope:
|
||||
with slim.arg_scope([slim.batch_norm, slim.dropout],
|
||||
is_training=is_training):
|
||||
net, end_points = inception_v4_base(inputs, scope=scope)
|
||||
|
||||
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
|
||||
stride=1, padding='SAME'):
|
||||
# Auxiliary Head logits
|
||||
if create_aux_logits and num_classes:
|
||||
with tf.compat.v1.variable_scope('AuxLogits'):
|
||||
# 17 x 17 x 1024
|
||||
aux_logits = end_points['Mixed_6h']
|
||||
aux_logits = slim.avg_pool2d(aux_logits, [5, 5], stride=3,
|
||||
padding='VALID',
|
||||
scope='AvgPool_1a_5x5')
|
||||
aux_logits = slim.conv2d(aux_logits, 128, [1, 1],
|
||||
scope='Conv2d_1b_1x1')
|
||||
aux_logits = slim.conv2d(aux_logits, 768,
|
||||
aux_logits.get_shape()[1:3],
|
||||
padding='VALID', scope='Conv2d_2a')
|
||||
aux_logits = slim.flatten(aux_logits)
|
||||
aux_logits = slim.fully_connected(aux_logits, num_classes,
|
||||
activation_fn=None,
|
||||
scope='Aux_logits')
|
||||
end_points['AuxLogits'] = aux_logits
|
||||
|
||||
# Final pooling and prediction
|
||||
# TODO(sguada,arnoegw): Consider adding a parameter global_pool which
|
||||
# can be set to False to disable pooling here (as in resnet_*()).
|
||||
with tf.compat.v1.variable_scope('Logits'):
|
||||
# 8 x 8 x 1536
|
||||
kernel_size = net.get_shape()[1:3]
|
||||
if kernel_size.is_fully_defined():
|
||||
net = slim.avg_pool2d(net, kernel_size, padding='VALID',
|
||||
scope='AvgPool_1a')
|
||||
else:
|
||||
net = tf.reduce_mean(
|
||||
input_tensor=net,
|
||||
axis=[1, 2],
|
||||
keepdims=True,
|
||||
name='global_pool')
|
||||
end_points['global_pool'] = net
|
||||
if not num_classes:
|
||||
return net, end_points
|
||||
# 1 x 1 x 1536
|
||||
net = slim.dropout(net, dropout_keep_prob, scope='Dropout_1b')
|
||||
net = slim.flatten(net, scope='PreLogitsFlatten')
|
||||
end_points['PreLogitsFlatten'] = net
|
||||
# 1536
|
||||
logits = slim.fully_connected(net, num_classes, activation_fn=None,
|
||||
scope='Logits')
|
||||
end_points['Logits'] = logits
|
||||
end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions')
|
||||
return logits, end_points
|
||||
inception_v4.default_image_size = 299
|
||||
|
||||
|
||||
inception_v4_arg_scope = inception_utils.inception_arg_scope
|
||||
+287
@@ -0,0 +1,287 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for slim.inception_v4."""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import inception
|
||||
|
||||
|
||||
class InceptionTest(tf.test.TestCase):
|
||||
|
||||
def testBuildLogits(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = inception.inception_v4(inputs, num_classes)
|
||||
auxlogits = end_points['AuxLogits']
|
||||
predictions = end_points['Predictions']
|
||||
self.assertTrue(auxlogits.op.name.startswith('InceptionV4/AuxLogits'))
|
||||
self.assertListEqual(auxlogits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV4/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertTrue(predictions.op.name.startswith(
|
||||
'InceptionV4/Logits/Predictions'))
|
||||
self.assertListEqual(predictions.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildPreLogitsNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = None
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, end_points = inception.inception_v4(inputs, num_classes)
|
||||
self.assertTrue(net.op.name.startswith('InceptionV4/Logits/AvgPool'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 1, 1, 1536])
|
||||
self.assertFalse('Logits' in end_points)
|
||||
self.assertFalse('Predictions' in end_points)
|
||||
|
||||
def testBuildWithoutAuxLogits(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, endpoints = inception.inception_v4(inputs, num_classes,
|
||||
create_aux_logits=False)
|
||||
self.assertFalse('AuxLogits' in endpoints)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV4/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testAllEndPointsShapes(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = inception.inception_v4(inputs, num_classes)
|
||||
endpoints_shapes = {'Conv2d_1a_3x3': [batch_size, 149, 149, 32],
|
||||
'Conv2d_2a_3x3': [batch_size, 147, 147, 32],
|
||||
'Conv2d_2b_3x3': [batch_size, 147, 147, 64],
|
||||
'Mixed_3a': [batch_size, 73, 73, 160],
|
||||
'Mixed_4a': [batch_size, 71, 71, 192],
|
||||
'Mixed_5a': [batch_size, 35, 35, 384],
|
||||
# 4 x Inception-A blocks
|
||||
'Mixed_5b': [batch_size, 35, 35, 384],
|
||||
'Mixed_5c': [batch_size, 35, 35, 384],
|
||||
'Mixed_5d': [batch_size, 35, 35, 384],
|
||||
'Mixed_5e': [batch_size, 35, 35, 384],
|
||||
# Reduction-A block
|
||||
'Mixed_6a': [batch_size, 17, 17, 1024],
|
||||
# 7 x Inception-B blocks
|
||||
'Mixed_6b': [batch_size, 17, 17, 1024],
|
||||
'Mixed_6c': [batch_size, 17, 17, 1024],
|
||||
'Mixed_6d': [batch_size, 17, 17, 1024],
|
||||
'Mixed_6e': [batch_size, 17, 17, 1024],
|
||||
'Mixed_6f': [batch_size, 17, 17, 1024],
|
||||
'Mixed_6g': [batch_size, 17, 17, 1024],
|
||||
'Mixed_6h': [batch_size, 17, 17, 1024],
|
||||
# Reduction-A block
|
||||
'Mixed_7a': [batch_size, 8, 8, 1536],
|
||||
# 3 x Inception-C blocks
|
||||
'Mixed_7b': [batch_size, 8, 8, 1536],
|
||||
'Mixed_7c': [batch_size, 8, 8, 1536],
|
||||
'Mixed_7d': [batch_size, 8, 8, 1536],
|
||||
# Logits and predictions
|
||||
'AuxLogits': [batch_size, num_classes],
|
||||
'global_pool': [batch_size, 1, 1, 1536],
|
||||
'PreLogitsFlatten': [batch_size, 1536],
|
||||
'Logits': [batch_size, num_classes],
|
||||
'Predictions': [batch_size, num_classes]}
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testBuildBaseNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, end_points = inception.inception_v4_base(inputs)
|
||||
self.assertTrue(net.op.name.startswith(
|
||||
'InceptionV4/Mixed_7d'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 8, 8, 1536])
|
||||
expected_endpoints = [
|
||||
'Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3', 'Mixed_3a',
|
||||
'Mixed_4a', 'Mixed_5a', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
|
||||
'Mixed_5e', 'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d',
|
||||
'Mixed_6e', 'Mixed_6f', 'Mixed_6g', 'Mixed_6h', 'Mixed_7a',
|
||||
'Mixed_7b', 'Mixed_7c', 'Mixed_7d']
|
||||
self.assertItemsEqual(end_points.keys(), expected_endpoints)
|
||||
for name, op in end_points.items():
|
||||
self.assertTrue(op.name.startswith('InceptionV4/' + name))
|
||||
|
||||
def testBuildOnlyUpToFinalEndpoint(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
all_endpoints = [
|
||||
'Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3', 'Mixed_3a',
|
||||
'Mixed_4a', 'Mixed_5a', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
|
||||
'Mixed_5e', 'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d',
|
||||
'Mixed_6e', 'Mixed_6f', 'Mixed_6g', 'Mixed_6h', 'Mixed_7a',
|
||||
'Mixed_7b', 'Mixed_7c', 'Mixed_7d']
|
||||
for index, endpoint in enumerate(all_endpoints):
|
||||
with tf.Graph().as_default():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
out_tensor, end_points = inception.inception_v4_base(
|
||||
inputs, final_endpoint=endpoint)
|
||||
self.assertTrue(out_tensor.op.name.startswith(
|
||||
'InceptionV4/' + endpoint))
|
||||
self.assertItemsEqual(all_endpoints[:index+1], end_points.keys())
|
||||
|
||||
def testVariablesSetDevice(self):
|
||||
batch_size = 5
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
# Force all Variables to reside on the device.
|
||||
with tf.compat.v1.variable_scope('on_cpu'), tf.device('/cpu:0'):
|
||||
inception.inception_v4(inputs, num_classes)
|
||||
with tf.compat.v1.variable_scope('on_gpu'), tf.device('/gpu:0'):
|
||||
inception.inception_v4(inputs, num_classes)
|
||||
for v in tf.compat.v1.get_collection(
|
||||
tf.compat.v1.GraphKeys.GLOBAL_VARIABLES, scope='on_cpu'):
|
||||
self.assertDeviceEqual(v.device, '/cpu:0')
|
||||
for v in tf.compat.v1.get_collection(
|
||||
tf.compat.v1.GraphKeys.GLOBAL_VARIABLES, scope='on_gpu'):
|
||||
self.assertDeviceEqual(v.device, '/gpu:0')
|
||||
|
||||
def testHalfSizeImages(self):
|
||||
batch_size = 5
|
||||
height, width = 150, 150
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = inception.inception_v4(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV4/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Mixed_7d']
|
||||
self.assertListEqual(pre_pool.get_shape().as_list(),
|
||||
[batch_size, 3, 3, 1536])
|
||||
|
||||
def testGlobalPool(self):
|
||||
batch_size = 1
|
||||
height, width = 350, 400
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = inception.inception_v4(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV4/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Mixed_7d']
|
||||
self.assertListEqual(pre_pool.get_shape().as_list(),
|
||||
[batch_size, 9, 11, 1536])
|
||||
|
||||
def testGlobalPoolUnknownImageShape(self):
|
||||
batch_size = 1
|
||||
height, width = 350, 400
|
||||
num_classes = 1000
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (batch_size, None, None, 3))
|
||||
logits, end_points = inception.inception_v4(
|
||||
inputs, num_classes, create_aux_logits=False)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV4/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Mixed_7d']
|
||||
images = tf.random.uniform((batch_size, height, width, 3))
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
logits_out, pre_pool_out = sess.run([logits, pre_pool],
|
||||
{inputs: images.eval()})
|
||||
self.assertTupleEqual(logits_out.shape, (batch_size, num_classes))
|
||||
self.assertTupleEqual(pre_pool_out.shape, (batch_size, 9, 11, 1536))
|
||||
|
||||
def testUnknownBatchSize(self):
|
||||
batch_size = 1
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (None, height, width, 3))
|
||||
logits, _ = inception.inception_v4(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('InceptionV4/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[None, num_classes])
|
||||
images = tf.random.uniform((batch_size, height, width, 3))
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(logits, {inputs: images.eval()})
|
||||
self.assertEquals(output.shape, (batch_size, num_classes))
|
||||
|
||||
def testEvaluation(self):
|
||||
batch_size = 2
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
with self.test_session() as sess:
|
||||
eval_inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = inception.inception_v4(eval_inputs,
|
||||
num_classes,
|
||||
is_training=False)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (batch_size,))
|
||||
|
||||
def testTrainEvalWithReuse(self):
|
||||
train_batch_size = 5
|
||||
eval_batch_size = 2
|
||||
height, width = 150, 150
|
||||
num_classes = 1000
|
||||
with self.test_session() as sess:
|
||||
train_inputs = tf.random.uniform((train_batch_size, height, width, 3))
|
||||
inception.inception_v4(train_inputs, num_classes)
|
||||
eval_inputs = tf.random.uniform((eval_batch_size, height, width, 3))
|
||||
logits, _ = inception.inception_v4(eval_inputs,
|
||||
num_classes,
|
||||
is_training=False,
|
||||
reuse=True)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (eval_batch_size,))
|
||||
|
||||
def testNoBatchNormScaleByDefault(self):
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (1, height, width, 3))
|
||||
with contrib_slim.arg_scope(inception.inception_v4_arg_scope()):
|
||||
inception.inception_v4(inputs, num_classes, is_training=False)
|
||||
|
||||
self.assertEqual(tf.compat.v1.global_variables('.*/BatchNorm/gamma:0$'), [])
|
||||
|
||||
def testBatchNormScale(self):
|
||||
height, width = 299, 299
|
||||
num_classes = 1000
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (1, height, width, 3))
|
||||
with contrib_slim.arg_scope(
|
||||
inception.inception_v4_arg_scope(batch_norm_scale=True)):
|
||||
inception.inception_v4(inputs, num_classes, is_training=False)
|
||||
|
||||
gamma_names = set(
|
||||
v.op.name
|
||||
for v in tf.compat.v1.global_variables('.*/BatchNorm/gamma:0$'))
|
||||
self.assertGreater(len(gamma_names), 0)
|
||||
for v in tf.compat.v1.global_variables('.*/BatchNorm/moving_mean:0$'):
|
||||
self.assertIn(v.op.name[:-len('moving_mean')] + 'gamma', gamma_names)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+98
@@ -0,0 +1,98 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains a variant of the LeNet model definition."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
def lenet(images, num_classes=10, is_training=False,
|
||||
dropout_keep_prob=0.5,
|
||||
prediction_fn=slim.softmax,
|
||||
scope='LeNet'):
|
||||
"""Creates a variant of the LeNet model.
|
||||
|
||||
Note that since the output is a set of 'logits', the values fall in the
|
||||
interval of (-infinity, infinity). Consequently, to convert the outputs to a
|
||||
probability distribution over the characters, one will need to convert them
|
||||
using the softmax function:
|
||||
|
||||
logits = lenet.lenet(images, is_training=False)
|
||||
probabilities = tf.nn.softmax(logits)
|
||||
predictions = tf.argmax(logits, 1)
|
||||
|
||||
Args:
|
||||
images: A batch of `Tensors` of size [batch_size, height, width, channels].
|
||||
num_classes: the number of classes in the dataset. If 0 or None, the logits
|
||||
layer is omitted and the input features to the logits layer are returned
|
||||
instead.
|
||||
is_training: specifies whether or not we're currently training the model.
|
||||
This variable will determine the behaviour of the dropout layer.
|
||||
dropout_keep_prob: the percentage of activation values that are retained.
|
||||
prediction_fn: a function to get predictions out of logits.
|
||||
scope: Optional variable_scope.
|
||||
|
||||
Returns:
|
||||
net: a 2D Tensor with the logits (pre-softmax activations) if num_classes
|
||||
is a non-zero integer, or the inon-dropped-out nput to the logits layer
|
||||
if num_classes is 0 or None.
|
||||
end_points: a dictionary from components of the network to the corresponding
|
||||
activation.
|
||||
"""
|
||||
end_points = {}
|
||||
|
||||
with tf.compat.v1.variable_scope(scope, 'LeNet', [images]):
|
||||
net = end_points['conv1'] = slim.conv2d(images, 32, [5, 5], scope='conv1')
|
||||
net = end_points['pool1'] = slim.max_pool2d(net, [2, 2], 2, scope='pool1')
|
||||
net = end_points['conv2'] = slim.conv2d(net, 64, [5, 5], scope='conv2')
|
||||
net = end_points['pool2'] = slim.max_pool2d(net, [2, 2], 2, scope='pool2')
|
||||
net = slim.flatten(net)
|
||||
end_points['Flatten'] = net
|
||||
|
||||
net = end_points['fc3'] = slim.fully_connected(net, 1024, scope='fc3')
|
||||
if not num_classes:
|
||||
return net, end_points
|
||||
net = end_points['dropout3'] = slim.dropout(
|
||||
net, dropout_keep_prob, is_training=is_training, scope='dropout3')
|
||||
logits = end_points['Logits'] = slim.fully_connected(
|
||||
net, num_classes, activation_fn=None, scope='fc4')
|
||||
|
||||
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
|
||||
|
||||
return logits, end_points
|
||||
lenet.default_image_size = 28
|
||||
|
||||
|
||||
def lenet_arg_scope(weight_decay=0.0):
|
||||
"""Defines the default lenet argument scope.
|
||||
|
||||
Args:
|
||||
weight_decay: The weight decay to use for regularizing the model.
|
||||
|
||||
Returns:
|
||||
An `arg_scope` to use for the inception v3 model.
|
||||
"""
|
||||
with slim.arg_scope(
|
||||
[slim.conv2d, slim.fully_connected],
|
||||
weights_regularizer=slim.l2_regularizer(weight_decay),
|
||||
weights_initializer=tf.compat.v1.truncated_normal_initializer(stddev=0.1),
|
||||
activation_fn=tf.nn.relu) as sc:
|
||||
return sc
|
||||
+4
@@ -0,0 +1,4 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<project version="4">
|
||||
<component name="Encoding" addBOMForNewFiles="with NO BOM" />
|
||||
</project>
|
||||
+4
@@ -0,0 +1,4 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<project version="4">
|
||||
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.6" project-jdk-type="Python SDK" />
|
||||
</project>
|
||||
+8
@@ -0,0 +1,8 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<module type="PYTHON_MODULE" version="4">
|
||||
<component name="NewModuleRootManager">
|
||||
<content url="file://$MODULE_DIR$" />
|
||||
<orderEntry type="inheritedJdk" />
|
||||
<orderEntry type="sourceFolder" forTests="false" />
|
||||
</component>
|
||||
</module>
|
||||
+8
@@ -0,0 +1,8 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<project version="4">
|
||||
<component name="ProjectModuleManager">
|
||||
<modules>
|
||||
<module fileurl="file://$PROJECT_DIR$/.idea/mobilenet.iml" filepath="$PROJECT_DIR$/.idea/mobilenet.iml" />
|
||||
</modules>
|
||||
</component>
|
||||
</project>
|
||||
+166
@@ -0,0 +1,166 @@
|
||||
# MobileNet
|
||||
|
||||
This folder contains building code for
|
||||
[MobileNetV2](https://arxiv.org/abs/1801.04381) and
|
||||
[MobilenetV3](https://arxiv.org/abs/1905.02244) networks. The architectural
|
||||
definition for each model is located in [mobilenet_v2.py](mobilenet_v2.py) and
|
||||
[mobilenet_v3.py](mobilenet_v3.py) respectively.
|
||||
|
||||
For MobilenetV1 please refer to this [page](../mobilenet_v1.md)
|
||||
|
||||
We have also introduced a family of MobileNets customized for the Edge TPU
|
||||
accelerator found in
|
||||
[Google Pixel4](https://blog.google/products/pixel/pixel-4/) devices. The
|
||||
architectural definition for MobileNetEdgeTPU is located in
|
||||
[mobilenet_v3.py](mobilenet_v3.py)
|
||||
|
||||
## Performance
|
||||
|
||||
### Mobilenet V3 latency
|
||||
|
||||
This is the timing of [MobileNetV2] vs [MobileNetV3] using TF-Lite on the large
|
||||
core of Pixel 1 phone.
|
||||
|
||||

|
||||
|
||||
### MACs
|
||||
|
||||
MACs, also sometimes known as MADDs - the number of multiply-accumulates needed
|
||||
to compute an inference on a single image is a common metric to measure the
|
||||
efficiency of the model. Full size Mobilenet V3 on image size 224 uses ~215
|
||||
Million MADDs (MMadds) while achieving accuracy 75.1%, while Mobilenet V2 uses
|
||||
~300MMadds and achieving accuracy 72%. By comparison ResNet-50 uses
|
||||
approximately 3500 MMAdds while achieving 76% accuracy.
|
||||
|
||||
Below is the graph comparing Mobilenets and a few selected networks. The size of
|
||||
each blob represents the number of parameters. Note for
|
||||
[ShuffleNet](https://arxiv.org/abs/1707.01083) there are no published size
|
||||
numbers. We estimate it to be comparable to MobileNetV2 numbers.
|
||||
|
||||

|
||||
|
||||
### Mobilenet EdgeTPU latency
|
||||
|
||||
The figure below shows the Pixel 4 Edge TPU latency of int8-quantized Mobilenet
|
||||
EdgeTPU compared with MobilenetV2 and the minimalistic variants of MobilenetV3
|
||||
(see below).
|
||||
|
||||

|
||||
|
||||
## Pretrained models
|
||||
|
||||
### Mobilenet V3 Imagenet Checkpoints
|
||||
|
||||
All mobilenet V3 checkpoints were trained with image resolution 224x224. All
|
||||
phone latencies are in milliseconds, measured on large core. In addition to
|
||||
large and small models this page also contains so-called minimalistic models,
|
||||
these models have the same per-layer dimensions characteristic as MobilenetV3
|
||||
however, they don't utilize any of the advanced blocks (squeeze-and-excite
|
||||
units, hard-swish, and 5x5 convolutions). While these models are less efficient
|
||||
on CPU, we find that they are much more performant on GPU/DSP.
|
||||
|
||||
| Imagenet Checkpoint | MACs (M) | Params (M) | Top1 | Pixel 1 | Pixel 2 | Pixel 3 |
|
||||
| ------------------ | -------- | ---------- | ---- | ------- | ------- | ------- |
|
||||
| [Large dm=1 (float)] | 217 | 5.4 | 75.2 | 51.2 | 61 | 44 |
|
||||
| [Large dm=1 (8-bit)] | 217 | 5.4 | 73.9 | 44 | 42.5 | 32 |
|
||||
| [Large dm=0.75 (float)] | 155 | 4.0 | 73.3 | 39.8 | 48 | 34 |
|
||||
| [Small dm=1 (float)] | 66 | 2.9 | 67.5 | 15.8 | 19.4 | 14.4 |
|
||||
| [Small dm=1 (8-bit)] | 66 | 2.9 | 64.9 | 15.5 | 15 | 10.7 |
|
||||
| [Small dm=0.75 (float)] | 44 | 2.4 | 65.4 | 12.8 | 15.9 | 11.6 |
|
||||
|
||||
#### Minimalistic checkpoints:
|
||||
|
||||
| Imagenet Checkpoint | MACs (M) | Params (M) | Top1 | Pixel 1 | Pixel 2 | Pixel 3 |
|
||||
| -------------- | -------- | ---------- | ---- | ------- | ------- | ------- |
|
||||
| [Large minimalistic (float)] | 209 | 3.9 | 72.3 | 44.1 | 51 | 35 |
|
||||
| [Large minimalistic (8-bit)][lm8] | 209 | 3.9 | 71.3 | 37 | 35 | 27 |
|
||||
| [Small minimalistic (float)] | 65 | 2.0 | 61.9 | 12.2 | 15.1 | 11 |
|
||||
|
||||
#### Edge TPU checkpoints:
|
||||
|
||||
| Imagenet Checkpoint | MACs (M) | Params (M) | Top1 | Pixel 4 Edge TPU | Pixel 4 CPU |
|
||||
| ----------------- | -------- | ---------- | ---- | ------- | ----------- |
|
||||
| [MobilenetEdgeTPU dm=0.75 (8-bit)]| 624 | 2.9 | 73.5 | 3.1 | 13.8 |
|
||||
| [MobilenetEdgeTPU dm=1 (8-bit)] | 990 | 4.0 | 75.6 | 3.6 | 20.6 |
|
||||
|
||||
|
||||
Note: 8-bit quantized versions of the MobilenetEdgeTPU models were obtained
|
||||
using Tensorflow Lite's
|
||||
[post training quantization](https://www.tensorflow.org/lite/performance/post_training_quantization)
|
||||
tool.
|
||||
|
||||
[Small minimalistic (float)]: https://storage.googleapis.com/mobilenet_v3/checkpoints/v3-small-minimalistic_224_1.0_float.tgz
|
||||
[Large minimalistic (float)]: https://storage.googleapis.com/mobilenet_v3/checkpoints/v3-large-minimalistic_224_1.0_float.tgz
|
||||
[lm8]: https://storage.googleapis.com/mobilenet_v3/checkpoints/v3-large-minimalistic_224_1.0_uint8.tgz
|
||||
[Large dm=1 (float)]: https://storage.googleapis.com/mobilenet_v3/checkpoints/v3-large_224_1.0_float.tgz
|
||||
[Small dm=1 (float)]: https://storage.googleapis.com/mobilenet_v3/checkpoints/v3-small_224_1.0_float.tgz
|
||||
[Large dm=1 (8-bit)]: https://storage.googleapis.com/mobilenet_v3/checkpoints/v3-large_224_1.0_uint8.tgz
|
||||
[Small dm=1 (8-bit)]: https://storage.googleapis.com/mobilenet_v3/checkpoints/v3-small_224_1.0_uint8.tgz
|
||||
[Large dm=0.75 (float)]: https://storage.googleapis.com/mobilenet_v3/checkpoints/v3-large_224_0.75_float.tgz
|
||||
[Small dm=0.75 (float)]: https://storage.googleapis.com/mobilenet_v3/checkpoints/v3-small_224_0.75_float.tgz
|
||||
[MobilenetEdgeTPU dm=0.75 (8-bit)]: https://storage.cloud.google.com/mobilenet_edgetpu/checkpoints/mobilenet_edgetpu_224_0.75.tgz
|
||||
[MobilenetEdgeTPU dm=1 (8-bit)]: https://storage.cloud.google.com/mobilenet_edgetpu/checkpoints/mobilenet_edgetpu_224_1.0.tgz
|
||||
|
||||
### Mobilenet V2 Imagenet Checkpoints
|
||||
|
||||
Classification Checkpoint | MACs (M) | Parameters (M) | Top 1 Accuracy | Top 5 Accuracy | Mobile CPU (ms) Pixel 1
|
||||
---------------------------------------------------------------------------------------------------------- | -------- | -------------- | -------------- | -------------- | -----------------------
|
||||
[mobilenet_v2_1.4_224](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.4_224.tgz) | 582 | 6.06 | 75.0 | 92.5 | 138.0
|
||||
[mobilenet_v2_1.3_224](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.3_224.tgz) | 509 | 5.34 | 74.4 | 92.1 | 123.0
|
||||
[mobilenet_v2_1.0_224](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.0_224.tgz) | 300 | 3.47 | 71.8 | 91.0 | 73.8
|
||||
[mobilenet_v2_1.0_192](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.0_192.tgz) | 221 | 3.47 | 70.7 | 90.1 | 55.1
|
||||
[mobilenet_v2_1.0_160](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.0_160.tgz) | 154 | 3.47 | 68.8 | 89.0 | 40.2
|
||||
[mobilenet_v2_1.0_128](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.0_128.tgz) | 99 | 3.47 | 65.3 | 86.9 | 27.6
|
||||
[mobilenet_v2_1.0_96](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.0_96.tgz) | 56 | 3.47 | 60.3 | 83.2 | 17.6
|
||||
[mobilenet_v2_0.75_224](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.75_224.tgz) | 209 | 2.61 | 69.8 | 89.6 | 55.8
|
||||
[mobilenet_v2_0.75_192](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.75_192.tgz) | 153 | 2.61 | 68.7 | 88.9 | 41.6
|
||||
[mobilenet_v2_0.75_160](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.75_160.tgz) | 107 | 2.61 | 66.4 | 87.3 | 30.4
|
||||
[mobilenet_v2_0.75_128](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.75_128.tgz) | 69 | 2.61 | 63.2 | 85.3 | 21.9
|
||||
[mobilenet_v2_0.75_96](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.75_96.tgz) | 39 | 2.61 | 58.8 | 81.6 | 14.2
|
||||
[mobilenet_v2_0.5_224](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.5_224.tgz) | 97 | 1.95 | 65.4 | 86.4 | 28.7
|
||||
[mobilenet_v2_0.5_192](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.5_192.tgz) | 71 | 1.95 | 63.9 | 85.4 | 21.1
|
||||
[mobilenet_v2_0.5_160](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.5_160.tgz) | 50 | 1.95 | 61.0 | 83.2 | 14.9
|
||||
[mobilenet_v2_0.5_128](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.5_128.tgz) | 32 | 1.95 | 57.7 | 80.8 | 9.9
|
||||
[mobilenet_v2_0.5_96](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.5_96.tgz) | 18 | 1.95 | 51.2 | 75.8 | 6.4
|
||||
[mobilenet_v2_0.35_224](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.35_224.tgz) | 59 | 1.66 | 60.3 | 82.9 | 19.7
|
||||
[mobilenet_v2_0.35_192](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.35_192.tgz) | 43 | 1.66 | 58.2 | 81.2 | 14.6
|
||||
[mobilenet_v2_0.35_160](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.35_160.tgz) | 30 | 1.66 | 55.7 | 79.1 | 10.5
|
||||
[mobilenet_v2_0.35_128](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.35_128.tgz) | 20 | 1.66 | 50.8 | 75.0 | 6.9
|
||||
[mobilenet_v2_0.35_96](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.35_96.tgz) | 11 | 1.66 | 45.5 | 70.4 | 4.5
|
||||
|
||||
## Training
|
||||
|
||||
### V3
|
||||
|
||||
TODO: Add V3 hyperparameters
|
||||
|
||||
### V2
|
||||
|
||||
The numbers above can be reproduced using slim's
|
||||
[`train_image_classifier`](https://github.com/tensorflow/models/blob/master/research/slim/README.md#training-a-model-from-scratch).
|
||||
Below is the set of parameters that achieves 72.0% for full size MobileNetV2,
|
||||
after about 700K when trained on 8 GPU. If trained on a single GPU the full
|
||||
convergence is after 5.5M steps. Also note that learning rate and
|
||||
num_epochs_per_decay both need to be adjusted depending on how many GPUs are
|
||||
being used due to slim's internal averaging.
|
||||
|
||||
```bash
|
||||
--model_name="mobilenet_v2"
|
||||
--learning_rate=0.045 * NUM_GPUS #slim internally averages clones so we compensate
|
||||
--preprocessing_name="inception_v2"
|
||||
--label_smoothing=0.1
|
||||
--moving_average_decay=0.9999
|
||||
--batch_size= 96
|
||||
--num_clones = NUM_GPUS # you can use any number here between 1 and 8 depending on your hardware setup.
|
||||
--learning_rate_decay_factor=0.98
|
||||
--num_epochs_per_decay = 2.5 / NUM_GPUS # train_image_classifier does per clone epochs
|
||||
```
|
||||
|
||||
# Example
|
||||
|
||||
See this [ipython notebook](mobilenet_example.ipynb) or open and run the network
|
||||
directly in
|
||||
[Colaboratory](https://colab.research.google.com/github/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet_example.ipynb).
|
||||
|
||||
[MobilenetV2]: https://arxiv.org/abs/1801.04381
|
||||
[MobilenetV3]: https://arxiv.org/abs/1905.02244
|
||||
+475
@@ -0,0 +1,475 @@
|
||||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Convolution blocks for mobilenet."""
|
||||
import contextlib
|
||||
import functools
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
def _fixed_padding(inputs, kernel_size, rate=1):
|
||||
"""Pads the input along the spatial dimensions independently of input size.
|
||||
|
||||
Pads the input such that if it was used in a convolution with 'VALID' padding,
|
||||
the output would have the same dimensions as if the unpadded input was used
|
||||
in a convolution with 'SAME' padding.
|
||||
|
||||
Args:
|
||||
inputs: A tensor of size [batch, height_in, width_in, channels].
|
||||
kernel_size: The kernel to be used in the conv2d or max_pool2d operation.
|
||||
rate: An integer, rate for atrous convolution.
|
||||
|
||||
Returns:
|
||||
output: A tensor of size [batch, height_out, width_out, channels] with the
|
||||
input, either intact (if kernel_size == 1) or padded (if kernel_size > 1).
|
||||
"""
|
||||
kernel_size_effective = [kernel_size[0] + (kernel_size[0] - 1) * (rate - 1),
|
||||
kernel_size[0] + (kernel_size[0] - 1) * (rate - 1)]
|
||||
pad_total = [kernel_size_effective[0] - 1, kernel_size_effective[1] - 1]
|
||||
pad_beg = [pad_total[0] // 2, pad_total[1] // 2]
|
||||
pad_end = [pad_total[0] - pad_beg[0], pad_total[1] - pad_beg[1]]
|
||||
padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg[0], pad_end[0]],
|
||||
[pad_beg[1], pad_end[1]], [0, 0]])
|
||||
return padded_inputs
|
||||
|
||||
|
||||
def _make_divisible(v, divisor, min_value=None):
|
||||
if min_value is None:
|
||||
min_value = divisor
|
||||
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
|
||||
# Make sure that round down does not go down by more than 10%.
|
||||
if new_v < 0.9 * v:
|
||||
new_v += divisor
|
||||
return new_v
|
||||
|
||||
|
||||
def _split_divisible(num, num_ways, divisible_by=8):
|
||||
"""Evenly splits num, num_ways so each piece is a multiple of divisible_by."""
|
||||
assert num % divisible_by == 0
|
||||
assert num / num_ways >= divisible_by
|
||||
# Note: want to round down, we adjust each split to match the total.
|
||||
base = num // num_ways // divisible_by * divisible_by
|
||||
result = []
|
||||
accumulated = 0
|
||||
for i in range(num_ways):
|
||||
r = base
|
||||
while accumulated + r < num * (i + 1) / num_ways:
|
||||
r += divisible_by
|
||||
result.append(r)
|
||||
accumulated += r
|
||||
assert accumulated == num
|
||||
return result
|
||||
|
||||
|
||||
@contextlib.contextmanager
|
||||
def _v1_compatible_scope_naming(scope):
|
||||
"""v1 compatible scope naming."""
|
||||
if scope is None: # Create uniqified separable blocks.
|
||||
with tf.compat.v1.variable_scope(None, default_name='separable') as s, \
|
||||
tf.compat.v1.name_scope(s.original_name_scope):
|
||||
yield ''
|
||||
else:
|
||||
# We use scope_depthwise, scope_pointwise for compatibility with V1 ckpts.
|
||||
# which provide numbered scopes.
|
||||
scope += '_'
|
||||
yield scope
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def split_separable_conv2d(input_tensor,
|
||||
num_outputs,
|
||||
scope=None,
|
||||
normalizer_fn=None,
|
||||
stride=1,
|
||||
rate=1,
|
||||
endpoints=None,
|
||||
use_explicit_padding=False):
|
||||
"""Separable mobilenet V1 style convolution.
|
||||
|
||||
Depthwise convolution, with default non-linearity,
|
||||
followed by 1x1 depthwise convolution. This is similar to
|
||||
slim.separable_conv2d, but differs in tha it applies batch
|
||||
normalization and non-linearity to depthwise. This matches
|
||||
the basic building of Mobilenet Paper
|
||||
(https://arxiv.org/abs/1704.04861)
|
||||
|
||||
Args:
|
||||
input_tensor: input
|
||||
num_outputs: number of outputs
|
||||
scope: optional name of the scope. Note if provided it will use
|
||||
scope_depthwise for deptwhise, and scope_pointwise for pointwise.
|
||||
normalizer_fn: which normalizer function to use for depthwise/pointwise
|
||||
stride: stride
|
||||
rate: output rate (also known as dilation rate)
|
||||
endpoints: optional, if provided, will export additional tensors to it.
|
||||
use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
|
||||
inputs so that the output dimensions are the same as if 'SAME' padding
|
||||
were used.
|
||||
|
||||
Returns:
|
||||
output tesnor
|
||||
"""
|
||||
|
||||
with _v1_compatible_scope_naming(scope) as scope:
|
||||
dw_scope = scope + 'depthwise'
|
||||
endpoints = endpoints if endpoints is not None else {}
|
||||
kernel_size = [3, 3]
|
||||
padding = 'SAME'
|
||||
if use_explicit_padding:
|
||||
padding = 'VALID'
|
||||
input_tensor = _fixed_padding(input_tensor, kernel_size, rate)
|
||||
net = slim.separable_conv2d(
|
||||
input_tensor,
|
||||
None,
|
||||
kernel_size,
|
||||
depth_multiplier=1,
|
||||
stride=stride,
|
||||
rate=rate,
|
||||
normalizer_fn=normalizer_fn,
|
||||
padding=padding,
|
||||
scope=dw_scope)
|
||||
|
||||
endpoints[dw_scope] = net
|
||||
|
||||
pw_scope = scope + 'pointwise'
|
||||
net = slim.conv2d(
|
||||
net,
|
||||
num_outputs, [1, 1],
|
||||
stride=1,
|
||||
normalizer_fn=normalizer_fn,
|
||||
scope=pw_scope)
|
||||
endpoints[pw_scope] = net
|
||||
return net
|
||||
|
||||
|
||||
def expand_input_by_factor(n, divisible_by=8):
|
||||
return lambda num_inputs, **_: _make_divisible(num_inputs * n, divisible_by)
|
||||
|
||||
|
||||
def split_conv(input_tensor,
|
||||
num_outputs,
|
||||
num_ways,
|
||||
scope,
|
||||
divisible_by=8,
|
||||
**kwargs):
|
||||
"""Creates a split convolution.
|
||||
|
||||
Split convolution splits the input and output into
|
||||
'num_blocks' blocks of approximately the same size each,
|
||||
and only connects $i$-th input to $i$ output.
|
||||
|
||||
Args:
|
||||
input_tensor: input tensor
|
||||
num_outputs: number of output filters
|
||||
num_ways: num blocks to split by.
|
||||
scope: scope for all the operators.
|
||||
divisible_by: make sure that every part is divisiable by this.
|
||||
**kwargs: will be passed directly into conv2d operator
|
||||
Returns:
|
||||
tensor
|
||||
"""
|
||||
b = input_tensor.get_shape().as_list()[3]
|
||||
|
||||
if num_ways == 1 or min(b // num_ways,
|
||||
num_outputs // num_ways) < divisible_by:
|
||||
# Don't do any splitting if we end up with less than 8 filters
|
||||
# on either side.
|
||||
return slim.conv2d(input_tensor, num_outputs, [1, 1], scope=scope, **kwargs)
|
||||
|
||||
outs = []
|
||||
input_splits = _split_divisible(b, num_ways, divisible_by=divisible_by)
|
||||
output_splits = _split_divisible(
|
||||
num_outputs, num_ways, divisible_by=divisible_by)
|
||||
inputs = tf.split(input_tensor, input_splits, axis=3, name='split_' + scope)
|
||||
base = scope
|
||||
for i, (input_tensor, out_size) in enumerate(zip(inputs, output_splits)):
|
||||
scope = base + '_part_%d' % (i,)
|
||||
n = slim.conv2d(input_tensor, out_size, [1, 1], scope=scope, **kwargs)
|
||||
n = tf.identity(n, scope + '_output')
|
||||
outs.append(n)
|
||||
return tf.concat(outs, 3, name=scope + '_concat')
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def expanded_conv(input_tensor,
|
||||
num_outputs,
|
||||
expansion_size=expand_input_by_factor(6),
|
||||
stride=1,
|
||||
rate=1,
|
||||
kernel_size=(3, 3),
|
||||
residual=True,
|
||||
normalizer_fn=None,
|
||||
split_projection=1,
|
||||
split_expansion=1,
|
||||
split_divisible_by=8,
|
||||
expansion_transform=None,
|
||||
depthwise_location='expansion',
|
||||
depthwise_channel_multiplier=1,
|
||||
endpoints=None,
|
||||
use_explicit_padding=False,
|
||||
padding='SAME',
|
||||
inner_activation_fn=None,
|
||||
depthwise_activation_fn=None,
|
||||
project_activation_fn=tf.identity,
|
||||
depthwise_fn=slim.separable_conv2d,
|
||||
expansion_fn=split_conv,
|
||||
projection_fn=split_conv,
|
||||
scope=None):
|
||||
"""Depthwise Convolution Block with expansion.
|
||||
|
||||
Builds a composite convolution that has the following structure
|
||||
expansion (1x1) -> depthwise (kernel_size) -> projection (1x1)
|
||||
|
||||
Args:
|
||||
input_tensor: input
|
||||
num_outputs: number of outputs in the final layer.
|
||||
expansion_size: the size of expansion, could be a constant or a callable.
|
||||
If latter it will be provided 'num_inputs' as an input. For forward
|
||||
compatibility it should accept arbitrary keyword arguments.
|
||||
Default will expand the input by factor of 6.
|
||||
stride: depthwise stride
|
||||
rate: depthwise rate
|
||||
kernel_size: depthwise kernel
|
||||
residual: whether to include residual connection between input
|
||||
and output.
|
||||
normalizer_fn: batchnorm or otherwise
|
||||
split_projection: how many ways to split projection operator
|
||||
(that is conv expansion->bottleneck)
|
||||
split_expansion: how many ways to split expansion op
|
||||
(that is conv bottleneck->expansion) ops will keep depth divisible
|
||||
by this value.
|
||||
split_divisible_by: make sure every split group is divisible by this number.
|
||||
expansion_transform: Optional function that takes expansion
|
||||
as a single input and returns output.
|
||||
depthwise_location: where to put depthwise covnvolutions supported
|
||||
values None, 'input', 'output', 'expansion'
|
||||
depthwise_channel_multiplier: depthwise channel multiplier:
|
||||
each input will replicated (with different filters)
|
||||
that many times. So if input had c channels,
|
||||
output will have c x depthwise_channel_multpilier.
|
||||
endpoints: An optional dictionary into which intermediate endpoints are
|
||||
placed. The keys "expansion_output", "depthwise_output",
|
||||
"projection_output" and "expansion_transform" are always populated, even
|
||||
if the corresponding functions are not invoked.
|
||||
use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
|
||||
inputs so that the output dimensions are the same as if 'SAME' padding
|
||||
were used.
|
||||
padding: Padding type to use if `use_explicit_padding` is not set.
|
||||
inner_activation_fn: activation function to use in all inner convolutions.
|
||||
If none, will rely on slim default scopes.
|
||||
depthwise_activation_fn: activation function to use for deptwhise only.
|
||||
If not provided will rely on slim default scopes. If both
|
||||
inner_activation_fn and depthwise_activation_fn are provided,
|
||||
depthwise_activation_fn takes precedence over inner_activation_fn.
|
||||
project_activation_fn: activation function for the project layer.
|
||||
(note this layer is not affected by inner_activation_fn)
|
||||
depthwise_fn: Depthwise convolution function.
|
||||
expansion_fn: Expansion convolution function. If use custom function then
|
||||
"split_expansion" and "split_divisible_by" will be ignored.
|
||||
projection_fn: Projection convolution function. If use custom function then
|
||||
"split_projection" and "split_divisible_by" will be ignored.
|
||||
|
||||
scope: optional scope.
|
||||
|
||||
Returns:
|
||||
Tensor of depth num_outputs
|
||||
|
||||
Raises:
|
||||
TypeError: on inval
|
||||
"""
|
||||
conv_defaults = {}
|
||||
dw_defaults = {}
|
||||
if inner_activation_fn is not None:
|
||||
conv_defaults['activation_fn'] = inner_activation_fn
|
||||
dw_defaults['activation_fn'] = inner_activation_fn
|
||||
if depthwise_activation_fn is not None:
|
||||
dw_defaults['activation_fn'] = depthwise_activation_fn
|
||||
# pylint: disable=g-backslash-continuation
|
||||
with tf.compat.v1.variable_scope(scope, default_name='expanded_conv') as s, \
|
||||
tf.compat.v1.name_scope(s.original_name_scope), \
|
||||
slim.arg_scope((slim.conv2d,), **conv_defaults), \
|
||||
slim.arg_scope((slim.separable_conv2d,), **dw_defaults):
|
||||
prev_depth = input_tensor.get_shape().as_list()[3]
|
||||
if depthwise_location not in [None, 'input', 'output', 'expansion']:
|
||||
raise TypeError('%r is unknown value for depthwise_location' %
|
||||
depthwise_location)
|
||||
if use_explicit_padding:
|
||||
if padding != 'SAME':
|
||||
raise TypeError('`use_explicit_padding` should only be used with '
|
||||
'"SAME" padding.')
|
||||
padding = 'VALID'
|
||||
depthwise_func = functools.partial(
|
||||
depthwise_fn,
|
||||
num_outputs=None,
|
||||
kernel_size=kernel_size,
|
||||
depth_multiplier=depthwise_channel_multiplier,
|
||||
stride=stride,
|
||||
rate=rate,
|
||||
normalizer_fn=normalizer_fn,
|
||||
padding=padding,
|
||||
scope='depthwise')
|
||||
# b1 -> b2 * r -> b2
|
||||
# i -> (o * r) (bottleneck) -> o
|
||||
input_tensor = tf.identity(input_tensor, 'input')
|
||||
net = input_tensor
|
||||
|
||||
if depthwise_location == 'input':
|
||||
if use_explicit_padding:
|
||||
net = _fixed_padding(net, kernel_size, rate)
|
||||
net = depthwise_func(net, activation_fn=None)
|
||||
net = tf.identity(net, name='depthwise_output')
|
||||
if endpoints is not None:
|
||||
endpoints['depthwise_output'] = net
|
||||
|
||||
if callable(expansion_size):
|
||||
inner_size = expansion_size(num_inputs=prev_depth)
|
||||
else:
|
||||
inner_size = expansion_size
|
||||
|
||||
if inner_size > net.shape[3]:
|
||||
if expansion_fn == split_conv:
|
||||
expansion_fn = functools.partial(
|
||||
expansion_fn,
|
||||
num_ways=split_expansion,
|
||||
divisible_by=split_divisible_by,
|
||||
stride=1)
|
||||
net = expansion_fn(
|
||||
net,
|
||||
inner_size,
|
||||
scope='expand',
|
||||
normalizer_fn=normalizer_fn)
|
||||
net = tf.identity(net, 'expansion_output')
|
||||
if endpoints is not None:
|
||||
endpoints['expansion_output'] = net
|
||||
|
||||
if depthwise_location == 'expansion':
|
||||
if use_explicit_padding:
|
||||
net = _fixed_padding(net, kernel_size, rate)
|
||||
net = depthwise_func(net)
|
||||
net = tf.identity(net, name='depthwise_output')
|
||||
if endpoints is not None:
|
||||
endpoints['depthwise_output'] = net
|
||||
|
||||
if expansion_transform:
|
||||
net = expansion_transform(expansion_tensor=net, input_tensor=input_tensor)
|
||||
# Note in contrast with expansion, we always have
|
||||
# projection to produce the desired output size.
|
||||
if projection_fn == split_conv:
|
||||
projection_fn = functools.partial(
|
||||
projection_fn,
|
||||
num_ways=split_projection,
|
||||
divisible_by=split_divisible_by,
|
||||
stride=1)
|
||||
net = projection_fn(
|
||||
net,
|
||||
num_outputs,
|
||||
scope='project',
|
||||
normalizer_fn=normalizer_fn,
|
||||
activation_fn=project_activation_fn)
|
||||
if endpoints is not None:
|
||||
endpoints['projection_output'] = net
|
||||
if depthwise_location == 'output':
|
||||
if use_explicit_padding:
|
||||
net = _fixed_padding(net, kernel_size, rate)
|
||||
net = depthwise_func(net, activation_fn=None)
|
||||
net = tf.identity(net, name='depthwise_output')
|
||||
if endpoints is not None:
|
||||
endpoints['depthwise_output'] = net
|
||||
|
||||
if callable(residual): # custom residual
|
||||
net = residual(input_tensor=input_tensor, output_tensor=net)
|
||||
elif (residual and
|
||||
# stride check enforces that we don't add residuals when spatial
|
||||
# dimensions are None
|
||||
stride == 1 and
|
||||
# Depth matches
|
||||
net.get_shape().as_list()[3] ==
|
||||
input_tensor.get_shape().as_list()[3]):
|
||||
net += input_tensor
|
||||
return tf.identity(net, name='output')
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def squeeze_excite(input_tensor,
|
||||
divisible_by=8,
|
||||
squeeze_factor=3,
|
||||
inner_activation_fn=tf.nn.relu,
|
||||
gating_fn=tf.sigmoid,
|
||||
squeeze_input_tensor=None,
|
||||
pool=None):
|
||||
"""Squeeze excite block for Mobilenet V3.
|
||||
|
||||
If the squeeze_input_tensor - or the input_tensor if squeeze_input_tensor is
|
||||
None - contains variable dimensions (Nonetype in tensor shape), perform
|
||||
average pooling (as the first step in the squeeze operation) by calling
|
||||
reduce_mean across the H/W of the input tensor.
|
||||
|
||||
Args:
|
||||
input_tensor: input tensor to apply SE block to.
|
||||
divisible_by: ensures all inner dimensions are divisible by this number.
|
||||
squeeze_factor: the factor of squeezing in the inner fully connected layer
|
||||
inner_activation_fn: non-linearity to be used in inner layer.
|
||||
gating_fn: non-linearity to be used for final gating function
|
||||
squeeze_input_tensor: custom tensor to use for computing gating activation.
|
||||
If provided the result will be input_tensor * SE(squeeze_input_tensor)
|
||||
instead of input_tensor * SE(input_tensor).
|
||||
pool: if number is provided will average pool with that kernel size
|
||||
to compute inner tensor, followed by bilinear upsampling.
|
||||
|
||||
Returns:
|
||||
Gated input_tensor. (e.g. X * SE(X))
|
||||
"""
|
||||
with tf.compat.v1.variable_scope('squeeze_excite'):
|
||||
if squeeze_input_tensor is None:
|
||||
squeeze_input_tensor = input_tensor
|
||||
input_size = input_tensor.shape.as_list()[1:3]
|
||||
pool_height, pool_width = squeeze_input_tensor.shape.as_list()[1:3]
|
||||
stride = 1
|
||||
if pool is not None and pool_height >= pool:
|
||||
pool_height, pool_width, stride = pool, pool, pool
|
||||
input_channels = squeeze_input_tensor.shape.as_list()[3]
|
||||
output_channels = input_tensor.shape.as_list()[3]
|
||||
squeeze_channels = _make_divisible(
|
||||
input_channels / squeeze_factor, divisor=divisible_by)
|
||||
|
||||
if pool is None:
|
||||
pooled = tf.reduce_mean(squeeze_input_tensor, axis=[1, 2], keepdims=True)
|
||||
else:
|
||||
pooled = tf.nn.avg_pool(
|
||||
squeeze_input_tensor, (1, pool_height, pool_width, 1),
|
||||
strides=(1, stride, stride, 1),
|
||||
padding='VALID')
|
||||
squeeze = slim.conv2d(
|
||||
pooled,
|
||||
kernel_size=(1, 1),
|
||||
num_outputs=squeeze_channels,
|
||||
normalizer_fn=None,
|
||||
activation_fn=inner_activation_fn)
|
||||
excite_outputs = output_channels
|
||||
excite = slim.conv2d(squeeze, num_outputs=excite_outputs,
|
||||
kernel_size=[1, 1],
|
||||
normalizer_fn=None,
|
||||
activation_fn=gating_fn)
|
||||
if pool is not None:
|
||||
# Note: As of 03/20/2019 only BILINEAR (the default) with
|
||||
# align_corners=True has gradients implemented in TPU.
|
||||
excite = tf.image.resize_images(
|
||||
excite, input_size,
|
||||
align_corners=True)
|
||||
result = input_tensor * excite
|
||||
return result
|
||||
BIN
Binary file not shown.
|
After Width: | Height: | Size: 47 KiB |
BIN
Binary file not shown.
|
After Width: | Height: | Size: 73 KiB |
BIN
Binary file not shown.
|
After Width: | Height: | Size: 80 KiB |
+501
@@ -0,0 +1,501 @@
|
||||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Mobilenet Base Class."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
import collections
|
||||
import contextlib
|
||||
import copy
|
||||
import os
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def apply_activation(x, name=None, activation_fn=None):
|
||||
return activation_fn(x, name=name) if activation_fn else x
|
||||
|
||||
|
||||
def _fixed_padding(inputs, kernel_size, rate=1):
|
||||
"""Pads the input along the spatial dimensions independently of input size.
|
||||
|
||||
Pads the input such that if it was used in a convolution with 'VALID' padding,
|
||||
the output would have the same dimensions as if the unpadded input was used
|
||||
in a convolution with 'SAME' padding.
|
||||
|
||||
Args:
|
||||
inputs: A tensor of size [batch, height_in, width_in, channels].
|
||||
kernel_size: The kernel to be used in the conv2d or max_pool2d operation.
|
||||
rate: An integer, rate for atrous convolution.
|
||||
|
||||
Returns:
|
||||
output: A tensor of size [batch, height_out, width_out, channels] with the
|
||||
input, either intact (if kernel_size == 1) or padded (if kernel_size > 1).
|
||||
"""
|
||||
kernel_size_effective = [kernel_size[0] + (kernel_size[0] - 1) * (rate - 1),
|
||||
kernel_size[0] + (kernel_size[0] - 1) * (rate - 1)]
|
||||
pad_total = [kernel_size_effective[0] - 1, kernel_size_effective[1] - 1]
|
||||
pad_beg = [pad_total[0] // 2, pad_total[1] // 2]
|
||||
pad_end = [pad_total[0] - pad_beg[0], pad_total[1] - pad_beg[1]]
|
||||
padded_inputs = tf.pad(
|
||||
tensor=inputs,
|
||||
paddings=[[0, 0], [pad_beg[0], pad_end[0]], [pad_beg[1], pad_end[1]],
|
||||
[0, 0]])
|
||||
return padded_inputs
|
||||
|
||||
|
||||
def _make_divisible(v, divisor, min_value=None):
|
||||
if min_value is None:
|
||||
min_value = divisor
|
||||
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
|
||||
# Make sure that round down does not go down by more than 10%.
|
||||
if new_v < 0.9 * v:
|
||||
new_v += divisor
|
||||
return int(new_v)
|
||||
|
||||
|
||||
@contextlib.contextmanager
|
||||
def _set_arg_scope_defaults(defaults):
|
||||
"""Sets arg scope defaults for all items present in defaults.
|
||||
|
||||
Args:
|
||||
defaults: dictionary/list of pairs, containing a mapping from
|
||||
function to a dictionary of default args.
|
||||
|
||||
Yields:
|
||||
context manager where all defaults are set.
|
||||
"""
|
||||
if hasattr(defaults, 'items'):
|
||||
items = list(defaults.items())
|
||||
else:
|
||||
items = defaults
|
||||
if not items:
|
||||
yield
|
||||
else:
|
||||
func, default_arg = items[0]
|
||||
with slim.arg_scope(func, **default_arg):
|
||||
with _set_arg_scope_defaults(items[1:]):
|
||||
yield
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def depth_multiplier(output_params,
|
||||
multiplier,
|
||||
divisible_by=8,
|
||||
min_depth=8,
|
||||
**unused_kwargs):
|
||||
if 'num_outputs' not in output_params:
|
||||
return
|
||||
d = output_params['num_outputs']
|
||||
output_params['num_outputs'] = _make_divisible(d * multiplier, divisible_by,
|
||||
min_depth)
|
||||
|
||||
|
||||
_Op = collections.namedtuple('Op', ['op', 'params', 'multiplier_func'])
|
||||
|
||||
|
||||
def op(opfunc, multiplier_func=depth_multiplier, **params):
|
||||
multiplier = params.pop('multiplier_transform', multiplier_func)
|
||||
return _Op(opfunc, params=params, multiplier_func=multiplier)
|
||||
|
||||
|
||||
class NoOpScope(object):
|
||||
"""No-op context manager."""
|
||||
|
||||
def __enter__(self):
|
||||
return None
|
||||
|
||||
def __exit__(self, exc_type, exc_value, traceback):
|
||||
return False
|
||||
|
||||
|
||||
def safe_arg_scope(funcs, **kwargs):
|
||||
"""Returns `slim.arg_scope` with all None arguments removed.
|
||||
|
||||
Arguments:
|
||||
funcs: Functions to pass to `arg_scope`.
|
||||
**kwargs: Arguments to pass to `arg_scope`.
|
||||
|
||||
Returns:
|
||||
arg_scope or No-op context manager.
|
||||
|
||||
Note: can be useful if None value should be interpreted as "do not overwrite
|
||||
this parameter value".
|
||||
"""
|
||||
filtered_args = {name: value for name, value in kwargs.items()
|
||||
if value is not None}
|
||||
if filtered_args:
|
||||
return slim.arg_scope(funcs, **filtered_args)
|
||||
else:
|
||||
return NoOpScope()
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def mobilenet_base( # pylint: disable=invalid-name
|
||||
inputs,
|
||||
conv_defs,
|
||||
multiplier=1.0,
|
||||
final_endpoint=None,
|
||||
output_stride=None,
|
||||
use_explicit_padding=False,
|
||||
scope=None,
|
||||
is_training=False):
|
||||
"""Mobilenet base network.
|
||||
|
||||
Constructs a network from inputs to the given final endpoint. By default
|
||||
the network is constructed in inference mode. To create network
|
||||
in training mode use:
|
||||
|
||||
with slim.arg_scope(mobilenet.training_scope()):
|
||||
logits, endpoints = mobilenet_base(...)
|
||||
|
||||
Args:
|
||||
inputs: a tensor of shape [batch_size, height, width, channels].
|
||||
conv_defs: A list of op(...) layers specifying the net architecture.
|
||||
multiplier: Float multiplier for the depth (number of channels)
|
||||
for all convolution ops. The value must be greater than zero. Typical
|
||||
usage will be to set this value in (0, 1) to reduce the number of
|
||||
parameters or computation cost of the model.
|
||||
final_endpoint: The name of last layer, for early termination for
|
||||
for V1-based networks: last layer is "layer_14", for V2: "layer_20"
|
||||
output_stride: An integer that specifies the requested ratio of input to
|
||||
output spatial resolution. If not None, then we invoke atrous convolution
|
||||
if necessary to prevent the network from reducing the spatial resolution
|
||||
of the activation maps. Allowed values are 1 or any even number, excluding
|
||||
zero. Typical values are 8 (accurate fully convolutional mode), 16
|
||||
(fast fully convolutional mode), and 32 (classification mode).
|
||||
|
||||
NOTE- output_stride relies on all consequent operators to support dilated
|
||||
operators via "rate" parameter. This might require wrapping non-conv
|
||||
operators to operate properly.
|
||||
|
||||
use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
|
||||
inputs so that the output dimensions are the same as if 'SAME' padding
|
||||
were used.
|
||||
scope: optional variable scope.
|
||||
is_training: How to setup batch_norm and other ops. Note: most of the time
|
||||
this does not need be set directly. Use mobilenet.training_scope() to set
|
||||
up training instead. This parameter is here for backward compatibility
|
||||
only. It is safe to set it to the value matching
|
||||
training_scope(is_training=...). It is also safe to explicitly set
|
||||
it to False, even if there is outer training_scope set to to training.
|
||||
(The network will be built in inference mode). If this is set to None,
|
||||
no arg_scope is added for slim.batch_norm's is_training parameter.
|
||||
|
||||
Returns:
|
||||
tensor_out: output tensor.
|
||||
end_points: a set of activations for external use, for example summaries or
|
||||
losses.
|
||||
|
||||
Raises:
|
||||
ValueError: depth_multiplier <= 0, or the target output_stride is not
|
||||
allowed.
|
||||
"""
|
||||
if multiplier <= 0:
|
||||
raise ValueError('multiplier is not greater than zero.')
|
||||
|
||||
# Set conv defs defaults and overrides.
|
||||
conv_defs_defaults = conv_defs.get('defaults', {})
|
||||
conv_defs_overrides = conv_defs.get('overrides', {})
|
||||
if use_explicit_padding:
|
||||
conv_defs_overrides = copy.deepcopy(conv_defs_overrides)
|
||||
conv_defs_overrides[
|
||||
(slim.conv2d, slim.separable_conv2d)] = {'padding': 'VALID'}
|
||||
|
||||
if output_stride is not None:
|
||||
if output_stride == 0 or (output_stride > 1 and output_stride % 2):
|
||||
raise ValueError('Output stride must be None, 1 or a multiple of 2.')
|
||||
|
||||
# a) Set the tensorflow scope
|
||||
# b) set padding to default: note we might consider removing this
|
||||
# since it is also set by mobilenet_scope
|
||||
# c) set all defaults
|
||||
# d) set all extra overrides.
|
||||
# pylint: disable=g-backslash-continuation
|
||||
with _scope_all(scope, default_scope='Mobilenet'), \
|
||||
safe_arg_scope([slim.batch_norm], is_training=is_training), \
|
||||
_set_arg_scope_defaults(conv_defs_defaults), \
|
||||
_set_arg_scope_defaults(conv_defs_overrides):
|
||||
# The current_stride variable keeps track of the output stride of the
|
||||
# activations, i.e., the running product of convolution strides up to the
|
||||
# current network layer. This allows us to invoke atrous convolution
|
||||
# whenever applying the next convolution would result in the activations
|
||||
# having output stride larger than the target output_stride.
|
||||
current_stride = 1
|
||||
|
||||
# The atrous convolution rate parameter.
|
||||
rate = 1
|
||||
|
||||
net = inputs
|
||||
# Insert default parameters before the base scope which includes
|
||||
# any custom overrides set in mobilenet.
|
||||
end_points = {}
|
||||
scopes = {}
|
||||
for i, opdef in enumerate(conv_defs['spec']):
|
||||
params = dict(opdef.params)
|
||||
opdef.multiplier_func(params, multiplier)
|
||||
stride = params.get('stride', 1)
|
||||
if output_stride is not None and current_stride == output_stride:
|
||||
# If we have reached the target output_stride, then we need to employ
|
||||
# atrous convolution with stride=1 and multiply the atrous rate by the
|
||||
# current unit's stride for use in subsequent layers.
|
||||
layer_stride = 1
|
||||
layer_rate = rate
|
||||
rate *= stride
|
||||
else:
|
||||
layer_stride = stride
|
||||
layer_rate = 1
|
||||
current_stride *= stride
|
||||
# Update params.
|
||||
params['stride'] = layer_stride
|
||||
# Only insert rate to params if rate > 1 and kernel size is not [1, 1].
|
||||
if layer_rate > 1:
|
||||
if tuple(params.get('kernel_size', [])) != (1, 1):
|
||||
# We will apply atrous rate in the following cases:
|
||||
# 1) When kernel_size is not in params, the operation then uses
|
||||
# default kernel size 3x3.
|
||||
# 2) When kernel_size is in params, and if the kernel_size is not
|
||||
# equal to (1, 1) (there is no need to apply atrous convolution to
|
||||
# any 1x1 convolution).
|
||||
params['rate'] = layer_rate
|
||||
# Set padding
|
||||
if use_explicit_padding:
|
||||
if 'kernel_size' in params:
|
||||
net = _fixed_padding(net, params['kernel_size'], layer_rate)
|
||||
else:
|
||||
params['use_explicit_padding'] = True
|
||||
|
||||
end_point = 'layer_%d' % (i + 1)
|
||||
try:
|
||||
net = opdef.op(net, **params)
|
||||
except Exception:
|
||||
print('Failed to create op %i: %r params: %r' % (i, opdef, params))
|
||||
raise
|
||||
end_points[end_point] = net
|
||||
scope = os.path.dirname(net.name)
|
||||
scopes[scope] = end_point
|
||||
if final_endpoint is not None and end_point == final_endpoint:
|
||||
break
|
||||
|
||||
# Add all tensors that end with 'output' to
|
||||
# endpoints
|
||||
for t in net.graph.get_operations():
|
||||
scope = os.path.dirname(t.name)
|
||||
bn = os.path.basename(t.name)
|
||||
if scope in scopes and t.name.endswith('output'):
|
||||
end_points[scopes[scope] + '/' + bn] = t.outputs[0]
|
||||
return net, end_points
|
||||
|
||||
|
||||
@contextlib.contextmanager
|
||||
def _scope_all(scope, default_scope=None):
|
||||
with tf.compat.v1.variable_scope(scope, default_name=default_scope) as s,\
|
||||
tf.compat.v1.name_scope(s.original_name_scope):
|
||||
yield s
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def mobilenet(inputs,
|
||||
num_classes=1001,
|
||||
prediction_fn=slim.softmax,
|
||||
reuse=None,
|
||||
scope='Mobilenet',
|
||||
base_only=False,
|
||||
**mobilenet_args):
|
||||
"""Mobilenet model for classification, supports both V1 and V2.
|
||||
|
||||
Note: default mode is inference, use mobilenet.training_scope to create
|
||||
training network.
|
||||
|
||||
|
||||
Args:
|
||||
inputs: a tensor of shape [batch_size, height, width, channels].
|
||||
num_classes: number of predicted classes. If 0 or None, the logits layer
|
||||
is omitted and the input features to the logits layer (before dropout)
|
||||
are returned instead.
|
||||
prediction_fn: a function to get predictions out of logits
|
||||
(default softmax).
|
||||
reuse: whether or not the network and its variables should be reused. To be
|
||||
able to reuse 'scope' must be given.
|
||||
scope: Optional variable_scope.
|
||||
base_only: if True will only create the base of the network (no pooling
|
||||
and no logits).
|
||||
**mobilenet_args: passed to mobilenet_base verbatim.
|
||||
- conv_defs: list of conv defs
|
||||
- multiplier: Float multiplier for the depth (number of channels)
|
||||
for all convolution ops. The value must be greater than zero. Typical
|
||||
usage will be to set this value in (0, 1) to reduce the number of
|
||||
parameters or computation cost of the model.
|
||||
- output_stride: will ensure that the last layer has at most total stride.
|
||||
If the architecture calls for more stride than that provided
|
||||
(e.g. output_stride=16, but the architecture has 5 stride=2 operators),
|
||||
it will replace output_stride with fractional convolutions using Atrous
|
||||
Convolutions.
|
||||
|
||||
Returns:
|
||||
logits: the pre-softmax activations, a tensor of size
|
||||
[batch_size, num_classes]
|
||||
end_points: a dictionary from components of the network to the corresponding
|
||||
activation tensor.
|
||||
|
||||
Raises:
|
||||
ValueError: Input rank is invalid.
|
||||
"""
|
||||
is_training = mobilenet_args.get('is_training', False)
|
||||
input_shape = inputs.get_shape().as_list()
|
||||
if len(input_shape) != 4:
|
||||
raise ValueError('Expected rank 4 input, was: %d' % len(input_shape))
|
||||
|
||||
with tf.compat.v1.variable_scope(scope, 'Mobilenet', reuse=reuse) as scope:
|
||||
inputs = tf.identity(inputs, 'input')
|
||||
net, end_points = mobilenet_base(inputs, scope=scope, **mobilenet_args)
|
||||
if base_only:
|
||||
return net, end_points
|
||||
|
||||
net = tf.identity(net, name='embedding')
|
||||
|
||||
with tf.compat.v1.variable_scope('Logits'):
|
||||
net = global_pool(net)
|
||||
end_points['global_pool'] = net
|
||||
if not num_classes:
|
||||
return net, end_points
|
||||
# net = slim.dropout(net, scope='Dropout', is_training=is_training)
|
||||
# 1 x 1 x num_classes
|
||||
# Note: legacy scope name.
|
||||
# logits = slim.conv2d(
|
||||
# net,
|
||||
# num_classes, [1, 1],
|
||||
# activation_fn=None,
|
||||
# normalizer_fn=None,
|
||||
# biases_initializer=tf.compat.v1.zeros_initializer(),
|
||||
# scope='Conv2d_1c_1x1')
|
||||
|
||||
# logits = tf.squeeze(logits, [1, 2])
|
||||
|
||||
# use slim.fully_connected instead
|
||||
net = tf.squeeze(net)
|
||||
net = slim.dropout(net, keep_prob=0.8, scope='Dropout', is_training=is_training)
|
||||
logits = slim.fully_connected(
|
||||
net,
|
||||
num_classes,
|
||||
activation_fn=None,
|
||||
normalizer_fn=None,
|
||||
scope='FC'
|
||||
)
|
||||
#logits = tf.expand_dims(logits, axis=[])
|
||||
|
||||
logits = tf.identity(logits, name='output')
|
||||
end_points['Logits'] = logits
|
||||
if prediction_fn:
|
||||
end_points['Predictions'] = prediction_fn(logits, 'Predictions')
|
||||
return logits, end_points
|
||||
|
||||
|
||||
def global_pool(input_tensor, pool_op=tf.compat.v2.nn.avg_pool2d):
|
||||
"""Applies avg pool to produce 1x1 output.
|
||||
|
||||
NOTE: This function is funcitonally equivalenet to reduce_mean, but it has
|
||||
baked in average pool which has better support across hardware.
|
||||
|
||||
Args:
|
||||
input_tensor: input tensor
|
||||
pool_op: pooling op (avg pool is default)
|
||||
Returns:
|
||||
a tensor batch_size x 1 x 1 x depth.
|
||||
"""
|
||||
shape = input_tensor.get_shape().as_list()
|
||||
if shape[1] is None or shape[2] is None:
|
||||
kernel_size = tf.convert_to_tensor(value=[
|
||||
1,
|
||||
tf.shape(input=input_tensor)[1],
|
||||
tf.shape(input=input_tensor)[2], 1
|
||||
])
|
||||
else:
|
||||
kernel_size = [1, shape[1], shape[2], 1]
|
||||
output = pool_op(
|
||||
input_tensor, ksize=kernel_size, strides=[1, 1, 1, 1], padding='VALID')
|
||||
# Recover output shape, for unknown shape.
|
||||
output.set_shape([None, 1, 1, None])
|
||||
return output
|
||||
|
||||
|
||||
def training_scope(is_training=True,
|
||||
weight_decay=0.00004,
|
||||
stddev=0.09,
|
||||
dropout_keep_prob=0.8,
|
||||
bn_decay=0.997):
|
||||
"""Defines Mobilenet training scope.
|
||||
|
||||
Usage:
|
||||
with tf.contrib.slim.arg_scope(mobilenet.training_scope()):
|
||||
logits, endpoints = mobilenet_v2.mobilenet(input_tensor)
|
||||
|
||||
# the network created will be trainble with dropout/batch norm
|
||||
# initialized appropriately.
|
||||
Args:
|
||||
is_training: if set to False this will ensure that all customizations are
|
||||
set to non-training mode. This might be helpful for code that is reused
|
||||
across both training/evaluation, but most of the time training_scope with
|
||||
value False is not needed. If this is set to None, the parameters is not
|
||||
added to the batch_norm arg_scope.
|
||||
|
||||
weight_decay: The weight decay to use for regularizing the model.
|
||||
stddev: Standard deviation for initialization, if negative uses xavier.
|
||||
dropout_keep_prob: dropout keep probability (not set if equals to None).
|
||||
bn_decay: decay for the batch norm moving averages (not set if equals to
|
||||
None).
|
||||
|
||||
Returns:
|
||||
An argument scope to use via arg_scope.
|
||||
"""
|
||||
# Note: do not introduce parameters that would change the inference
|
||||
# model here (for example whether to use bias), modify conv_def instead.
|
||||
batch_norm_params = {
|
||||
'decay': bn_decay,
|
||||
'is_training': is_training
|
||||
}
|
||||
#if stddev < 0:
|
||||
# weight_intitializer = slim.initializers.xavier_initializer()
|
||||
#else:
|
||||
# weight_intitializer = tf.compat.v1.truncated_normal_initializer(stddev=stddev)
|
||||
|
||||
# modified for NPU
|
||||
weight_2d = tf.initializers.variance_scaling(scale=2., mode="fan_out", distribution="untruncated_normal")
|
||||
weight_dw = tf.initializers.variance_scaling(scale=2., mode="fan_in", distribution="untruncated_normal")
|
||||
weight_pw = tf.initializers.variance_scaling(scale=2., mode="fan_out", distribution="untruncated_normal")
|
||||
weight_fc = tf.initializers.random_normal(stddev=0.01)
|
||||
|
||||
# Set weight_decay for weights in Conv and FC layers.
|
||||
with slim.arg_scope(
|
||||
#[slim.conv2d, slim.fully_connected, slim.separable_conv2d],
|
||||
[slim.conv2d],
|
||||
#weights_initializer=weight_intitializer,
|
||||
weights_initializer=weight_2d,
|
||||
normalizer_fn=slim.batch_norm), \
|
||||
slim.arg_scope([slim.fully_connected], weights_initializer=weight_fc, normalizer_fn=slim.batch_norm), \
|
||||
slim.arg_scope([slim.separable_conv2d], weights_initializer=weight_dw, pointwise_initializer=weight_pw, normalizer_fn=slim.batch_norm), \
|
||||
slim.arg_scope([mobilenet_base, mobilenet], is_training=is_training),\
|
||||
safe_arg_scope([slim.batch_norm], **batch_norm_params), \
|
||||
safe_arg_scope([slim.dropout], is_training=is_training,
|
||||
keep_prob=dropout_keep_prob), \
|
||||
slim.arg_scope([slim.conv2d], \
|
||||
weights_regularizer=slim.l2_regularizer(weight_decay)), \
|
||||
slim.arg_scope([slim.separable_conv2d], weights_regularizer=None) as s:
|
||||
return s
|
||||
+445
File diff suppressed because one or more lines are too long
+249
@@ -0,0 +1,249 @@
|
||||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Implementation of Mobilenet V2.
|
||||
|
||||
Architecture: https://arxiv.org/abs/1801.04381
|
||||
|
||||
The base model gives 72.2% accuracy on ImageNet, with 300MMadds,
|
||||
3.4 M parameters.
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import copy
|
||||
import functools
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import layers as contrib_layers
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets.mobilenet import conv_blocks as ops
|
||||
from nets.mobilenet import mobilenet as lib
|
||||
|
||||
slim = contrib_slim
|
||||
op = lib.op
|
||||
|
||||
expand_input = ops.expand_input_by_factor
|
||||
|
||||
# pyformat: disable
|
||||
# Architecture: https://arxiv.org/abs/1801.04381
|
||||
V2_DEF = dict(
|
||||
defaults={
|
||||
# Note: these parameters of batch norm affect the architecture
|
||||
# that's why they are here and not in training_scope.
|
||||
(slim.batch_norm,): {'center': True, 'scale': True},
|
||||
(slim.conv2d, slim.fully_connected, slim.separable_conv2d): {
|
||||
'normalizer_fn': slim.batch_norm, 'activation_fn': tf.nn.relu6
|
||||
},
|
||||
(ops.expanded_conv,): {
|
||||
'expansion_size': expand_input(6),
|
||||
'split_expansion': 1,
|
||||
'normalizer_fn': slim.batch_norm,
|
||||
'residual': True
|
||||
},
|
||||
(slim.conv2d, slim.separable_conv2d): {'padding': 'SAME'}
|
||||
},
|
||||
spec=[
|
||||
op(slim.conv2d, stride=2, num_outputs=32, kernel_size=[3, 3]),
|
||||
op(ops.expanded_conv,
|
||||
expansion_size=expand_input(1, divisible_by=1),
|
||||
num_outputs=16),
|
||||
op(ops.expanded_conv, stride=2, num_outputs=24),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=24),
|
||||
op(ops.expanded_conv, stride=2, num_outputs=32),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=32),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=32),
|
||||
op(ops.expanded_conv, stride=2, num_outputs=64),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=64),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=64),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=64),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=96),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=96),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=96),
|
||||
op(ops.expanded_conv, stride=2, num_outputs=160),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=160),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=160),
|
||||
op(ops.expanded_conv, stride=1, num_outputs=320),
|
||||
op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=1280)
|
||||
],
|
||||
)
|
||||
# pyformat: enable
|
||||
|
||||
# Mobilenet v2 Definition with group normalization.
|
||||
V2_DEF_GROUP_NORM = copy.deepcopy(V2_DEF)
|
||||
V2_DEF_GROUP_NORM['defaults'] = {
|
||||
(contrib_slim.conv2d, contrib_slim.fully_connected,
|
||||
contrib_slim.separable_conv2d): {
|
||||
'normalizer_fn': contrib_layers.group_norm, # pylint: disable=C0330
|
||||
'activation_fn': tf.nn.relu6, # pylint: disable=C0330
|
||||
}, # pylint: disable=C0330
|
||||
(ops.expanded_conv,): {
|
||||
'expansion_size': ops.expand_input_by_factor(6),
|
||||
'split_expansion': 1,
|
||||
'normalizer_fn': contrib_layers.group_norm,
|
||||
'residual': True
|
||||
},
|
||||
(contrib_slim.conv2d, contrib_slim.separable_conv2d): {
|
||||
'padding': 'SAME'
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def mobilenet(input_tensor,
|
||||
num_classes=1001,
|
||||
depth_multiplier=1.0,
|
||||
scope='MobilenetV2',
|
||||
conv_defs=None,
|
||||
finegrain_classification_mode=False,
|
||||
min_depth=None,
|
||||
divisible_by=None,
|
||||
activation_fn=None,
|
||||
**kwargs):
|
||||
"""Creates mobilenet V2 network.
|
||||
|
||||
Inference mode is created by default. To create training use training_scope
|
||||
below.
|
||||
|
||||
with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope()):
|
||||
logits, endpoints = mobilenet_v2.mobilenet(input_tensor)
|
||||
|
||||
Args:
|
||||
input_tensor: The input tensor
|
||||
num_classes: number of classes
|
||||
depth_multiplier: The multiplier applied to scale number of
|
||||
channels in each layer.
|
||||
scope: Scope of the operator
|
||||
conv_defs: Allows to override default conv def.
|
||||
finegrain_classification_mode: When set to True, the model
|
||||
will keep the last layer large even for small multipliers. Following
|
||||
https://arxiv.org/abs/1801.04381
|
||||
suggests that it improves performance for ImageNet-type of problems.
|
||||
*Note* ignored if final_endpoint makes the builder exit earlier.
|
||||
min_depth: If provided, will ensure that all layers will have that
|
||||
many channels after application of depth multiplier.
|
||||
divisible_by: If provided will ensure that all layers # channels
|
||||
will be divisible by this number.
|
||||
activation_fn: Activation function to use, defaults to tf.nn.relu6 if not
|
||||
specified.
|
||||
**kwargs: passed directly to mobilenet.mobilenet:
|
||||
prediction_fn- what prediction function to use.
|
||||
reuse-: whether to reuse variables (if reuse set to true, scope
|
||||
must be given).
|
||||
Returns:
|
||||
logits/endpoints pair
|
||||
|
||||
Raises:
|
||||
ValueError: On invalid arguments
|
||||
"""
|
||||
if conv_defs is None:
|
||||
conv_defs = V2_DEF
|
||||
if 'multiplier' in kwargs:
|
||||
raise ValueError('mobilenetv2 doesn\'t support generic '
|
||||
'multiplier parameter use "depth_multiplier" instead.')
|
||||
if finegrain_classification_mode:
|
||||
conv_defs = copy.deepcopy(conv_defs)
|
||||
if depth_multiplier < 1:
|
||||
conv_defs['spec'][-1].params['num_outputs'] /= depth_multiplier
|
||||
if activation_fn:
|
||||
conv_defs = copy.deepcopy(conv_defs)
|
||||
defaults = conv_defs['defaults']
|
||||
conv_defaults = (
|
||||
defaults[(slim.conv2d, slim.fully_connected, slim.separable_conv2d)])
|
||||
conv_defaults['activation_fn'] = activation_fn
|
||||
|
||||
depth_args = {}
|
||||
# NB: do not set depth_args unless they are provided to avoid overriding
|
||||
# whatever default depth_multiplier might have thanks to arg_scope.
|
||||
if min_depth is not None:
|
||||
depth_args['min_depth'] = min_depth
|
||||
if divisible_by is not None:
|
||||
depth_args['divisible_by'] = divisible_by
|
||||
|
||||
with slim.arg_scope((lib.depth_multiplier,), **depth_args):
|
||||
return lib.mobilenet(
|
||||
input_tensor,
|
||||
num_classes=num_classes,
|
||||
conv_defs=conv_defs,
|
||||
scope=scope,
|
||||
multiplier=depth_multiplier,
|
||||
**kwargs)
|
||||
|
||||
mobilenet.default_image_size = 224
|
||||
|
||||
|
||||
def wrapped_partial(func, *args, **kwargs):
|
||||
partial_func = functools.partial(func, *args, **kwargs)
|
||||
functools.update_wrapper(partial_func, func)
|
||||
return partial_func
|
||||
|
||||
|
||||
# Wrappers for mobilenet v2 with depth-multipliers. Be noticed that
|
||||
# 'finegrain_classification_mode' is set to True, which means the embedding
|
||||
# layer will not be shrinked when given a depth-multiplier < 1.0.
|
||||
mobilenet_v2_140 = wrapped_partial(mobilenet, depth_multiplier=1.4)
|
||||
mobilenet_v2_050 = wrapped_partial(mobilenet, depth_multiplier=0.50,
|
||||
finegrain_classification_mode=True)
|
||||
mobilenet_v2_035 = wrapped_partial(mobilenet, depth_multiplier=0.35,
|
||||
finegrain_classification_mode=True)
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def mobilenet_base(input_tensor, depth_multiplier=1.0, **kwargs):
|
||||
"""Creates base of the mobilenet (no pooling and no logits) ."""
|
||||
return mobilenet(input_tensor,
|
||||
depth_multiplier=depth_multiplier,
|
||||
base_only=True, **kwargs)
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def mobilenet_base_group_norm(input_tensor, depth_multiplier=1.0, **kwargs):
|
||||
"""Creates base of the mobilenet (no pooling and no logits) ."""
|
||||
kwargs['conv_defs'] = V2_DEF_GROUP_NORM
|
||||
kwargs['conv_defs']['defaults'].update({
|
||||
(contrib_layers.group_norm,): {
|
||||
'groups': kwargs.pop('groups', 8)
|
||||
}
|
||||
})
|
||||
return mobilenet(
|
||||
input_tensor, depth_multiplier=depth_multiplier, base_only=True, **kwargs)
|
||||
|
||||
|
||||
def training_scope(**kwargs):
|
||||
"""Defines MobilenetV2 training scope.
|
||||
|
||||
Usage:
|
||||
with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope()):
|
||||
logits, endpoints = mobilenet_v2.mobilenet(input_tensor)
|
||||
|
||||
with slim.
|
||||
|
||||
Args:
|
||||
**kwargs: Passed to mobilenet.training_scope. The following parameters
|
||||
are supported:
|
||||
weight_decay- The weight decay to use for regularizing the model.
|
||||
stddev- Standard deviation for initialization, if negative uses xavier.
|
||||
dropout_keep_prob- dropout keep probability
|
||||
bn_decay- decay for the batch norm moving averages.
|
||||
|
||||
Returns:
|
||||
An `arg_scope` to use for the mobilenet v2 model.
|
||||
"""
|
||||
return lib.training_scope(**kwargs)
|
||||
|
||||
|
||||
__all__ = ['training_scope', 'mobilenet_base', 'mobilenet', 'V2_DEF']
|
||||
+219
@@ -0,0 +1,219 @@
|
||||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for mobilenet_v2."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
import copy
|
||||
from six.moves import range
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
from nets.mobilenet import conv_blocks as ops
|
||||
from nets.mobilenet import mobilenet
|
||||
from nets.mobilenet import mobilenet_v2
|
||||
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
def find_ops(optype):
|
||||
"""Find ops of a given type in graphdef or a graph.
|
||||
|
||||
Args:
|
||||
optype: operation type (e.g. Conv2D)
|
||||
Returns:
|
||||
List of operations.
|
||||
"""
|
||||
gd = tf.compat.v1.get_default_graph()
|
||||
return [var for var in gd.get_operations() if var.type == optype]
|
||||
|
||||
|
||||
class MobilenetV2Test(tf.test.TestCase):
|
||||
|
||||
def setUp(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
|
||||
def testCreation(self):
|
||||
spec = dict(mobilenet_v2.V2_DEF)
|
||||
_, ep = mobilenet.mobilenet(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 16)),
|
||||
conv_defs=spec)
|
||||
num_convs = len(find_ops('Conv2D'))
|
||||
|
||||
# This is mostly a sanity test. No deep reason for these particular
|
||||
# constants.
|
||||
#
|
||||
# All but first 2 and last one have two convolutions, and there is one
|
||||
# extra conv that is not in the spec. (logits)
|
||||
self.assertEqual(num_convs, len(spec['spec']) * 2 - 2)
|
||||
# Check that depthwise are exposed.
|
||||
for i in range(2, 17):
|
||||
self.assertIn('layer_%d/depthwise_output' % i, ep)
|
||||
|
||||
def testCreationNoClasses(self):
|
||||
spec = copy.deepcopy(mobilenet_v2.V2_DEF)
|
||||
net, ep = mobilenet.mobilenet(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 16)),
|
||||
conv_defs=spec,
|
||||
num_classes=None)
|
||||
self.assertIs(net, ep['global_pool'])
|
||||
|
||||
def testImageSizes(self):
|
||||
for input_size, output_size in [(224, 7), (192, 6), (160, 5),
|
||||
(128, 4), (96, 3)]:
|
||||
tf.compat.v1.reset_default_graph()
|
||||
_, ep = mobilenet_v2.mobilenet(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, input_size, input_size, 3)))
|
||||
|
||||
self.assertEqual(ep['layer_18/output'].get_shape().as_list()[1:3],
|
||||
[output_size] * 2)
|
||||
|
||||
def testWithSplits(self):
|
||||
spec = copy.deepcopy(mobilenet_v2.V2_DEF)
|
||||
spec['overrides'] = {
|
||||
(ops.expanded_conv,): dict(split_expansion=2),
|
||||
}
|
||||
_, _ = mobilenet.mobilenet(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 16)),
|
||||
conv_defs=spec)
|
||||
num_convs = len(find_ops('Conv2D'))
|
||||
# All but 3 op has 3 conv operatore, the remainign 3 have one
|
||||
# and there is one unaccounted.
|
||||
self.assertEqual(num_convs, len(spec['spec']) * 3 - 5)
|
||||
|
||||
def testWithOutputStride8(self):
|
||||
out, _ = mobilenet.mobilenet_base(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 16)),
|
||||
conv_defs=mobilenet_v2.V2_DEF,
|
||||
output_stride=8,
|
||||
scope='MobilenetV2')
|
||||
self.assertEqual(out.get_shape().as_list()[1:3], [28, 28])
|
||||
|
||||
def testDivisibleBy(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
mobilenet_v2.mobilenet(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 16)),
|
||||
conv_defs=mobilenet_v2.V2_DEF,
|
||||
divisible_by=16,
|
||||
min_depth=32)
|
||||
s = [op.outputs[0].get_shape().as_list()[-1] for op in find_ops('Conv2D')]
|
||||
s = set(s)
|
||||
self.assertSameElements([32, 64, 96, 160, 192, 320, 384, 576, 960, 1280,
|
||||
1001], s)
|
||||
|
||||
def testDivisibleByWithArgScope(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
# Verifies that depth_multiplier arg scope actually works
|
||||
# if no default min_depth is provided.
|
||||
with slim.arg_scope((mobilenet.depth_multiplier,), min_depth=32):
|
||||
mobilenet_v2.mobilenet(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 2)),
|
||||
conv_defs=mobilenet_v2.V2_DEF,
|
||||
depth_multiplier=0.1)
|
||||
s = [op.outputs[0].get_shape().as_list()[-1] for op in find_ops('Conv2D')]
|
||||
s = set(s)
|
||||
self.assertSameElements(s, [32, 192, 128, 1001])
|
||||
|
||||
def testFineGrained(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
# Verifies that depth_multiplier arg scope actually works
|
||||
# if no default min_depth is provided.
|
||||
|
||||
mobilenet_v2.mobilenet(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 2)),
|
||||
conv_defs=mobilenet_v2.V2_DEF,
|
||||
depth_multiplier=0.01,
|
||||
finegrain_classification_mode=True)
|
||||
s = [op.outputs[0].get_shape().as_list()[-1] for op in find_ops('Conv2D')]
|
||||
s = set(s)
|
||||
# All convolutions will be 8->48, except for the last one.
|
||||
self.assertSameElements(s, [8, 48, 1001, 1280])
|
||||
|
||||
def testMobilenetBase(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
# Verifies that mobilenet_base returns pre-pooling layer.
|
||||
with slim.arg_scope((mobilenet.depth_multiplier,), min_depth=32):
|
||||
net, _ = mobilenet_v2.mobilenet_base(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 16)),
|
||||
conv_defs=mobilenet_v2.V2_DEF,
|
||||
depth_multiplier=0.1)
|
||||
self.assertEqual(net.get_shape().as_list(), [10, 7, 7, 128])
|
||||
|
||||
def testWithOutputStride16(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
out, _ = mobilenet.mobilenet_base(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 16)),
|
||||
conv_defs=mobilenet_v2.V2_DEF,
|
||||
output_stride=16)
|
||||
self.assertEqual(out.get_shape().as_list()[1:3], [14, 14])
|
||||
|
||||
def testMultiplier(self):
|
||||
op = mobilenet.op
|
||||
new_def = copy.deepcopy(mobilenet_v2.V2_DEF)
|
||||
|
||||
def inverse_multiplier(output_params, multiplier):
|
||||
output_params['num_outputs'] = int(
|
||||
output_params['num_outputs'] / multiplier)
|
||||
|
||||
new_def['spec'][0] = op(
|
||||
slim.conv2d,
|
||||
kernel_size=(3, 3),
|
||||
multiplier_func=inverse_multiplier,
|
||||
num_outputs=16)
|
||||
_ = mobilenet_v2.mobilenet_base(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 16)),
|
||||
conv_defs=new_def,
|
||||
depth_multiplier=0.1)
|
||||
s = [op.outputs[0].get_shape().as_list()[-1] for op in find_ops('Conv2D')]
|
||||
# Expect first layer to be 160 (16 / 0.1), and other layers
|
||||
# their max(original size * 0.1, 8)
|
||||
self.assertEqual([160, 8, 48, 8, 48], s[:5])
|
||||
|
||||
def testWithOutputStride8AndExplicitPadding(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
out, _ = mobilenet.mobilenet_base(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 16)),
|
||||
conv_defs=mobilenet_v2.V2_DEF,
|
||||
output_stride=8,
|
||||
use_explicit_padding=True,
|
||||
scope='MobilenetV2')
|
||||
self.assertEqual(out.get_shape().as_list()[1:3], [28, 28])
|
||||
|
||||
def testWithOutputStride16AndExplicitPadding(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
out, _ = mobilenet.mobilenet_base(
|
||||
tf.compat.v1.placeholder(tf.float32, (10, 224, 224, 16)),
|
||||
conv_defs=mobilenet_v2.V2_DEF,
|
||||
output_stride=16,
|
||||
use_explicit_padding=True)
|
||||
self.assertEqual(out.get_shape().as_list()[1:3], [14, 14])
|
||||
|
||||
def testBatchNormScopeDoesNotHaveIsTrainingWhenItsSetToNone(self):
|
||||
sc = mobilenet.training_scope(is_training=None)
|
||||
self.assertNotIn('is_training', sc[slim.arg_scope_func_key(
|
||||
slim.batch_norm)])
|
||||
|
||||
def testBatchNormScopeDoesHasIsTrainingWhenItsNotNone(self):
|
||||
sc = mobilenet.training_scope(is_training=False)
|
||||
self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)])
|
||||
sc = mobilenet.training_scope(is_training=True)
|
||||
self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)])
|
||||
sc = mobilenet.training_scope()
|
||||
self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)])
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+405
@@ -0,0 +1,405 @@
|
||||
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Mobilenet V3 conv defs and helper functions."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import copy
|
||||
import functools
|
||||
import numpy as np
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets.mobilenet import conv_blocks as ops
|
||||
from nets.mobilenet import mobilenet as lib
|
||||
|
||||
slim = contrib_slim
|
||||
op = lib.op
|
||||
expand_input = ops.expand_input_by_factor
|
||||
|
||||
# Squeeze Excite with all parameters filled-in, we use hard-sigmoid
|
||||
# for gating function and relu for inner activation function.
|
||||
squeeze_excite = functools.partial(
|
||||
ops.squeeze_excite, squeeze_factor=4,
|
||||
inner_activation_fn=tf.nn.relu,
|
||||
gating_fn=lambda x: tf.nn.relu6(x+3)*0.16667)
|
||||
|
||||
# Wrap squeeze excite op as expansion_transform that takes
|
||||
# both expansion and input tensor.
|
||||
_se4 = lambda expansion_tensor, input_tensor: squeeze_excite(expansion_tensor)
|
||||
|
||||
|
||||
def hard_swish(x):
|
||||
with tf.compat.v1.name_scope('hard_swish'):
|
||||
return x * tf.nn.relu6(x + np.float32(3)) * np.float32(1. / 6.)
|
||||
|
||||
|
||||
def reduce_to_1x1(input_tensor, default_size=7, **kwargs):
|
||||
h, w = input_tensor.shape.as_list()[1:3]
|
||||
if h is not None and w == h:
|
||||
k = [h, h]
|
||||
else:
|
||||
k = [default_size, default_size]
|
||||
return slim.avg_pool2d(input_tensor, kernel_size=k, **kwargs)
|
||||
|
||||
|
||||
def mbv3_op(ef, n, k, s=1, act=tf.nn.relu, se=None, **kwargs):
|
||||
"""Defines a single Mobilenet V3 convolution block.
|
||||
|
||||
Args:
|
||||
ef: expansion factor
|
||||
n: number of output channels
|
||||
k: stride of depthwise
|
||||
s: stride
|
||||
act: activation function in inner layers
|
||||
se: squeeze excite function.
|
||||
**kwargs: passed to expanded_conv
|
||||
|
||||
Returns:
|
||||
An object (lib._Op) for inserting in conv_def, representing this operation.
|
||||
"""
|
||||
return op(
|
||||
ops.expanded_conv,
|
||||
expansion_size=expand_input(ef),
|
||||
kernel_size=(k, k),
|
||||
stride=s,
|
||||
num_outputs=n,
|
||||
inner_activation_fn=act,
|
||||
expansion_transform=se,
|
||||
**kwargs)
|
||||
|
||||
|
||||
def mbv3_fused(ef, n, k, s=1, **kwargs):
|
||||
"""Defines a single Mobilenet V3 convolution block.
|
||||
|
||||
Args:
|
||||
ef: expansion factor
|
||||
n: number of output channels
|
||||
k: stride of depthwise
|
||||
s: stride
|
||||
**kwargs: will be passed to mbv3_op
|
||||
|
||||
Returns:
|
||||
An object (lib._Op) for inserting in conv_def, representing this operation.
|
||||
"""
|
||||
expansion_fn = functools.partial(slim.conv2d, kernel_size=k, stride=s)
|
||||
return mbv3_op(
|
||||
ef,
|
||||
n,
|
||||
k=1,
|
||||
s=s,
|
||||
depthwise_location=None,
|
||||
expansion_fn=expansion_fn,
|
||||
**kwargs)
|
||||
|
||||
|
||||
mbv3_op_se = functools.partial(mbv3_op, se=_se4)
|
||||
|
||||
|
||||
DEFAULTS = {
|
||||
(ops.expanded_conv,):
|
||||
dict(
|
||||
normalizer_fn=slim.batch_norm,
|
||||
residual=True),
|
||||
(slim.conv2d, slim.fully_connected, slim.separable_conv2d): {
|
||||
'normalizer_fn': slim.batch_norm,
|
||||
'activation_fn': tf.nn.relu,
|
||||
},
|
||||
(slim.batch_norm,): {
|
||||
'center': True,
|
||||
'scale': True
|
||||
},
|
||||
}
|
||||
|
||||
# Compatible checkpoint: http://mldash/5511169891790690458#scalars
|
||||
V3_LARGE = dict(
|
||||
defaults=dict(DEFAULTS),
|
||||
spec=([
|
||||
# stage 1
|
||||
op(slim.conv2d, stride=2, num_outputs=16, kernel_size=(3, 3),
|
||||
activation_fn=hard_swish),
|
||||
mbv3_op(ef=1, n=16, k=3),
|
||||
mbv3_op(ef=4, n=24, k=3, s=2),
|
||||
mbv3_op(ef=3, n=24, k=3, s=1),
|
||||
mbv3_op_se(ef=3, n=40, k=5, s=2),
|
||||
mbv3_op_se(ef=3, n=40, k=5, s=1),
|
||||
mbv3_op_se(ef=3, n=40, k=5, s=1),
|
||||
mbv3_op(ef=6, n=80, k=3, s=2, act=hard_swish),
|
||||
mbv3_op(ef=2.5, n=80, k=3, s=1, act=hard_swish),
|
||||
mbv3_op(ef=184/80., n=80, k=3, s=1, act=hard_swish),
|
||||
mbv3_op(ef=184/80., n=80, k=3, s=1, act=hard_swish),
|
||||
mbv3_op_se(ef=6, n=112, k=3, s=1, act=hard_swish),
|
||||
mbv3_op_se(ef=6, n=112, k=3, s=1, act=hard_swish),
|
||||
mbv3_op_se(ef=6, n=160, k=5, s=2, act=hard_swish),
|
||||
mbv3_op_se(ef=6, n=160, k=5, s=1, act=hard_swish),
|
||||
mbv3_op_se(ef=6, n=160, k=5, s=1, act=hard_swish),
|
||||
op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=960,
|
||||
activation_fn=hard_swish),
|
||||
op(reduce_to_1x1, default_size=7, stride=1, padding='VALID'),
|
||||
op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=1280,
|
||||
normalizer_fn=None, activation_fn=hard_swish)
|
||||
]))
|
||||
|
||||
# 72.2% accuracy.
|
||||
V3_LARGE_MINIMALISTIC = dict(
|
||||
defaults=dict(DEFAULTS),
|
||||
spec=([
|
||||
# stage 1
|
||||
op(slim.conv2d, stride=2, num_outputs=16, kernel_size=(3, 3)),
|
||||
mbv3_op(ef=1, n=16, k=3),
|
||||
mbv3_op(ef=4, n=24, k=3, s=2),
|
||||
mbv3_op(ef=3, n=24, k=3, s=1),
|
||||
mbv3_op(ef=3, n=40, k=3, s=2),
|
||||
mbv3_op(ef=3, n=40, k=3, s=1),
|
||||
mbv3_op(ef=3, n=40, k=3, s=1),
|
||||
mbv3_op(ef=6, n=80, k=3, s=2),
|
||||
mbv3_op(ef=2.5, n=80, k=3, s=1),
|
||||
mbv3_op(ef=184 / 80., n=80, k=3, s=1),
|
||||
mbv3_op(ef=184 / 80., n=80, k=3, s=1),
|
||||
mbv3_op(ef=6, n=112, k=3, s=1),
|
||||
mbv3_op(ef=6, n=112, k=3, s=1),
|
||||
mbv3_op(ef=6, n=160, k=3, s=2),
|
||||
mbv3_op(ef=6, n=160, k=3, s=1),
|
||||
mbv3_op(ef=6, n=160, k=3, s=1),
|
||||
op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=960),
|
||||
op(reduce_to_1x1, default_size=7, stride=1, padding='VALID'),
|
||||
op(slim.conv2d,
|
||||
stride=1,
|
||||
kernel_size=[1, 1],
|
||||
num_outputs=1280,
|
||||
normalizer_fn=None)
|
||||
]))
|
||||
|
||||
# Compatible run: http://mldash/2023283040014348118#scalars
|
||||
V3_SMALL = dict(
|
||||
defaults=dict(DEFAULTS),
|
||||
spec=([
|
||||
# stage 1
|
||||
op(slim.conv2d, stride=2, num_outputs=16, kernel_size=(3, 3),
|
||||
activation_fn=hard_swish),
|
||||
mbv3_op_se(ef=1, n=16, k=3, s=2),
|
||||
mbv3_op(ef=72./16, n=24, k=3, s=2),
|
||||
mbv3_op(ef=(88./24), n=24, k=3, s=1),
|
||||
mbv3_op_se(ef=4, n=40, k=5, s=2, act=hard_swish),
|
||||
mbv3_op_se(ef=6, n=40, k=5, s=1, act=hard_swish),
|
||||
mbv3_op_se(ef=6, n=40, k=5, s=1, act=hard_swish),
|
||||
mbv3_op_se(ef=3, n=48, k=5, s=1, act=hard_swish),
|
||||
mbv3_op_se(ef=3, n=48, k=5, s=1, act=hard_swish),
|
||||
mbv3_op_se(ef=6, n=96, k=5, s=2, act=hard_swish),
|
||||
mbv3_op_se(ef=6, n=96, k=5, s=1, act=hard_swish),
|
||||
mbv3_op_se(ef=6, n=96, k=5, s=1, act=hard_swish),
|
||||
op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=576,
|
||||
activation_fn=hard_swish),
|
||||
op(reduce_to_1x1, default_size=7, stride=1, padding='VALID'),
|
||||
op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=1024,
|
||||
normalizer_fn=None, activation_fn=hard_swish)
|
||||
]))
|
||||
|
||||
# 62% accuracy.
|
||||
V3_SMALL_MINIMALISTIC = dict(
|
||||
defaults=dict(DEFAULTS),
|
||||
spec=([
|
||||
# stage 1
|
||||
op(slim.conv2d, stride=2, num_outputs=16, kernel_size=(3, 3)),
|
||||
mbv3_op(ef=1, n=16, k=3, s=2),
|
||||
mbv3_op(ef=72. / 16, n=24, k=3, s=2),
|
||||
mbv3_op(ef=(88. / 24), n=24, k=3, s=1),
|
||||
mbv3_op(ef=4, n=40, k=3, s=2),
|
||||
mbv3_op(ef=6, n=40, k=3, s=1),
|
||||
mbv3_op(ef=6, n=40, k=3, s=1),
|
||||
mbv3_op(ef=3, n=48, k=3, s=1),
|
||||
mbv3_op(ef=3, n=48, k=3, s=1),
|
||||
mbv3_op(ef=6, n=96, k=3, s=2),
|
||||
mbv3_op(ef=6, n=96, k=3, s=1),
|
||||
mbv3_op(ef=6, n=96, k=3, s=1),
|
||||
op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=576),
|
||||
op(reduce_to_1x1, default_size=7, stride=1, padding='VALID'),
|
||||
op(slim.conv2d,
|
||||
stride=1,
|
||||
kernel_size=[1, 1],
|
||||
num_outputs=1024,
|
||||
normalizer_fn=None)
|
||||
]))
|
||||
|
||||
|
||||
# EdgeTPU friendly variant of MobilenetV3 that uses fused convolutions
|
||||
# instead of depthwise in the early layers.
|
||||
V3_EDGETPU = dict(
|
||||
defaults=dict(DEFAULTS),
|
||||
spec=[
|
||||
op(slim.conv2d, stride=2, num_outputs=32, kernel_size=(3, 3)),
|
||||
mbv3_fused(k=3, s=1, ef=1, n=16),
|
||||
mbv3_fused(k=3, s=2, ef=8, n=32),
|
||||
mbv3_fused(k=3, s=1, ef=4, n=32),
|
||||
mbv3_fused(k=3, s=1, ef=4, n=32),
|
||||
mbv3_fused(k=3, s=1, ef=4, n=32),
|
||||
mbv3_fused(k=3, s=2, ef=8, n=48),
|
||||
mbv3_fused(k=3, s=1, ef=4, n=48),
|
||||
mbv3_fused(k=3, s=1, ef=4, n=48),
|
||||
mbv3_fused(k=3, s=1, ef=4, n=48),
|
||||
mbv3_op(k=3, s=2, ef=8, n=96),
|
||||
mbv3_op(k=3, s=1, ef=4, n=96),
|
||||
mbv3_op(k=3, s=1, ef=4, n=96),
|
||||
mbv3_op(k=3, s=1, ef=4, n=96),
|
||||
mbv3_op(k=3, s=1, ef=8, n=96, residual=False),
|
||||
mbv3_op(k=3, s=1, ef=4, n=96),
|
||||
mbv3_op(k=3, s=1, ef=4, n=96),
|
||||
mbv3_op(k=3, s=1, ef=4, n=96),
|
||||
mbv3_op(k=5, s=2, ef=8, n=160),
|
||||
mbv3_op(k=5, s=1, ef=4, n=160),
|
||||
mbv3_op(k=5, s=1, ef=4, n=160),
|
||||
mbv3_op(k=5, s=1, ef=4, n=160),
|
||||
mbv3_op(k=3, s=1, ef=8, n=192),
|
||||
op(slim.conv2d, stride=1, num_outputs=1280, kernel_size=(1, 1)),
|
||||
])
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def mobilenet(input_tensor,
|
||||
num_classes=1001,
|
||||
depth_multiplier=1.0,
|
||||
scope='MobilenetV3',
|
||||
conv_defs=None,
|
||||
finegrain_classification_mode=False,
|
||||
**kwargs):
|
||||
"""Creates mobilenet V3 network.
|
||||
|
||||
Inference mode is created by default. To create training use training_scope
|
||||
below.
|
||||
|
||||
with tf.contrib.slim.arg_scope(mobilenet_v3.training_scope()):
|
||||
logits, endpoints = mobilenet_v3.mobilenet(input_tensor)
|
||||
|
||||
Args:
|
||||
input_tensor: The input tensor
|
||||
num_classes: number of classes
|
||||
depth_multiplier: The multiplier applied to scale number of
|
||||
channels in each layer.
|
||||
scope: Scope of the operator
|
||||
conv_defs: Which version to create. Could be large/small or
|
||||
any conv_def (see mobilenet_v3.py for examples).
|
||||
finegrain_classification_mode: When set to True, the model
|
||||
will keep the last layer large even for small multipliers. Following
|
||||
https://arxiv.org/abs/1801.04381
|
||||
it improves performance for ImageNet-type of problems.
|
||||
*Note* ignored if final_endpoint makes the builder exit earlier.
|
||||
**kwargs: passed directly to mobilenet.mobilenet:
|
||||
prediction_fn- what prediction function to use.
|
||||
reuse-: whether to reuse variables (if reuse set to true, scope
|
||||
must be given).
|
||||
Returns:
|
||||
logits/endpoints pair
|
||||
|
||||
Raises:
|
||||
ValueError: On invalid arguments
|
||||
"""
|
||||
if conv_defs is None:
|
||||
conv_defs = V3_LARGE
|
||||
if 'multiplier' in kwargs:
|
||||
raise ValueError('mobilenetv2 doesn\'t support generic '
|
||||
'multiplier parameter use "depth_multiplier" instead.')
|
||||
if finegrain_classification_mode:
|
||||
conv_defs = copy.deepcopy(conv_defs)
|
||||
conv_defs['spec'][-1] = conv_defs['spec'][-1]._replace(
|
||||
multiplier_func=lambda params, multiplier: params)
|
||||
depth_args = {}
|
||||
with slim.arg_scope((lib.depth_multiplier,), **depth_args):
|
||||
return lib.mobilenet(
|
||||
input_tensor,
|
||||
num_classes=num_classes,
|
||||
conv_defs=conv_defs,
|
||||
scope=scope,
|
||||
multiplier=depth_multiplier,
|
||||
**kwargs)
|
||||
|
||||
mobilenet.default_image_size = 224
|
||||
training_scope = lib.training_scope
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def mobilenet_base(input_tensor, depth_multiplier=1.0, **kwargs):
|
||||
"""Creates base of the mobilenet (no pooling and no logits) ."""
|
||||
return mobilenet(
|
||||
input_tensor, depth_multiplier=depth_multiplier, base_only=True, **kwargs)
|
||||
|
||||
|
||||
def wrapped_partial(func, new_defaults=None,
|
||||
**kwargs):
|
||||
"""Partial function with new default parameters and updated docstring."""
|
||||
if not new_defaults:
|
||||
new_defaults = {}
|
||||
def func_wrapper(*f_args, **f_kwargs):
|
||||
new_kwargs = dict(new_defaults)
|
||||
new_kwargs.update(f_kwargs)
|
||||
return func(*f_args, **new_kwargs)
|
||||
functools.update_wrapper(func_wrapper, func)
|
||||
partial_func = functools.partial(func_wrapper, **kwargs)
|
||||
functools.update_wrapper(partial_func, func)
|
||||
return partial_func
|
||||
|
||||
|
||||
large = wrapped_partial(mobilenet, conv_defs=V3_LARGE)
|
||||
small = wrapped_partial(mobilenet, conv_defs=V3_SMALL)
|
||||
edge_tpu = wrapped_partial(mobilenet,
|
||||
new_defaults={'scope': 'MobilenetEdgeTPU'},
|
||||
conv_defs=V3_EDGETPU)
|
||||
edge_tpu_075 = wrapped_partial(
|
||||
mobilenet,
|
||||
new_defaults={'scope': 'MobilenetEdgeTPU'},
|
||||
conv_defs=V3_EDGETPU,
|
||||
depth_multiplier=0.75,
|
||||
finegrain_classification_mode=True)
|
||||
|
||||
# Minimalistic model that does not have Squeeze Excite blocks,
|
||||
# Hardswish, or 5x5 depthwise convolution.
|
||||
# This makes the model very friendly for a wide range of hardware
|
||||
large_minimalistic = wrapped_partial(mobilenet, conv_defs=V3_LARGE_MINIMALISTIC)
|
||||
small_minimalistic = wrapped_partial(mobilenet, conv_defs=V3_SMALL_MINIMALISTIC)
|
||||
|
||||
|
||||
def _reduce_consecutive_layers(conv_defs, start_id, end_id, multiplier=0.5):
|
||||
"""Reduce the outputs of consecutive layers with multiplier.
|
||||
|
||||
Args:
|
||||
conv_defs: Mobilenet conv_defs.
|
||||
start_id: 0-based index of the starting conv_def to be reduced.
|
||||
end_id: 0-based index of the last conv_def to be reduced.
|
||||
multiplier: The multiplier by which to reduce the conv_defs.
|
||||
|
||||
Returns:
|
||||
Mobilenet conv_defs where the output sizes from layers [start_id, end_id],
|
||||
inclusive, are reduced by multiplier.
|
||||
|
||||
Raises:
|
||||
ValueError if any layer to be reduced does not have the 'num_outputs'
|
||||
attribute.
|
||||
"""
|
||||
defs = copy.deepcopy(conv_defs)
|
||||
for d in defs['spec'][start_id:end_id+1]:
|
||||
d.params.update({
|
||||
'num_outputs': np.int(np.round(d.params['num_outputs'] * multiplier))
|
||||
})
|
||||
return defs
|
||||
|
||||
|
||||
V3_LARGE_DETECTION = _reduce_consecutive_layers(V3_LARGE, 13, 16)
|
||||
V3_SMALL_DETECTION = _reduce_consecutive_layers(V3_SMALL, 9, 12)
|
||||
|
||||
|
||||
__all__ = ['training_scope', 'mobilenet', 'V3_LARGE', 'V3_SMALL', 'large',
|
||||
'small', 'V3_LARGE_DETECTION', 'V3_SMALL_DETECTION']
|
||||
+82
@@ -0,0 +1,82 @@
|
||||
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for google3.third_party.tensorflow_models.slim.nets.mobilenet.mobilenet_v3."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
from absl.testing import absltest
|
||||
import tensorflow as tf
|
||||
|
||||
from nets.mobilenet import mobilenet_v3
|
||||
|
||||
|
||||
class MobilenetV3Test(absltest.TestCase):
|
||||
|
||||
def setUp(self):
|
||||
super(MobilenetV3Test, self).setUp()
|
||||
tf.compat.v1.reset_default_graph()
|
||||
|
||||
def testMobilenetV3Large(self):
|
||||
logits, endpoints = mobilenet_v3.mobilenet(
|
||||
tf.compat.v1.placeholder(tf.float32, (1, 224, 224, 3)))
|
||||
self.assertEqual(endpoints['layer_19'].shape, [1, 1, 1, 1280])
|
||||
self.assertEqual(logits.shape, [1, 1001])
|
||||
|
||||
def testMobilenetV3Small(self):
|
||||
_, endpoints = mobilenet_v3.mobilenet(
|
||||
tf.compat.v1.placeholder(tf.float32, (1, 224, 224, 3)),
|
||||
conv_defs=mobilenet_v3.V3_SMALL)
|
||||
self.assertEqual(endpoints['layer_15'].shape, [1, 1, 1, 1024])
|
||||
|
||||
def testMobilenetEdgeTpu(self):
|
||||
_, endpoints = mobilenet_v3.edge_tpu(
|
||||
tf.compat.v1.placeholder(tf.float32, (1, 224, 224, 3)))
|
||||
self.assertIn('Inference mode is created by default',
|
||||
mobilenet_v3.edge_tpu.__doc__)
|
||||
self.assertEqual(endpoints['layer_24'].shape, [1, 7, 7, 1280])
|
||||
self.assertStartsWith(
|
||||
endpoints['layer_24'].name, 'MobilenetEdgeTPU')
|
||||
|
||||
def testMobilenetEdgeTpuChangeScope(self):
|
||||
_, endpoints = mobilenet_v3.edge_tpu(
|
||||
tf.compat.v1.placeholder(tf.float32, (1, 224, 224, 3)), scope='Scope')
|
||||
self.assertStartsWith(
|
||||
endpoints['layer_24'].name, 'Scope')
|
||||
|
||||
def testMobilenetV3BaseOnly(self):
|
||||
result, endpoints = mobilenet_v3.mobilenet(
|
||||
tf.compat.v1.placeholder(tf.float32, (1, 224, 224, 3)),
|
||||
conv_defs=mobilenet_v3.V3_LARGE,
|
||||
base_only=True,
|
||||
final_endpoint='layer_17')
|
||||
# Get the latest layer before average pool.
|
||||
self.assertEqual(endpoints['layer_17'].shape, [1, 7, 7, 960])
|
||||
self.assertEqual(result, endpoints['layer_17'])
|
||||
|
||||
def testMobilenetV3BaseOnly_VariableInput(self):
|
||||
result, endpoints = mobilenet_v3.mobilenet(
|
||||
tf.placeholder(tf.float32, (None, None, None, 3)),
|
||||
conv_defs=mobilenet_v3.V3_LARGE,
|
||||
base_only=True,
|
||||
final_endpoint='layer_17')
|
||||
# Get the latest layer before average pool.
|
||||
self.assertEqual(endpoints['layer_17'].shape.as_list(),
|
||||
[None, None, None, 960])
|
||||
self.assertEqual(result, endpoints['layer_17'])
|
||||
|
||||
if __name__ == '__main__':
|
||||
absltest.main()
|
||||
+136
@@ -0,0 +1,136 @@
|
||||
# MobilenetV2 and above
|
||||
For MobilenetV2+ see this file [mobilenet/README.md](mobilenet/README.md)
|
||||
|
||||
# MobileNetV1
|
||||
|
||||
[MobileNets](https://arxiv.org/abs/1704.04861) are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. They can be built upon for classification, detection, embeddings and segmentation similar to how other popular large scale models, such as Inception, are used. MobileNets can be run efficiently on mobile devices with [TensorFlow Mobile](https://www.tensorflow.org/mobile/).
|
||||
|
||||
MobileNets trade off between latency, size and accuracy while comparing favorably with popular models from the literature.
|
||||
|
||||

|
||||
|
||||
# Pre-trained Models
|
||||
|
||||
Choose the right MobileNet model to fit your latency and size budget. The size of the network in memory and on disk is proportional to the number of parameters. The latency and power usage of the network scales with the number of Multiply-Accumulates (MACs) which measures the number of fused Multiplication and Addition operations. These MobileNet models have been trained on the
|
||||
[ILSVRC-2012-CLS](http://www.image-net.org/challenges/LSVRC/2012/)
|
||||
image classification dataset. Accuracies were computed by evaluating using a single image crop.
|
||||
|
||||
Model | Million MACs | Million Parameters | Top-1 Accuracy| Top-5 Accuracy |
|
||||
:----:|:------------:|:----------:|:-------:|:-------:|
|
||||
[MobileNet_v1_1.0_224](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224.tgz)|569|4.24|70.9|89.9|
|
||||
[MobileNet_v1_1.0_192](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_192.tgz)|418|4.24|70.0|89.2|
|
||||
[MobileNet_v1_1.0_160](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_160.tgz)|291|4.24|68.0|87.7|
|
||||
[MobileNet_v1_1.0_128](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_128.tgz)|186|4.24|65.2|85.8|
|
||||
[MobileNet_v1_0.75_224](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_224.tgz)|317|2.59|68.4|88.2|
|
||||
[MobileNet_v1_0.75_192](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_192.tgz)|233|2.59|67.2|87.3|
|
||||
[MobileNet_v1_0.75_160](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_160.tgz)|162|2.59|65.3|86.0|
|
||||
[MobileNet_v1_0.75_128](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_128.tgz)|104|2.59|62.1|83.9|
|
||||
[MobileNet_v1_0.50_224](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_224.tgz)|150|1.34|63.3|84.9|
|
||||
[MobileNet_v1_0.50_192](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_192.tgz)|110|1.34|61.7|83.6|
|
||||
[MobileNet_v1_0.50_160](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_160.tgz)|77|1.34|59.1|81.9|
|
||||
[MobileNet_v1_0.50_128](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_128.tgz)|49|1.34|56.3|79.4|
|
||||
[MobileNet_v1_0.25_224](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_224.tgz)|41|0.47|49.8|74.2|
|
||||
[MobileNet_v1_0.25_192](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_192.tgz)|34|0.47|47.7|72.3|
|
||||
[MobileNet_v1_0.25_160](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_160.tgz)|21|0.47|45.5|70.3|
|
||||
[MobileNet_v1_0.25_128](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_128.tgz)|14|0.47|41.5|66.3|
|
||||
[MobileNet_v1_1.0_224_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz)|569|4.24|70.1|88.9|
|
||||
[MobileNet_v1_1.0_192_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_192_quant.tgz)|418|4.24|69.2|88.3|
|
||||
[MobileNet_v1_1.0_160_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_160_quant.tgz)|291|4.24|67.2|86.7|
|
||||
[MobileNet_v1_1.0_128_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_128_quant.tgz)|186|4.24|63.4|84.2|
|
||||
[MobileNet_v1_0.75_224_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_224_quant.tgz)|317|2.59|66.8|87.0|
|
||||
[MobileNet_v1_0.75_192_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_192_quant.tgz)|233|2.59|66.1|86.4|
|
||||
[MobileNet_v1_0.75_160_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_160_quant.tgz)|162|2.59|62.3|83.8|
|
||||
[MobileNet_v1_0.75_128_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_128_quant.tgz)|104|2.59|55.8|78.8|
|
||||
[MobileNet_v1_0.50_224_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_224_quant.tgz)|150|1.34|60.7|83.2|
|
||||
[MobileNet_v1_0.50_192_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_192_quant.tgz)|110|1.34|60.0|82.2|
|
||||
[MobileNet_v1_0.50_160_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_160_quant.tgz)|77|1.34|57.7|80.4|
|
||||
[MobileNet_v1_0.50_128_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_128_quant.tgz)|49|1.34|54.5|77.7|
|
||||
[MobileNet_v1_0.25_224_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_224_quant.tgz)|41|0.47|48.0|72.8|
|
||||
[MobileNet_v1_0.25_192_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_192_quant.tgz)|34|0.47|46.0|71.2|
|
||||
[MobileNet_v1_0.25_160_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_160_quant.tgz)|21|0.47|43.4|68.5|
|
||||
[MobileNet_v1_0.25_128_quant](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_128_quant.tgz)|14|0.47|39.5|64.4|
|
||||
|
||||
Revisions to models:
|
||||
* July 12, 2018: Update to TFLite models that fixes an accuracy issue resolved by making conversion support weights with narrow_range. We now report validation on the actual TensorFlow Lite model rather than the emulated quantization number of TensorFlow.
|
||||
* August 2, 2018: Update to TFLite models that fixes an accuracy issue resolved by making sure the numerics of quantization match TF quantized training accurately.
|
||||
|
||||
The linked model tar files contain the following:
|
||||
* Trained model checkpoints
|
||||
* Eval graph text protos (to be easily viewed)
|
||||
* Frozen trained models
|
||||
* Info file containing input and output information
|
||||
* Converted [TensorFlow Lite](https://www.tensorflow.org/mobile/tflite/) flatbuffer model
|
||||
|
||||
Note that quantized model GraphDefs are still float models, they just have FakeQuantization
|
||||
operation embedded to simulate quantization. These are converted by [TensorFlow Lite](https://www.tensorflow.org/mobile/tflite/)
|
||||
to be fully quantized. The final effect of quantization can be seen by comparing the frozen fake
|
||||
quantized graph to the size of the TFLite flatbuffer, i.e. The TFLite flatbuffer is about 1/4
|
||||
the size.
|
||||
For more information on the quantization techniques used here, see
|
||||
[here](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize).
|
||||
|
||||
Here is an example of how to download the MobileNet_v1_1.0_224 checkpoint:
|
||||
|
||||
```shell
|
||||
$ CHECKPOINT_DIR=/tmp/checkpoints
|
||||
$ mkdir ${CHECKPOINT_DIR}
|
||||
$ wget http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224.tgz
|
||||
$ tar -xvf mobilenet_v1_1.0_224.tgz
|
||||
$ mv mobilenet_v1_1.0_224.ckpt.* ${CHECKPOINT_DIR}
|
||||
```
|
||||
|
||||
# MobileNet V1 scripts
|
||||
|
||||
This package contains scripts for training floating point and eight-bit fixed
|
||||
point TensorFlow models.
|
||||
|
||||
Quantization tools used are described in [contrib/quantize](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize).
|
||||
|
||||
Conversion to fully quantized models for mobile can be done through [TensorFlow Lite](https://www.tensorflow.org/mobile/tflite/).
|
||||
|
||||
## Usage
|
||||
|
||||
### Build for GPU
|
||||
|
||||
```
|
||||
$ bazel build -c opt --config=cuda mobilenet_v1_{eval,train}
|
||||
```
|
||||
|
||||
### Running
|
||||
|
||||
#### Float Training and Eval
|
||||
|
||||
Train:
|
||||
|
||||
```
|
||||
$ ./bazel-bin/mobilenet_v1_train --dataset_dir "path/to/dataset" --checkpoint_dir "path/to/checkpoints"
|
||||
```
|
||||
|
||||
Eval:
|
||||
|
||||
```
|
||||
$ ./bazel-bin/mobilenet_v1_eval --dataset_dir "path/to/dataset" --checkpoint_dir "path/to/checkpoints"
|
||||
```
|
||||
|
||||
#### Quantized Training and Eval
|
||||
|
||||
Train from preexisting float checkpoint:
|
||||
|
||||
```
|
||||
$ ./bazel-bin/mobilenet_v1_train --dataset_dir "path/to/dataset" --checkpoint_dir "path/to/checkpoints" \
|
||||
--quantize=True --fine_tune_checkpoint=float/checkpoint/path
|
||||
```
|
||||
|
||||
Train from scratch:
|
||||
|
||||
```
|
||||
$ ./bazel-bin/mobilenet_v1_train --dataset_dir "path/to/dataset" --checkpoint_dir "path/to/checkpoints" --quantize=True
|
||||
```
|
||||
|
||||
Eval:
|
||||
|
||||
```
|
||||
$ ./bazel-bin/mobilenet_v1_eval --dataset_dir "path/to/dataset" --checkpoint_dir "path/to/checkpoints" --quantize=True
|
||||
```
|
||||
|
||||
The resulting float and quantized models can be run on-device via [TensorFlow Lite](https://www.tensorflow.org/mobile/tflite/).
|
||||
+482
@@ -0,0 +1,482 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# =============================================================================
|
||||
"""MobileNet v1.
|
||||
|
||||
MobileNet is a general architecture and can be used for multiple use cases.
|
||||
Depending on the use case, it can use different input layer size and different
|
||||
head (for example: embeddings, localization and classification).
|
||||
|
||||
As described in https://arxiv.org/abs/1704.04861.
|
||||
|
||||
MobileNets: Efficient Convolutional Neural Networks for
|
||||
Mobile Vision Applications
|
||||
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang,
|
||||
Tobias Weyand, Marco Andreetto, Hartwig Adam
|
||||
|
||||
100% Mobilenet V1 (base) with input size 224x224:
|
||||
|
||||
See mobilenet_v1()
|
||||
|
||||
Layer params macs
|
||||
--------------------------------------------------------------------------------
|
||||
MobilenetV1/Conv2d_0/Conv2D: 864 10,838,016
|
||||
MobilenetV1/Conv2d_1_depthwise/depthwise: 288 3,612,672
|
||||
MobilenetV1/Conv2d_1_pointwise/Conv2D: 2,048 25,690,112
|
||||
MobilenetV1/Conv2d_2_depthwise/depthwise: 576 1,806,336
|
||||
MobilenetV1/Conv2d_2_pointwise/Conv2D: 8,192 25,690,112
|
||||
MobilenetV1/Conv2d_3_depthwise/depthwise: 1,152 3,612,672
|
||||
MobilenetV1/Conv2d_3_pointwise/Conv2D: 16,384 51,380,224
|
||||
MobilenetV1/Conv2d_4_depthwise/depthwise: 1,152 903,168
|
||||
MobilenetV1/Conv2d_4_pointwise/Conv2D: 32,768 25,690,112
|
||||
MobilenetV1/Conv2d_5_depthwise/depthwise: 2,304 1,806,336
|
||||
MobilenetV1/Conv2d_5_pointwise/Conv2D: 65,536 51,380,224
|
||||
MobilenetV1/Conv2d_6_depthwise/depthwise: 2,304 451,584
|
||||
MobilenetV1/Conv2d_6_pointwise/Conv2D: 131,072 25,690,112
|
||||
MobilenetV1/Conv2d_7_depthwise/depthwise: 4,608 903,168
|
||||
MobilenetV1/Conv2d_7_pointwise/Conv2D: 262,144 51,380,224
|
||||
MobilenetV1/Conv2d_8_depthwise/depthwise: 4,608 903,168
|
||||
MobilenetV1/Conv2d_8_pointwise/Conv2D: 262,144 51,380,224
|
||||
MobilenetV1/Conv2d_9_depthwise/depthwise: 4,608 903,168
|
||||
MobilenetV1/Conv2d_9_pointwise/Conv2D: 262,144 51,380,224
|
||||
MobilenetV1/Conv2d_10_depthwise/depthwise: 4,608 903,168
|
||||
MobilenetV1/Conv2d_10_pointwise/Conv2D: 262,144 51,380,224
|
||||
MobilenetV1/Conv2d_11_depthwise/depthwise: 4,608 903,168
|
||||
MobilenetV1/Conv2d_11_pointwise/Conv2D: 262,144 51,380,224
|
||||
MobilenetV1/Conv2d_12_depthwise/depthwise: 4,608 225,792
|
||||
MobilenetV1/Conv2d_12_pointwise/Conv2D: 524,288 25,690,112
|
||||
MobilenetV1/Conv2d_13_depthwise/depthwise: 9,216 451,584
|
||||
MobilenetV1/Conv2d_13_pointwise/Conv2D: 1,048,576 51,380,224
|
||||
--------------------------------------------------------------------------------
|
||||
Total: 3,185,088 567,716,352
|
||||
|
||||
|
||||
75% Mobilenet V1 (base) with input size 128x128:
|
||||
|
||||
See mobilenet_v1_075()
|
||||
|
||||
Layer params macs
|
||||
--------------------------------------------------------------------------------
|
||||
MobilenetV1/Conv2d_0/Conv2D: 648 2,654,208
|
||||
MobilenetV1/Conv2d_1_depthwise/depthwise: 216 884,736
|
||||
MobilenetV1/Conv2d_1_pointwise/Conv2D: 1,152 4,718,592
|
||||
MobilenetV1/Conv2d_2_depthwise/depthwise: 432 442,368
|
||||
MobilenetV1/Conv2d_2_pointwise/Conv2D: 4,608 4,718,592
|
||||
MobilenetV1/Conv2d_3_depthwise/depthwise: 864 884,736
|
||||
MobilenetV1/Conv2d_3_pointwise/Conv2D: 9,216 9,437,184
|
||||
MobilenetV1/Conv2d_4_depthwise/depthwise: 864 221,184
|
||||
MobilenetV1/Conv2d_4_pointwise/Conv2D: 18,432 4,718,592
|
||||
MobilenetV1/Conv2d_5_depthwise/depthwise: 1,728 442,368
|
||||
MobilenetV1/Conv2d_5_pointwise/Conv2D: 36,864 9,437,184
|
||||
MobilenetV1/Conv2d_6_depthwise/depthwise: 1,728 110,592
|
||||
MobilenetV1/Conv2d_6_pointwise/Conv2D: 73,728 4,718,592
|
||||
MobilenetV1/Conv2d_7_depthwise/depthwise: 3,456 221,184
|
||||
MobilenetV1/Conv2d_7_pointwise/Conv2D: 147,456 9,437,184
|
||||
MobilenetV1/Conv2d_8_depthwise/depthwise: 3,456 221,184
|
||||
MobilenetV1/Conv2d_8_pointwise/Conv2D: 147,456 9,437,184
|
||||
MobilenetV1/Conv2d_9_depthwise/depthwise: 3,456 221,184
|
||||
MobilenetV1/Conv2d_9_pointwise/Conv2D: 147,456 9,437,184
|
||||
MobilenetV1/Conv2d_10_depthwise/depthwise: 3,456 221,184
|
||||
MobilenetV1/Conv2d_10_pointwise/Conv2D: 147,456 9,437,184
|
||||
MobilenetV1/Conv2d_11_depthwise/depthwise: 3,456 221,184
|
||||
MobilenetV1/Conv2d_11_pointwise/Conv2D: 147,456 9,437,184
|
||||
MobilenetV1/Conv2d_12_depthwise/depthwise: 3,456 55,296
|
||||
MobilenetV1/Conv2d_12_pointwise/Conv2D: 294,912 4,718,592
|
||||
MobilenetV1/Conv2d_13_depthwise/depthwise: 6,912 110,592
|
||||
MobilenetV1/Conv2d_13_pointwise/Conv2D: 589,824 9,437,184
|
||||
--------------------------------------------------------------------------------
|
||||
Total: 1,800,144 106,002,432
|
||||
|
||||
"""
|
||||
|
||||
# Tensorflow mandates these.
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
from collections import namedtuple
|
||||
import functools
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import layers as contrib_layers
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
# Conv and DepthSepConv namedtuple define layers of the MobileNet architecture
|
||||
# Conv defines 3x3 convolution layers
|
||||
# DepthSepConv defines 3x3 depthwise convolution followed by 1x1 convolution.
|
||||
# stride is the stride of the convolution
|
||||
# depth is the number of channels or filters in a layer
|
||||
Conv = namedtuple('Conv', ['kernel', 'stride', 'depth'])
|
||||
DepthSepConv = namedtuple('DepthSepConv', ['kernel', 'stride', 'depth'])
|
||||
|
||||
# MOBILENETV1_CONV_DEFS specifies the MobileNet body
|
||||
MOBILENETV1_CONV_DEFS = [
|
||||
Conv(kernel=[3, 3], stride=2, depth=32),
|
||||
DepthSepConv(kernel=[3, 3], stride=1, depth=64),
|
||||
DepthSepConv(kernel=[3, 3], stride=2, depth=128),
|
||||
DepthSepConv(kernel=[3, 3], stride=1, depth=128),
|
||||
DepthSepConv(kernel=[3, 3], stride=2, depth=256),
|
||||
DepthSepConv(kernel=[3, 3], stride=1, depth=256),
|
||||
DepthSepConv(kernel=[3, 3], stride=2, depth=512),
|
||||
DepthSepConv(kernel=[3, 3], stride=1, depth=512),
|
||||
DepthSepConv(kernel=[3, 3], stride=1, depth=512),
|
||||
DepthSepConv(kernel=[3, 3], stride=1, depth=512),
|
||||
DepthSepConv(kernel=[3, 3], stride=1, depth=512),
|
||||
DepthSepConv(kernel=[3, 3], stride=1, depth=512),
|
||||
DepthSepConv(kernel=[3, 3], stride=2, depth=1024),
|
||||
DepthSepConv(kernel=[3, 3], stride=1, depth=1024)
|
||||
]
|
||||
|
||||
|
||||
def _fixed_padding(inputs, kernel_size, rate=1):
|
||||
"""Pads the input along the spatial dimensions independently of input size.
|
||||
|
||||
Pads the input such that if it was used in a convolution with 'VALID' padding,
|
||||
the output would have the same dimensions as if the unpadded input was used
|
||||
in a convolution with 'SAME' padding.
|
||||
|
||||
Args:
|
||||
inputs: A tensor of size [batch, height_in, width_in, channels].
|
||||
kernel_size: The kernel to be used in the conv2d or max_pool2d operation.
|
||||
rate: An integer, rate for atrous convolution.
|
||||
|
||||
Returns:
|
||||
output: A tensor of size [batch, height_out, width_out, channels] with the
|
||||
input, either intact (if kernel_size == 1) or padded (if kernel_size > 1).
|
||||
"""
|
||||
kernel_size_effective = [kernel_size[0] + (kernel_size[0] - 1) * (rate - 1),
|
||||
kernel_size[0] + (kernel_size[0] - 1) * (rate - 1)]
|
||||
pad_total = [kernel_size_effective[0] - 1, kernel_size_effective[1] - 1]
|
||||
pad_beg = [pad_total[0] // 2, pad_total[1] // 2]
|
||||
pad_end = [pad_total[0] - pad_beg[0], pad_total[1] - pad_beg[1]]
|
||||
padded_inputs = tf.pad(
|
||||
tensor=inputs,
|
||||
paddings=[[0, 0], [pad_beg[0], pad_end[0]], [pad_beg[1], pad_end[1]],
|
||||
[0, 0]])
|
||||
return padded_inputs
|
||||
|
||||
|
||||
def mobilenet_v1_base(inputs,
|
||||
final_endpoint='Conv2d_13_pointwise',
|
||||
min_depth=8,
|
||||
depth_multiplier=1.0,
|
||||
conv_defs=None,
|
||||
output_stride=None,
|
||||
use_explicit_padding=False,
|
||||
scope=None):
|
||||
"""Mobilenet v1.
|
||||
|
||||
Constructs a Mobilenet v1 network from inputs to the given final endpoint.
|
||||
|
||||
Args:
|
||||
inputs: a tensor of shape [batch_size, height, width, channels].
|
||||
final_endpoint: specifies the endpoint to construct the network up to. It
|
||||
can be one of ['Conv2d_0', 'Conv2d_1_pointwise', 'Conv2d_2_pointwise',
|
||||
'Conv2d_3_pointwise', 'Conv2d_4_pointwise', 'Conv2d_5'_pointwise,
|
||||
'Conv2d_6_pointwise', 'Conv2d_7_pointwise', 'Conv2d_8_pointwise',
|
||||
'Conv2d_9_pointwise', 'Conv2d_10_pointwise', 'Conv2d_11_pointwise',
|
||||
'Conv2d_12_pointwise', 'Conv2d_13_pointwise'].
|
||||
min_depth: Minimum depth value (number of channels) for all convolution ops.
|
||||
Enforced when depth_multiplier < 1, and not an active constraint when
|
||||
depth_multiplier >= 1.
|
||||
depth_multiplier: Float multiplier for the depth (number of channels)
|
||||
for all convolution ops. The value must be greater than zero. Typical
|
||||
usage will be to set this value in (0, 1) to reduce the number of
|
||||
parameters or computation cost of the model.
|
||||
conv_defs: A list of ConvDef namedtuples specifying the net architecture.
|
||||
output_stride: An integer that specifies the requested ratio of input to
|
||||
output spatial resolution. If not None, then we invoke atrous convolution
|
||||
if necessary to prevent the network from reducing the spatial resolution
|
||||
of the activation maps. Allowed values are 8 (accurate fully convolutional
|
||||
mode), 16 (fast fully convolutional mode), 32 (classification mode).
|
||||
use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
|
||||
inputs so that the output dimensions are the same as if 'SAME' padding
|
||||
were used.
|
||||
scope: Optional variable_scope.
|
||||
|
||||
Returns:
|
||||
tensor_out: output tensor corresponding to the final_endpoint.
|
||||
end_points: a set of activations for external use, for example summaries or
|
||||
losses.
|
||||
|
||||
Raises:
|
||||
ValueError: if final_endpoint is not set to one of the predefined values,
|
||||
or depth_multiplier <= 0, or the target output_stride is not
|
||||
allowed.
|
||||
"""
|
||||
depth = lambda d: max(int(d * depth_multiplier), min_depth)
|
||||
end_points = {}
|
||||
|
||||
# Used to find thinned depths for each layer.
|
||||
if depth_multiplier <= 0:
|
||||
raise ValueError('depth_multiplier is not greater than zero.')
|
||||
|
||||
if conv_defs is None:
|
||||
conv_defs = MOBILENETV1_CONV_DEFS
|
||||
|
||||
if output_stride is not None and output_stride not in [8, 16, 32]:
|
||||
raise ValueError('Only allowed output_stride values are 8, 16, 32.')
|
||||
|
||||
padding = 'SAME'
|
||||
if use_explicit_padding:
|
||||
padding = 'VALID'
|
||||
with tf.compat.v1.variable_scope(scope, 'MobilenetV1', [inputs]):
|
||||
with slim.arg_scope([slim.conv2d, slim.separable_conv2d], padding=padding):
|
||||
# The current_stride variable keeps track of the output stride of the
|
||||
# activations, i.e., the running product of convolution strides up to the
|
||||
# current network layer. This allows us to invoke atrous convolution
|
||||
# whenever applying the next convolution would result in the activations
|
||||
# having output stride larger than the target output_stride.
|
||||
current_stride = 1
|
||||
|
||||
# The atrous convolution rate parameter.
|
||||
rate = 1
|
||||
|
||||
net = inputs
|
||||
for i, conv_def in enumerate(conv_defs):
|
||||
end_point_base = 'Conv2d_%d' % i
|
||||
|
||||
if output_stride is not None and current_stride == output_stride:
|
||||
# If we have reached the target output_stride, then we need to employ
|
||||
# atrous convolution with stride=1 and multiply the atrous rate by the
|
||||
# current unit's stride for use in subsequent layers.
|
||||
layer_stride = 1
|
||||
layer_rate = rate
|
||||
rate *= conv_def.stride
|
||||
else:
|
||||
layer_stride = conv_def.stride
|
||||
layer_rate = 1
|
||||
current_stride *= conv_def.stride
|
||||
|
||||
if isinstance(conv_def, Conv):
|
||||
end_point = end_point_base
|
||||
if use_explicit_padding:
|
||||
net = _fixed_padding(net, conv_def.kernel)
|
||||
net = slim.conv2d(net, depth(conv_def.depth), conv_def.kernel,
|
||||
stride=conv_def.stride,
|
||||
scope=end_point)
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint:
|
||||
return net, end_points
|
||||
|
||||
elif isinstance(conv_def, DepthSepConv):
|
||||
end_point = end_point_base + '_depthwise'
|
||||
|
||||
# By passing filters=None
|
||||
# separable_conv2d produces only a depthwise convolution layer
|
||||
if use_explicit_padding:
|
||||
net = _fixed_padding(net, conv_def.kernel, layer_rate)
|
||||
net = slim.separable_conv2d(net, None, conv_def.kernel,
|
||||
depth_multiplier=1,
|
||||
stride=layer_stride,
|
||||
rate=layer_rate,
|
||||
scope=end_point)
|
||||
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint:
|
||||
return net, end_points
|
||||
|
||||
end_point = end_point_base + '_pointwise'
|
||||
|
||||
net = slim.conv2d(net, depth(conv_def.depth), [1, 1],
|
||||
stride=1,
|
||||
scope=end_point)
|
||||
|
||||
end_points[end_point] = net
|
||||
if end_point == final_endpoint:
|
||||
return net, end_points
|
||||
else:
|
||||
raise ValueError('Unknown convolution type %s for layer %d'
|
||||
% (conv_def.ltype, i))
|
||||
raise ValueError('Unknown final endpoint %s' % final_endpoint)
|
||||
|
||||
|
||||
def mobilenet_v1(inputs,
|
||||
num_classes=1000,
|
||||
dropout_keep_prob=0.999,
|
||||
is_training=True,
|
||||
min_depth=8,
|
||||
depth_multiplier=1.0,
|
||||
conv_defs=None,
|
||||
prediction_fn=contrib_layers.softmax,
|
||||
spatial_squeeze=True,
|
||||
reuse=None,
|
||||
scope='MobilenetV1',
|
||||
global_pool=False):
|
||||
"""Mobilenet v1 model for classification.
|
||||
|
||||
Args:
|
||||
inputs: a tensor of shape [batch_size, height, width, channels].
|
||||
num_classes: number of predicted classes. If 0 or None, the logits layer
|
||||
is omitted and the input features to the logits layer (before dropout)
|
||||
are returned instead.
|
||||
dropout_keep_prob: the percentage of activation values that are retained.
|
||||
is_training: whether is training or not.
|
||||
min_depth: Minimum depth value (number of channels) for all convolution ops.
|
||||
Enforced when depth_multiplier < 1, and not an active constraint when
|
||||
depth_multiplier >= 1.
|
||||
depth_multiplier: Float multiplier for the depth (number of channels)
|
||||
for all convolution ops. The value must be greater than zero. Typical
|
||||
usage will be to set this value in (0, 1) to reduce the number of
|
||||
parameters or computation cost of the model.
|
||||
conv_defs: A list of ConvDef namedtuples specifying the net architecture.
|
||||
prediction_fn: a function to get predictions out of logits.
|
||||
spatial_squeeze: if True, logits is of shape is [B, C], if false logits is
|
||||
of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
|
||||
reuse: whether or not the network and its variables should be reused. To be
|
||||
able to reuse 'scope' must be given.
|
||||
scope: Optional variable_scope.
|
||||
global_pool: Optional boolean flag to control the avgpooling before the
|
||||
logits layer. If false or unset, pooling is done with a fixed window
|
||||
that reduces default-sized inputs to 1x1, while larger inputs lead to
|
||||
larger outputs. If true, any input size is pooled down to 1x1.
|
||||
|
||||
Returns:
|
||||
net: a 2D Tensor with the logits (pre-softmax activations) if num_classes
|
||||
is a non-zero integer, or the non-dropped-out input to the logits layer
|
||||
if num_classes is 0 or None.
|
||||
end_points: a dictionary from components of the network to the corresponding
|
||||
activation.
|
||||
|
||||
Raises:
|
||||
ValueError: Input rank is invalid.
|
||||
"""
|
||||
input_shape = inputs.get_shape().as_list()
|
||||
if len(input_shape) != 4:
|
||||
raise ValueError('Invalid input tensor rank, expected 4, was: %d' %
|
||||
len(input_shape))
|
||||
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'MobilenetV1', [inputs], reuse=reuse) as scope:
|
||||
with slim.arg_scope([slim.batch_norm, slim.dropout],
|
||||
is_training=is_training):
|
||||
net, end_points = mobilenet_v1_base(inputs, scope=scope,
|
||||
min_depth=min_depth,
|
||||
depth_multiplier=depth_multiplier,
|
||||
conv_defs=conv_defs)
|
||||
with tf.compat.v1.variable_scope('Logits'):
|
||||
if global_pool:
|
||||
# Global average pooling.
|
||||
net = tf.reduce_mean(
|
||||
input_tensor=net, axis=[1, 2], keepdims=True, name='global_pool')
|
||||
end_points['global_pool'] = net
|
||||
else:
|
||||
# Pooling with a fixed kernel size.
|
||||
kernel_size = _reduced_kernel_size_for_small_input(net, [7, 7])
|
||||
net = slim.avg_pool2d(net, kernel_size, padding='VALID',
|
||||
scope='AvgPool_1a')
|
||||
end_points['AvgPool_1a'] = net
|
||||
if not num_classes:
|
||||
return net, end_points
|
||||
# 1 x 1 x 1024
|
||||
net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
|
||||
logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
|
||||
normalizer_fn=None, scope='Conv2d_1c_1x1')
|
||||
if spatial_squeeze:
|
||||
logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
|
||||
end_points['Logits'] = logits
|
||||
if prediction_fn:
|
||||
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
|
||||
return logits, end_points
|
||||
|
||||
mobilenet_v1.default_image_size = 224
|
||||
|
||||
|
||||
def wrapped_partial(func, *args, **kwargs):
|
||||
partial_func = functools.partial(func, *args, **kwargs)
|
||||
functools.update_wrapper(partial_func, func)
|
||||
return partial_func
|
||||
|
||||
|
||||
mobilenet_v1_075 = wrapped_partial(mobilenet_v1, depth_multiplier=0.75)
|
||||
mobilenet_v1_050 = wrapped_partial(mobilenet_v1, depth_multiplier=0.50)
|
||||
mobilenet_v1_025 = wrapped_partial(mobilenet_v1, depth_multiplier=0.25)
|
||||
|
||||
|
||||
def _reduced_kernel_size_for_small_input(input_tensor, kernel_size):
|
||||
"""Define kernel size which is automatically reduced for small input.
|
||||
|
||||
If the shape of the input images is unknown at graph construction time this
|
||||
function assumes that the input images are large enough.
|
||||
|
||||
Args:
|
||||
input_tensor: input tensor of size [batch_size, height, width, channels].
|
||||
kernel_size: desired kernel size of length 2: [kernel_height, kernel_width]
|
||||
|
||||
Returns:
|
||||
a tensor with the kernel size.
|
||||
"""
|
||||
shape = input_tensor.get_shape().as_list()
|
||||
if shape[1] is None or shape[2] is None:
|
||||
kernel_size_out = kernel_size
|
||||
else:
|
||||
kernel_size_out = [min(shape[1], kernel_size[0]),
|
||||
min(shape[2], kernel_size[1])]
|
||||
return kernel_size_out
|
||||
|
||||
|
||||
def mobilenet_v1_arg_scope(
|
||||
is_training=True,
|
||||
weight_decay=0.00004,
|
||||
stddev=0.09,
|
||||
regularize_depthwise=False,
|
||||
batch_norm_decay=0.9997,
|
||||
batch_norm_epsilon=0.001,
|
||||
batch_norm_updates_collections=tf.compat.v1.GraphKeys.UPDATE_OPS,
|
||||
normalizer_fn=slim.batch_norm):
|
||||
"""Defines the default MobilenetV1 arg scope.
|
||||
|
||||
Args:
|
||||
is_training: Whether or not we're training the model. If this is set to
|
||||
None, the parameter is not added to the batch_norm arg_scope.
|
||||
weight_decay: The weight decay to use for regularizing the model.
|
||||
stddev: The standard deviation of the trunctated normal weight initializer.
|
||||
regularize_depthwise: Whether or not apply regularization on depthwise.
|
||||
batch_norm_decay: Decay for batch norm moving average.
|
||||
batch_norm_epsilon: Small float added to variance to avoid dividing by zero
|
||||
in batch norm.
|
||||
batch_norm_updates_collections: Collection for the update ops for
|
||||
batch norm.
|
||||
normalizer_fn: Normalization function to apply after convolution.
|
||||
|
||||
Returns:
|
||||
An `arg_scope` to use for the mobilenet v1 model.
|
||||
"""
|
||||
batch_norm_params = {
|
||||
'center': True,
|
||||
'scale': True,
|
||||
'decay': batch_norm_decay,
|
||||
'epsilon': batch_norm_epsilon,
|
||||
'updates_collections': batch_norm_updates_collections,
|
||||
}
|
||||
if is_training is not None:
|
||||
batch_norm_params['is_training'] = is_training
|
||||
|
||||
# Set weight_decay for weights in Conv and DepthSepConv layers.
|
||||
weights_init = tf.compat.v1.truncated_normal_initializer(stddev=stddev)
|
||||
regularizer = contrib_layers.l2_regularizer(weight_decay)
|
||||
if regularize_depthwise:
|
||||
depthwise_regularizer = regularizer
|
||||
else:
|
||||
depthwise_regularizer = None
|
||||
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
|
||||
weights_initializer=weights_init,
|
||||
activation_fn=tf.nn.relu6, normalizer_fn=normalizer_fn):
|
||||
with slim.arg_scope([slim.batch_norm], **batch_norm_params):
|
||||
with slim.arg_scope([slim.conv2d], weights_regularizer=regularizer):
|
||||
with slim.arg_scope([slim.separable_conv2d],
|
||||
weights_regularizer=depthwise_regularizer) as sc:
|
||||
return sc
|
||||
+157
@@ -0,0 +1,157 @@
|
||||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Validate mobilenet_v1 with options for quantization."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import math
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import quantize as contrib_quantize
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from datasets import dataset_factory
|
||||
from nets import mobilenet_v1
|
||||
from preprocessing import preprocessing_factory
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
flags = tf.compat.v1.app.flags
|
||||
|
||||
flags.DEFINE_string('master', '', 'Session master')
|
||||
flags.DEFINE_integer('batch_size', 250, 'Batch size')
|
||||
flags.DEFINE_integer('num_classes', 1001, 'Number of classes to distinguish')
|
||||
flags.DEFINE_integer('num_examples', 50000, 'Number of examples to evaluate')
|
||||
flags.DEFINE_integer('image_size', 224, 'Input image resolution')
|
||||
flags.DEFINE_float('depth_multiplier', 1.0, 'Depth multiplier for mobilenet')
|
||||
flags.DEFINE_bool('quantize', False, 'Quantize training')
|
||||
flags.DEFINE_string('checkpoint_dir', '', 'The directory for checkpoints')
|
||||
flags.DEFINE_string('eval_dir', '', 'Directory for writing eval event logs')
|
||||
flags.DEFINE_string('dataset_dir', '', 'Location of dataset')
|
||||
|
||||
FLAGS = flags.FLAGS
|
||||
|
||||
|
||||
def imagenet_input(is_training):
|
||||
"""Data reader for imagenet.
|
||||
|
||||
Reads in imagenet data and performs pre-processing on the images.
|
||||
|
||||
Args:
|
||||
is_training: bool specifying if train or validation dataset is needed.
|
||||
Returns:
|
||||
A batch of images and labels.
|
||||
"""
|
||||
if is_training:
|
||||
dataset = dataset_factory.get_dataset('imagenet', 'train',
|
||||
FLAGS.dataset_dir)
|
||||
else:
|
||||
dataset = dataset_factory.get_dataset('imagenet', 'validation',
|
||||
FLAGS.dataset_dir)
|
||||
|
||||
provider = slim.dataset_data_provider.DatasetDataProvider(
|
||||
dataset,
|
||||
shuffle=is_training,
|
||||
common_queue_capacity=2 * FLAGS.batch_size,
|
||||
common_queue_min=FLAGS.batch_size)
|
||||
[image, label] = provider.get(['image', 'label'])
|
||||
|
||||
image_preprocessing_fn = preprocessing_factory.get_preprocessing(
|
||||
'mobilenet_v1', is_training=is_training)
|
||||
|
||||
image = image_preprocessing_fn(image, FLAGS.image_size, FLAGS.image_size)
|
||||
|
||||
images, labels = tf.compat.v1.train.batch(
|
||||
tensors=[image, label],
|
||||
batch_size=FLAGS.batch_size,
|
||||
num_threads=4,
|
||||
capacity=5 * FLAGS.batch_size)
|
||||
return images, labels
|
||||
|
||||
|
||||
def metrics(logits, labels):
|
||||
"""Specify the metrics for eval.
|
||||
|
||||
Args:
|
||||
logits: Logits output from the graph.
|
||||
labels: Ground truth labels for inputs.
|
||||
|
||||
Returns:
|
||||
Eval Op for the graph.
|
||||
"""
|
||||
labels = tf.squeeze(labels)
|
||||
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
|
||||
'Accuracy':
|
||||
tf.compat.v1.metrics.accuracy(
|
||||
tf.argmax(input=logits, axis=1), labels),
|
||||
'Recall_5':
|
||||
tf.compat.v1.metrics.recall_at_k(labels, logits, 5),
|
||||
})
|
||||
for name, value in names_to_values.iteritems():
|
||||
slim.summaries.add_scalar_summary(
|
||||
value, name, prefix='eval', print_summary=True)
|
||||
return names_to_updates.values()
|
||||
|
||||
|
||||
def build_model():
|
||||
"""Build the mobilenet_v1 model for evaluation.
|
||||
|
||||
Returns:
|
||||
g: graph with rewrites after insertion of quantization ops and batch norm
|
||||
folding.
|
||||
eval_ops: eval ops for inference.
|
||||
variables_to_restore: List of variables to restore from checkpoint.
|
||||
"""
|
||||
g = tf.Graph()
|
||||
with g.as_default():
|
||||
inputs, labels = imagenet_input(is_training=False)
|
||||
|
||||
scope = mobilenet_v1.mobilenet_v1_arg_scope(
|
||||
is_training=False, weight_decay=0.0)
|
||||
with slim.arg_scope(scope):
|
||||
logits, _ = mobilenet_v1.mobilenet_v1(
|
||||
inputs,
|
||||
is_training=False,
|
||||
depth_multiplier=FLAGS.depth_multiplier,
|
||||
num_classes=FLAGS.num_classes)
|
||||
|
||||
if FLAGS.quantize:
|
||||
contrib_quantize.create_eval_graph()
|
||||
|
||||
eval_ops = metrics(logits, labels)
|
||||
|
||||
return g, eval_ops
|
||||
|
||||
|
||||
def eval_model():
|
||||
"""Evaluates mobilenet_v1."""
|
||||
g, eval_ops = build_model()
|
||||
with g.as_default():
|
||||
num_batches = math.ceil(FLAGS.num_examples / float(FLAGS.batch_size))
|
||||
slim.evaluation.evaluate_once(
|
||||
FLAGS.master,
|
||||
FLAGS.checkpoint_dir,
|
||||
logdir=FLAGS.eval_dir,
|
||||
num_evals=num_batches,
|
||||
eval_op=eval_ops)
|
||||
|
||||
|
||||
def main(unused_arg):
|
||||
eval_model()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.compat.v1.app.run(main)
|
||||
+537
@@ -0,0 +1,537 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# =============================================================================
|
||||
"""Tests for MobileNet v1."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import mobilenet_v1
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
class MobilenetV1Test(tf.test.TestCase):
|
||||
|
||||
def testBuildClassificationNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith(
|
||||
'MobilenetV1/Logits/SpatialSqueeze'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertTrue('Predictions' in end_points)
|
||||
self.assertListEqual(end_points['Predictions'].get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildPreLogitsNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = None
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes)
|
||||
self.assertTrue(net.op.name.startswith('MobilenetV1/Logits/AvgPool'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 1, 1, 1024])
|
||||
self.assertFalse('Logits' in end_points)
|
||||
self.assertFalse('Predictions' in end_points)
|
||||
|
||||
def testBuildBaseNetwork(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, end_points = mobilenet_v1.mobilenet_v1_base(inputs)
|
||||
self.assertTrue(net.op.name.startswith('MobilenetV1/Conv2d_13'))
|
||||
self.assertListEqual(net.get_shape().as_list(),
|
||||
[batch_size, 7, 7, 1024])
|
||||
expected_endpoints = ['Conv2d_0',
|
||||
'Conv2d_1_depthwise', 'Conv2d_1_pointwise',
|
||||
'Conv2d_2_depthwise', 'Conv2d_2_pointwise',
|
||||
'Conv2d_3_depthwise', 'Conv2d_3_pointwise',
|
||||
'Conv2d_4_depthwise', 'Conv2d_4_pointwise',
|
||||
'Conv2d_5_depthwise', 'Conv2d_5_pointwise',
|
||||
'Conv2d_6_depthwise', 'Conv2d_6_pointwise',
|
||||
'Conv2d_7_depthwise', 'Conv2d_7_pointwise',
|
||||
'Conv2d_8_depthwise', 'Conv2d_8_pointwise',
|
||||
'Conv2d_9_depthwise', 'Conv2d_9_pointwise',
|
||||
'Conv2d_10_depthwise', 'Conv2d_10_pointwise',
|
||||
'Conv2d_11_depthwise', 'Conv2d_11_pointwise',
|
||||
'Conv2d_12_depthwise', 'Conv2d_12_pointwise',
|
||||
'Conv2d_13_depthwise', 'Conv2d_13_pointwise']
|
||||
self.assertItemsEqual(end_points.keys(), expected_endpoints)
|
||||
|
||||
def testBuildOnlyUptoFinalEndpoint(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
endpoints = ['Conv2d_0',
|
||||
'Conv2d_1_depthwise', 'Conv2d_1_pointwise',
|
||||
'Conv2d_2_depthwise', 'Conv2d_2_pointwise',
|
||||
'Conv2d_3_depthwise', 'Conv2d_3_pointwise',
|
||||
'Conv2d_4_depthwise', 'Conv2d_4_pointwise',
|
||||
'Conv2d_5_depthwise', 'Conv2d_5_pointwise',
|
||||
'Conv2d_6_depthwise', 'Conv2d_6_pointwise',
|
||||
'Conv2d_7_depthwise', 'Conv2d_7_pointwise',
|
||||
'Conv2d_8_depthwise', 'Conv2d_8_pointwise',
|
||||
'Conv2d_9_depthwise', 'Conv2d_9_pointwise',
|
||||
'Conv2d_10_depthwise', 'Conv2d_10_pointwise',
|
||||
'Conv2d_11_depthwise', 'Conv2d_11_pointwise',
|
||||
'Conv2d_12_depthwise', 'Conv2d_12_pointwise',
|
||||
'Conv2d_13_depthwise', 'Conv2d_13_pointwise']
|
||||
for index, endpoint in enumerate(endpoints):
|
||||
with tf.Graph().as_default():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
out_tensor, end_points = mobilenet_v1.mobilenet_v1_base(
|
||||
inputs, final_endpoint=endpoint)
|
||||
self.assertTrue(out_tensor.op.name.startswith(
|
||||
'MobilenetV1/' + endpoint))
|
||||
self.assertItemsEqual(endpoints[:index+1], end_points.keys())
|
||||
|
||||
def testBuildCustomNetworkUsingConvDefs(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
conv_defs = [
|
||||
mobilenet_v1.Conv(kernel=[3, 3], stride=2, depth=32),
|
||||
mobilenet_v1.DepthSepConv(kernel=[3, 3], stride=1, depth=64),
|
||||
mobilenet_v1.DepthSepConv(kernel=[3, 3], stride=2, depth=128),
|
||||
mobilenet_v1.DepthSepConv(kernel=[3, 3], stride=1, depth=512)
|
||||
]
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, end_points = mobilenet_v1.mobilenet_v1_base(
|
||||
inputs, final_endpoint='Conv2d_3_pointwise', conv_defs=conv_defs)
|
||||
self.assertTrue(net.op.name.startswith('MobilenetV1/Conv2d_3'))
|
||||
self.assertListEqual(net.get_shape().as_list(),
|
||||
[batch_size, 56, 56, 512])
|
||||
expected_endpoints = ['Conv2d_0',
|
||||
'Conv2d_1_depthwise', 'Conv2d_1_pointwise',
|
||||
'Conv2d_2_depthwise', 'Conv2d_2_pointwise',
|
||||
'Conv2d_3_depthwise', 'Conv2d_3_pointwise']
|
||||
self.assertItemsEqual(end_points.keys(), expected_endpoints)
|
||||
|
||||
def testBuildAndCheckAllEndPointsUptoConv2d_13(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
|
||||
normalizer_fn=slim.batch_norm):
|
||||
_, end_points = mobilenet_v1.mobilenet_v1_base(
|
||||
inputs, final_endpoint='Conv2d_13_pointwise')
|
||||
_, explicit_padding_end_points = mobilenet_v1.mobilenet_v1_base(
|
||||
inputs, final_endpoint='Conv2d_13_pointwise',
|
||||
use_explicit_padding=True)
|
||||
endpoints_shapes = {'Conv2d_0': [batch_size, 112, 112, 32],
|
||||
'Conv2d_1_depthwise': [batch_size, 112, 112, 32],
|
||||
'Conv2d_1_pointwise': [batch_size, 112, 112, 64],
|
||||
'Conv2d_2_depthwise': [batch_size, 56, 56, 64],
|
||||
'Conv2d_2_pointwise': [batch_size, 56, 56, 128],
|
||||
'Conv2d_3_depthwise': [batch_size, 56, 56, 128],
|
||||
'Conv2d_3_pointwise': [batch_size, 56, 56, 128],
|
||||
'Conv2d_4_depthwise': [batch_size, 28, 28, 128],
|
||||
'Conv2d_4_pointwise': [batch_size, 28, 28, 256],
|
||||
'Conv2d_5_depthwise': [batch_size, 28, 28, 256],
|
||||
'Conv2d_5_pointwise': [batch_size, 28, 28, 256],
|
||||
'Conv2d_6_depthwise': [batch_size, 14, 14, 256],
|
||||
'Conv2d_6_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_7_depthwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_7_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_8_depthwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_8_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_9_depthwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_9_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_10_depthwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_10_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_11_depthwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_11_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_12_depthwise': [batch_size, 7, 7, 512],
|
||||
'Conv2d_12_pointwise': [batch_size, 7, 7, 1024],
|
||||
'Conv2d_13_depthwise': [batch_size, 7, 7, 1024],
|
||||
'Conv2d_13_pointwise': [batch_size, 7, 7, 1024]}
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name, expected_shape in endpoints_shapes.items():
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
self.assertItemsEqual(endpoints_shapes.keys(),
|
||||
explicit_padding_end_points.keys())
|
||||
for endpoint_name, expected_shape in endpoints_shapes.items():
|
||||
self.assertTrue(endpoint_name in explicit_padding_end_points)
|
||||
self.assertListEqual(
|
||||
explicit_padding_end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testOutputStride16BuildAndCheckAllEndPointsUptoConv2d_13(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
output_stride = 16
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
|
||||
normalizer_fn=slim.batch_norm):
|
||||
_, end_points = mobilenet_v1.mobilenet_v1_base(
|
||||
inputs, output_stride=output_stride,
|
||||
final_endpoint='Conv2d_13_pointwise')
|
||||
_, explicit_padding_end_points = mobilenet_v1.mobilenet_v1_base(
|
||||
inputs, output_stride=output_stride,
|
||||
final_endpoint='Conv2d_13_pointwise', use_explicit_padding=True)
|
||||
endpoints_shapes = {'Conv2d_0': [batch_size, 112, 112, 32],
|
||||
'Conv2d_1_depthwise': [batch_size, 112, 112, 32],
|
||||
'Conv2d_1_pointwise': [batch_size, 112, 112, 64],
|
||||
'Conv2d_2_depthwise': [batch_size, 56, 56, 64],
|
||||
'Conv2d_2_pointwise': [batch_size, 56, 56, 128],
|
||||
'Conv2d_3_depthwise': [batch_size, 56, 56, 128],
|
||||
'Conv2d_3_pointwise': [batch_size, 56, 56, 128],
|
||||
'Conv2d_4_depthwise': [batch_size, 28, 28, 128],
|
||||
'Conv2d_4_pointwise': [batch_size, 28, 28, 256],
|
||||
'Conv2d_5_depthwise': [batch_size, 28, 28, 256],
|
||||
'Conv2d_5_pointwise': [batch_size, 28, 28, 256],
|
||||
'Conv2d_6_depthwise': [batch_size, 14, 14, 256],
|
||||
'Conv2d_6_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_7_depthwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_7_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_8_depthwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_8_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_9_depthwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_9_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_10_depthwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_10_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_11_depthwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_11_pointwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_12_depthwise': [batch_size, 14, 14, 512],
|
||||
'Conv2d_12_pointwise': [batch_size, 14, 14, 1024],
|
||||
'Conv2d_13_depthwise': [batch_size, 14, 14, 1024],
|
||||
'Conv2d_13_pointwise': [batch_size, 14, 14, 1024]}
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name, expected_shape in endpoints_shapes.items():
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
self.assertItemsEqual(endpoints_shapes.keys(),
|
||||
explicit_padding_end_points.keys())
|
||||
for endpoint_name, expected_shape in endpoints_shapes.items():
|
||||
self.assertTrue(endpoint_name in explicit_padding_end_points)
|
||||
self.assertListEqual(
|
||||
explicit_padding_end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testOutputStride8BuildAndCheckAllEndPointsUptoConv2d_13(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
output_stride = 8
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
|
||||
normalizer_fn=slim.batch_norm):
|
||||
_, end_points = mobilenet_v1.mobilenet_v1_base(
|
||||
inputs, output_stride=output_stride,
|
||||
final_endpoint='Conv2d_13_pointwise')
|
||||
_, explicit_padding_end_points = mobilenet_v1.mobilenet_v1_base(
|
||||
inputs, output_stride=output_stride,
|
||||
final_endpoint='Conv2d_13_pointwise', use_explicit_padding=True)
|
||||
endpoints_shapes = {'Conv2d_0': [batch_size, 112, 112, 32],
|
||||
'Conv2d_1_depthwise': [batch_size, 112, 112, 32],
|
||||
'Conv2d_1_pointwise': [batch_size, 112, 112, 64],
|
||||
'Conv2d_2_depthwise': [batch_size, 56, 56, 64],
|
||||
'Conv2d_2_pointwise': [batch_size, 56, 56, 128],
|
||||
'Conv2d_3_depthwise': [batch_size, 56, 56, 128],
|
||||
'Conv2d_3_pointwise': [batch_size, 56, 56, 128],
|
||||
'Conv2d_4_depthwise': [batch_size, 28, 28, 128],
|
||||
'Conv2d_4_pointwise': [batch_size, 28, 28, 256],
|
||||
'Conv2d_5_depthwise': [batch_size, 28, 28, 256],
|
||||
'Conv2d_5_pointwise': [batch_size, 28, 28, 256],
|
||||
'Conv2d_6_depthwise': [batch_size, 28, 28, 256],
|
||||
'Conv2d_6_pointwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_7_depthwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_7_pointwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_8_depthwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_8_pointwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_9_depthwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_9_pointwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_10_depthwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_10_pointwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_11_depthwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_11_pointwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_12_depthwise': [batch_size, 28, 28, 512],
|
||||
'Conv2d_12_pointwise': [batch_size, 28, 28, 1024],
|
||||
'Conv2d_13_depthwise': [batch_size, 28, 28, 1024],
|
||||
'Conv2d_13_pointwise': [batch_size, 28, 28, 1024]}
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name, expected_shape in endpoints_shapes.items():
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
self.assertItemsEqual(endpoints_shapes.keys(),
|
||||
explicit_padding_end_points.keys())
|
||||
for endpoint_name, expected_shape in endpoints_shapes.items():
|
||||
self.assertTrue(endpoint_name in explicit_padding_end_points)
|
||||
self.assertListEqual(
|
||||
explicit_padding_end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testBuildAndCheckAllEndPointsApproximateFaceNet(self):
|
||||
batch_size = 5
|
||||
height, width = 128, 128
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
|
||||
normalizer_fn=slim.batch_norm):
|
||||
_, end_points = mobilenet_v1.mobilenet_v1_base(
|
||||
inputs, final_endpoint='Conv2d_13_pointwise', depth_multiplier=0.75)
|
||||
_, explicit_padding_end_points = mobilenet_v1.mobilenet_v1_base(
|
||||
inputs, final_endpoint='Conv2d_13_pointwise', depth_multiplier=0.75,
|
||||
use_explicit_padding=True)
|
||||
# For the Conv2d_0 layer FaceNet has depth=16
|
||||
endpoints_shapes = {'Conv2d_0': [batch_size, 64, 64, 24],
|
||||
'Conv2d_1_depthwise': [batch_size, 64, 64, 24],
|
||||
'Conv2d_1_pointwise': [batch_size, 64, 64, 48],
|
||||
'Conv2d_2_depthwise': [batch_size, 32, 32, 48],
|
||||
'Conv2d_2_pointwise': [batch_size, 32, 32, 96],
|
||||
'Conv2d_3_depthwise': [batch_size, 32, 32, 96],
|
||||
'Conv2d_3_pointwise': [batch_size, 32, 32, 96],
|
||||
'Conv2d_4_depthwise': [batch_size, 16, 16, 96],
|
||||
'Conv2d_4_pointwise': [batch_size, 16, 16, 192],
|
||||
'Conv2d_5_depthwise': [batch_size, 16, 16, 192],
|
||||
'Conv2d_5_pointwise': [batch_size, 16, 16, 192],
|
||||
'Conv2d_6_depthwise': [batch_size, 8, 8, 192],
|
||||
'Conv2d_6_pointwise': [batch_size, 8, 8, 384],
|
||||
'Conv2d_7_depthwise': [batch_size, 8, 8, 384],
|
||||
'Conv2d_7_pointwise': [batch_size, 8, 8, 384],
|
||||
'Conv2d_8_depthwise': [batch_size, 8, 8, 384],
|
||||
'Conv2d_8_pointwise': [batch_size, 8, 8, 384],
|
||||
'Conv2d_9_depthwise': [batch_size, 8, 8, 384],
|
||||
'Conv2d_9_pointwise': [batch_size, 8, 8, 384],
|
||||
'Conv2d_10_depthwise': [batch_size, 8, 8, 384],
|
||||
'Conv2d_10_pointwise': [batch_size, 8, 8, 384],
|
||||
'Conv2d_11_depthwise': [batch_size, 8, 8, 384],
|
||||
'Conv2d_11_pointwise': [batch_size, 8, 8, 384],
|
||||
'Conv2d_12_depthwise': [batch_size, 4, 4, 384],
|
||||
'Conv2d_12_pointwise': [batch_size, 4, 4, 768],
|
||||
'Conv2d_13_depthwise': [batch_size, 4, 4, 768],
|
||||
'Conv2d_13_pointwise': [batch_size, 4, 4, 768]}
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name, expected_shape in endpoints_shapes.items():
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
self.assertItemsEqual(endpoints_shapes.keys(),
|
||||
explicit_padding_end_points.keys())
|
||||
for endpoint_name, expected_shape in endpoints_shapes.items():
|
||||
self.assertTrue(endpoint_name in explicit_padding_end_points)
|
||||
self.assertListEqual(
|
||||
explicit_padding_end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testModelHasExpectedNumberOfParameters(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
|
||||
normalizer_fn=slim.batch_norm):
|
||||
mobilenet_v1.mobilenet_v1_base(inputs)
|
||||
total_params, _ = slim.model_analyzer.analyze_vars(
|
||||
slim.get_model_variables())
|
||||
self.assertAlmostEqual(3217920, total_params)
|
||||
|
||||
def testBuildEndPointsWithDepthMultiplierLessThanOne(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes)
|
||||
|
||||
endpoint_keys = [key for key in end_points.keys() if key.startswith('Conv')]
|
||||
|
||||
_, end_points_with_multiplier = mobilenet_v1.mobilenet_v1(
|
||||
inputs, num_classes, scope='depth_multiplied_net',
|
||||
depth_multiplier=0.5)
|
||||
|
||||
for key in endpoint_keys:
|
||||
original_depth = end_points[key].get_shape().as_list()[3]
|
||||
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
|
||||
self.assertEqual(0.5 * original_depth, new_depth)
|
||||
|
||||
def testBuildEndPointsWithDepthMultiplierGreaterThanOne(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes)
|
||||
|
||||
endpoint_keys = [key for key in end_points.keys()
|
||||
if key.startswith('Mixed') or key.startswith('Conv')]
|
||||
|
||||
_, end_points_with_multiplier = mobilenet_v1.mobilenet_v1(
|
||||
inputs, num_classes, scope='depth_multiplied_net',
|
||||
depth_multiplier=2.0)
|
||||
|
||||
for key in endpoint_keys:
|
||||
original_depth = end_points[key].get_shape().as_list()[3]
|
||||
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
|
||||
self.assertEqual(2.0 * original_depth, new_depth)
|
||||
|
||||
def testRaiseValueErrorWithInvalidDepthMultiplier(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with self.assertRaises(ValueError):
|
||||
_ = mobilenet_v1.mobilenet_v1(
|
||||
inputs, num_classes, depth_multiplier=-0.1)
|
||||
with self.assertRaises(ValueError):
|
||||
_ = mobilenet_v1.mobilenet_v1(
|
||||
inputs, num_classes, depth_multiplier=0.0)
|
||||
|
||||
def testHalfSizeImages(self):
|
||||
batch_size = 5
|
||||
height, width = 112, 112
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Conv2d_13_pointwise']
|
||||
self.assertListEqual(pre_pool.get_shape().as_list(),
|
||||
[batch_size, 4, 4, 1024])
|
||||
|
||||
def testUnknownImageShape(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
batch_size = 2
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(
|
||||
tf.float32, shape=(batch_size, None, None, 3))
|
||||
logits, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Conv2d_13_pointwise']
|
||||
feed_dict = {inputs: input_np}
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
|
||||
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 7, 7, 1024])
|
||||
|
||||
def testGlobalPoolUnknownImageShape(self):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
batch_size = 1
|
||||
height, width = 250, 300
|
||||
num_classes = 1000
|
||||
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(
|
||||
tf.float32, shape=(batch_size, None, None, 3))
|
||||
logits, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes,
|
||||
global_pool=True)
|
||||
self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
pre_pool = end_points['Conv2d_13_pointwise']
|
||||
feed_dict = {inputs: input_np}
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
|
||||
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 8, 10, 1024])
|
||||
|
||||
def testUnknowBatchSize(self):
|
||||
batch_size = 1
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (None, height, width, 3))
|
||||
logits, _ = mobilenet_v1.mobilenet_v1(inputs, num_classes)
|
||||
self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits'))
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[None, num_classes])
|
||||
images = tf.random.uniform((batch_size, height, width, 3))
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(logits, {inputs: images.eval()})
|
||||
self.assertEquals(output.shape, (batch_size, num_classes))
|
||||
|
||||
def testEvaluation(self):
|
||||
batch_size = 2
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
|
||||
eval_inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = mobilenet_v1.mobilenet_v1(eval_inputs, num_classes,
|
||||
is_training=False)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (batch_size,))
|
||||
|
||||
def testTrainEvalWithReuse(self):
|
||||
train_batch_size = 5
|
||||
eval_batch_size = 2
|
||||
height, width = 150, 150
|
||||
num_classes = 1000
|
||||
|
||||
train_inputs = tf.random.uniform((train_batch_size, height, width, 3))
|
||||
mobilenet_v1.mobilenet_v1(train_inputs, num_classes)
|
||||
eval_inputs = tf.random.uniform((eval_batch_size, height, width, 3))
|
||||
logits, _ = mobilenet_v1.mobilenet_v1(eval_inputs, num_classes,
|
||||
reuse=True)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
|
||||
with self.test_session() as sess:
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (eval_batch_size,))
|
||||
|
||||
def testLogitsNotSqueezed(self):
|
||||
num_classes = 25
|
||||
images = tf.random.uniform([1, 224, 224, 3])
|
||||
logits, _ = mobilenet_v1.mobilenet_v1(images,
|
||||
num_classes=num_classes,
|
||||
spatial_squeeze=False)
|
||||
|
||||
with self.test_session() as sess:
|
||||
tf.compat.v1.global_variables_initializer().run()
|
||||
logits_out = sess.run(logits)
|
||||
self.assertListEqual(list(logits_out.shape), [1, 1, 1, num_classes])
|
||||
|
||||
def testBatchNormScopeDoesNotHaveIsTrainingWhenItsSetToNone(self):
|
||||
sc = mobilenet_v1.mobilenet_v1_arg_scope(is_training=None)
|
||||
self.assertNotIn('is_training', sc[slim.arg_scope_func_key(
|
||||
slim.batch_norm)])
|
||||
|
||||
def testBatchNormScopeDoesHasIsTrainingWhenItsNotNone(self):
|
||||
sc = mobilenet_v1.mobilenet_v1_arg_scope(is_training=True)
|
||||
self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)])
|
||||
sc = mobilenet_v1.mobilenet_v1_arg_scope(is_training=False)
|
||||
self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)])
|
||||
sc = mobilenet_v1.mobilenet_v1_arg_scope()
|
||||
self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)])
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+214
@@ -0,0 +1,214 @@
|
||||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Build and train mobilenet_v1 with options for quantization."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import quantize as contrib_quantize
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from datasets import dataset_factory
|
||||
from nets import mobilenet_v1
|
||||
from preprocessing import preprocessing_factory
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
flags = tf.compat.v1.app.flags
|
||||
|
||||
flags.DEFINE_string('master', '', 'Session master')
|
||||
flags.DEFINE_integer('task', 0, 'Task')
|
||||
flags.DEFINE_integer('ps_tasks', 0, 'Number of ps')
|
||||
flags.DEFINE_integer('batch_size', 64, 'Batch size')
|
||||
flags.DEFINE_integer('num_classes', 1001, 'Number of classes to distinguish')
|
||||
flags.DEFINE_integer('number_of_steps', None,
|
||||
'Number of training steps to perform before stopping')
|
||||
flags.DEFINE_integer('image_size', 224, 'Input image resolution')
|
||||
flags.DEFINE_float('depth_multiplier', 1.0, 'Depth multiplier for mobilenet')
|
||||
flags.DEFINE_bool('quantize', False, 'Quantize training')
|
||||
flags.DEFINE_string('fine_tune_checkpoint', '',
|
||||
'Checkpoint from which to start finetuning.')
|
||||
flags.DEFINE_string('checkpoint_dir', '',
|
||||
'Directory for writing training checkpoints and logs')
|
||||
flags.DEFINE_string('dataset_dir', '', 'Location of dataset')
|
||||
flags.DEFINE_integer('log_every_n_steps', 100, 'Number of steps per log')
|
||||
flags.DEFINE_integer('save_summaries_secs', 100,
|
||||
'How often to save summaries, secs')
|
||||
flags.DEFINE_integer('save_interval_secs', 100,
|
||||
'How often to save checkpoints, secs')
|
||||
|
||||
FLAGS = flags.FLAGS
|
||||
|
||||
_LEARNING_RATE_DECAY_FACTOR = 0.94
|
||||
|
||||
|
||||
def get_learning_rate():
|
||||
if FLAGS.fine_tune_checkpoint:
|
||||
# If we are fine tuning a checkpoint we need to start at a lower learning
|
||||
# rate since we are farther along on training.
|
||||
return 1e-4
|
||||
else:
|
||||
return 0.045
|
||||
|
||||
|
||||
def get_quant_delay():
|
||||
if FLAGS.fine_tune_checkpoint:
|
||||
# We can start quantizing immediately if we are finetuning.
|
||||
return 0
|
||||
else:
|
||||
# We need to wait for the model to train a bit before we quantize if we are
|
||||
# training from scratch.
|
||||
return 250000
|
||||
|
||||
|
||||
def imagenet_input(is_training):
|
||||
"""Data reader for imagenet.
|
||||
|
||||
Reads in imagenet data and performs pre-processing on the images.
|
||||
|
||||
Args:
|
||||
is_training: bool specifying if train or validation dataset is needed.
|
||||
Returns:
|
||||
A batch of images and labels.
|
||||
"""
|
||||
if is_training:
|
||||
dataset = dataset_factory.get_dataset('imagenet', 'train',
|
||||
FLAGS.dataset_dir)
|
||||
else:
|
||||
dataset = dataset_factory.get_dataset('imagenet', 'validation',
|
||||
FLAGS.dataset_dir)
|
||||
|
||||
provider = slim.dataset_data_provider.DatasetDataProvider(
|
||||
dataset,
|
||||
shuffle=is_training,
|
||||
common_queue_capacity=2 * FLAGS.batch_size,
|
||||
common_queue_min=FLAGS.batch_size)
|
||||
[image, label] = provider.get(['image', 'label'])
|
||||
|
||||
image_preprocessing_fn = preprocessing_factory.get_preprocessing(
|
||||
'mobilenet_v1', is_training=is_training)
|
||||
|
||||
image = image_preprocessing_fn(image, FLAGS.image_size, FLAGS.image_size)
|
||||
|
||||
images, labels = tf.compat.v1.train.batch([image, label],
|
||||
batch_size=FLAGS.batch_size,
|
||||
num_threads=4,
|
||||
capacity=5 * FLAGS.batch_size)
|
||||
labels = slim.one_hot_encoding(labels, FLAGS.num_classes)
|
||||
return images, labels
|
||||
|
||||
|
||||
def build_model():
|
||||
"""Builds graph for model to train with rewrites for quantization.
|
||||
|
||||
Returns:
|
||||
g: Graph with fake quantization ops and batch norm folding suitable for
|
||||
training quantized weights.
|
||||
train_tensor: Train op for execution during training.
|
||||
"""
|
||||
g = tf.Graph()
|
||||
with g.as_default(), tf.device(
|
||||
tf.compat.v1.train.replica_device_setter(FLAGS.ps_tasks)):
|
||||
inputs, labels = imagenet_input(is_training=True)
|
||||
with slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=True)):
|
||||
logits, _ = mobilenet_v1.mobilenet_v1(
|
||||
inputs,
|
||||
is_training=True,
|
||||
depth_multiplier=FLAGS.depth_multiplier,
|
||||
num_classes=FLAGS.num_classes)
|
||||
|
||||
tf.compat.v1.losses.softmax_cross_entropy(labels, logits)
|
||||
|
||||
# Call rewriter to produce graph with fake quant ops and folded batch norms
|
||||
# quant_delay delays start of quantization till quant_delay steps, allowing
|
||||
# for better model accuracy.
|
||||
if FLAGS.quantize:
|
||||
contrib_quantize.create_training_graph(quant_delay=get_quant_delay())
|
||||
|
||||
total_loss = tf.compat.v1.losses.get_total_loss(name='total_loss')
|
||||
# Configure the learning rate using an exponential decay.
|
||||
num_epochs_per_decay = 2.5
|
||||
imagenet_size = 1271167
|
||||
decay_steps = int(imagenet_size / FLAGS.batch_size * num_epochs_per_decay)
|
||||
|
||||
learning_rate = tf.compat.v1.train.exponential_decay(
|
||||
get_learning_rate(),
|
||||
tf.compat.v1.train.get_or_create_global_step(),
|
||||
decay_steps,
|
||||
_LEARNING_RATE_DECAY_FACTOR,
|
||||
staircase=True)
|
||||
opt = tf.compat.v1.train.GradientDescentOptimizer(learning_rate)
|
||||
|
||||
train_tensor = slim.learning.create_train_op(
|
||||
total_loss,
|
||||
optimizer=opt)
|
||||
|
||||
slim.summaries.add_scalar_summary(total_loss, 'total_loss', 'losses')
|
||||
slim.summaries.add_scalar_summary(learning_rate, 'learning_rate', 'training')
|
||||
return g, train_tensor
|
||||
|
||||
|
||||
def get_checkpoint_init_fn():
|
||||
"""Returns the checkpoint init_fn if the checkpoint is provided."""
|
||||
if FLAGS.fine_tune_checkpoint:
|
||||
variables_to_restore = slim.get_variables_to_restore()
|
||||
global_step_reset = tf.compat.v1.assign(
|
||||
tf.compat.v1.train.get_or_create_global_step(), 0)
|
||||
# When restoring from a floating point model, the min/max values for
|
||||
# quantized weights and activations are not present.
|
||||
# We instruct slim to ignore variables that are missing during restoration
|
||||
# by setting ignore_missing_vars=True
|
||||
slim_init_fn = slim.assign_from_checkpoint_fn(
|
||||
FLAGS.fine_tune_checkpoint,
|
||||
variables_to_restore,
|
||||
ignore_missing_vars=True)
|
||||
|
||||
def init_fn(sess):
|
||||
slim_init_fn(sess)
|
||||
# If we are restoring from a floating point model, we need to initialize
|
||||
# the global step to zero for the exponential decay to result in
|
||||
# reasonable learning rates.
|
||||
sess.run(global_step_reset)
|
||||
return init_fn
|
||||
else:
|
||||
return None
|
||||
|
||||
|
||||
def train_model():
|
||||
"""Trains mobilenet_v1."""
|
||||
g, train_tensor = build_model()
|
||||
with g.as_default():
|
||||
slim.learning.train(
|
||||
train_tensor,
|
||||
FLAGS.checkpoint_dir,
|
||||
is_chief=(FLAGS.task == 0),
|
||||
master=FLAGS.master,
|
||||
log_every_n_steps=FLAGS.log_every_n_steps,
|
||||
graph=g,
|
||||
number_of_steps=FLAGS.number_of_steps,
|
||||
save_summaries_secs=FLAGS.save_summaries_secs,
|
||||
save_interval_secs=FLAGS.save_interval_secs,
|
||||
init_fn=get_checkpoint_init_fn(),
|
||||
global_step=tf.compat.v1.train.get_global_step())
|
||||
|
||||
|
||||
def main(unused_arg):
|
||||
train_model()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.compat.v1.app.run(main)
|
||||
+64
@@ -0,0 +1,64 @@
|
||||
# TensorFlow-Slim NASNet-A Implementation/Checkpoints
|
||||
This directory contains the code for the NASNet-A model from the paper
|
||||
[Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/abs/1707.07012) by Zoph et al.
|
||||
In nasnet.py there are three different configurations of NASNet-A that are implementented. One of the models is the NASNet-A built for CIFAR-10 and the
|
||||
other two are variants of NASNet-A trained on ImageNet, which are listed below.
|
||||
|
||||
# Pre-Trained Models
|
||||
Two NASNet-A checkpoints are available that have been trained on the
|
||||
[ILSVRC-2012-CLS](http://www.image-net.org/challenges/LSVRC/2012/)
|
||||
image classification dataset. Accuracies were computed by evaluating using a single image crop.
|
||||
|
||||
Model Checkpoint | Million MACs | Million Parameters | Top-1 Accuracy| Top-5 Accuracy |
|
||||
:----:|:------------:|:----------:|:-------:|:-------:|
|
||||
[NASNet-A_Mobile_224](https://storage.googleapis.com/download.tensorflow.org/models/nasnet-a_mobile_04_10_2017.tar.gz)|564|5.3|74.0|91.6|
|
||||
[NASNet-A_Large_331](https://storage.googleapis.com/download.tensorflow.org/models/nasnet-a_large_04_10_2017.tar.gz)|23800|88.9|82.7|96.2|
|
||||
|
||||
|
||||
Here is an example of how to download the NASNet-A_Mobile_224 checkpoint. The way to download the NASNet-A_Large_331 is the same.
|
||||
|
||||
```shell
|
||||
CHECKPOINT_DIR=/tmp/checkpoints
|
||||
mkdir ${CHECKPOINT_DIR}
|
||||
cd ${CHECKPOINT_DIR}
|
||||
wget https://storage.googleapis.com/download.tensorflow.org/models/nasnet-a_mobile_04_10_2017.tar.gz
|
||||
tar -xvf nasnet-a_mobile_04_10_2017.tar.gz
|
||||
rm nasnet-a_mobile_04_10_2017.tar.gz
|
||||
```
|
||||
More information on integrating NASNet Models into your project can be found at the [TF-Slim Image Classification Library](https://github.com/tensorflow/models/blob/master/research/slim/README.md).
|
||||
|
||||
To get started running models on-device go to [TensorFlow Mobile](https://www.tensorflow.org/mobile/).
|
||||
|
||||
## Sample Commands for using NASNet-A Mobile and Large Checkpoints for Inference
|
||||
-------
|
||||
Run eval with the NASNet-A mobile ImageNet model
|
||||
|
||||
```shell
|
||||
DATASET_DIR=/tmp/imagenet
|
||||
EVAL_DIR=/tmp/tfmodel/eval
|
||||
CHECKPOINT_DIR=/tmp/checkpoints/model.ckpt
|
||||
python tensorflow_models/research/slim/eval_image_classifier \
|
||||
--checkpoint_path=${CHECKPOINT_DIR} \
|
||||
--eval_dir=${EVAL_DIR} \
|
||||
--dataset_dir=${DATASET_DIR} \
|
||||
--dataset_name=imagenet \
|
||||
--dataset_split_name=validation \
|
||||
--model_name=nasnet_mobile \
|
||||
--eval_image_size=224
|
||||
```
|
||||
|
||||
Run eval with the NASNet-A large ImageNet model
|
||||
|
||||
```shell
|
||||
DATASET_DIR=/tmp/imagenet
|
||||
EVAL_DIR=/tmp/tfmodel/eval
|
||||
CHECKPOINT_DIR=/tmp/checkpoints/model.ckpt
|
||||
python tensorflow_models/research/slim/eval_image_classifier \
|
||||
--checkpoint_path=${CHECKPOINT_DIR} \
|
||||
--eval_dir=${EVAL_DIR} \
|
||||
--dataset_dir=${DATASET_DIR} \
|
||||
--dataset_name=imagenet \
|
||||
--dataset_split_name=validation \
|
||||
--model_name=nasnet_large \
|
||||
--eval_image_size=331
|
||||
```
|
||||
+1
@@ -0,0 +1 @@
|
||||
|
||||
+554
@@ -0,0 +1,554 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains the definition for the NASNet classification networks.
|
||||
|
||||
Paper: https://arxiv.org/abs/1707.07012
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import copy
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import framework as contrib_framework
|
||||
from tensorflow.contrib import layers as contrib_layers
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
from tensorflow.contrib import training as contrib_training
|
||||
|
||||
from nets.nasnet import nasnet_utils
|
||||
|
||||
arg_scope = contrib_framework.arg_scope
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
# Notes for training NASNet Cifar Model
|
||||
# -------------------------------------
|
||||
# batch_size: 32
|
||||
# learning rate: 0.025
|
||||
# cosine (single period) learning rate decay
|
||||
# auxiliary head loss weighting: 0.4
|
||||
# clip global norm of all gradients by 5
|
||||
def cifar_config():
|
||||
return contrib_training.HParams(
|
||||
stem_multiplier=3.0,
|
||||
drop_path_keep_prob=0.6,
|
||||
num_cells=18,
|
||||
use_aux_head=1,
|
||||
num_conv_filters=32,
|
||||
dense_dropout_keep_prob=1.0,
|
||||
filter_scaling_rate=2.0,
|
||||
num_reduction_layers=2,
|
||||
data_format='NHWC',
|
||||
skip_reduction_layer_input=0,
|
||||
# 600 epochs with a batch size of 32
|
||||
# This is used for the drop path probabilities since it needs to increase
|
||||
# the drop out probability over the course of training.
|
||||
total_training_steps=937500,
|
||||
use_bounded_activation=False,
|
||||
)
|
||||
|
||||
|
||||
# Notes for training large NASNet model on ImageNet
|
||||
# -------------------------------------
|
||||
# batch size (per replica): 16
|
||||
# learning rate: 0.015 * 100
|
||||
# learning rate decay factor: 0.97
|
||||
# num epochs per decay: 2.4
|
||||
# sync sgd with 100 replicas
|
||||
# auxiliary head loss weighting: 0.4
|
||||
# label smoothing: 0.1
|
||||
# clip global norm of all gradients by 10
|
||||
def large_imagenet_config():
|
||||
return contrib_training.HParams(
|
||||
stem_multiplier=3.0,
|
||||
dense_dropout_keep_prob=0.5,
|
||||
num_cells=18,
|
||||
filter_scaling_rate=2.0,
|
||||
num_conv_filters=168,
|
||||
drop_path_keep_prob=0.7,
|
||||
use_aux_head=1,
|
||||
num_reduction_layers=2,
|
||||
data_format='NHWC',
|
||||
skip_reduction_layer_input=1,
|
||||
total_training_steps=250000,
|
||||
use_bounded_activation=False,
|
||||
)
|
||||
|
||||
|
||||
# Notes for training the mobile NASNet ImageNet model
|
||||
# -------------------------------------
|
||||
# batch size (per replica): 32
|
||||
# learning rate: 0.04 * 50
|
||||
# learning rate scaling factor: 0.97
|
||||
# num epochs per decay: 2.4
|
||||
# sync sgd with 50 replicas
|
||||
# auxiliary head weighting: 0.4
|
||||
# label smoothing: 0.1
|
||||
# clip global norm of all gradients by 10
|
||||
def mobile_imagenet_config():
|
||||
return contrib_training.HParams(
|
||||
stem_multiplier=1.0,
|
||||
dense_dropout_keep_prob=0.5,
|
||||
num_cells=12,
|
||||
filter_scaling_rate=2.0,
|
||||
drop_path_keep_prob=1.0,
|
||||
num_conv_filters=44,
|
||||
use_aux_head=1,
|
||||
num_reduction_layers=2,
|
||||
data_format='NHWC',
|
||||
skip_reduction_layer_input=0,
|
||||
total_training_steps=250000,
|
||||
use_bounded_activation=False,
|
||||
)
|
||||
|
||||
|
||||
def _update_hparams(hparams, is_training):
|
||||
"""Update hparams for given is_training option."""
|
||||
if not is_training:
|
||||
hparams.set_hparam('drop_path_keep_prob', 1.0)
|
||||
|
||||
|
||||
def nasnet_cifar_arg_scope(weight_decay=5e-4,
|
||||
batch_norm_decay=0.9,
|
||||
batch_norm_epsilon=1e-5):
|
||||
"""Defines the default arg scope for the NASNet-A Cifar model.
|
||||
|
||||
Args:
|
||||
weight_decay: The weight decay to use for regularizing the model.
|
||||
batch_norm_decay: Decay for batch norm moving average.
|
||||
batch_norm_epsilon: Small float added to variance to avoid dividing by zero
|
||||
in batch norm.
|
||||
|
||||
Returns:
|
||||
An `arg_scope` to use for the NASNet Cifar Model.
|
||||
"""
|
||||
batch_norm_params = {
|
||||
# Decay for the moving averages.
|
||||
'decay': batch_norm_decay,
|
||||
# epsilon to prevent 0s in variance.
|
||||
'epsilon': batch_norm_epsilon,
|
||||
'scale': True,
|
||||
'fused': True,
|
||||
}
|
||||
weights_regularizer = contrib_layers.l2_regularizer(weight_decay)
|
||||
weights_initializer = contrib_layers.variance_scaling_initializer(
|
||||
mode='FAN_OUT')
|
||||
with arg_scope([slim.fully_connected, slim.conv2d, slim.separable_conv2d],
|
||||
weights_regularizer=weights_regularizer,
|
||||
weights_initializer=weights_initializer):
|
||||
with arg_scope([slim.fully_connected],
|
||||
activation_fn=None, scope='FC'):
|
||||
with arg_scope([slim.conv2d, slim.separable_conv2d],
|
||||
activation_fn=None, biases_initializer=None):
|
||||
with arg_scope([slim.batch_norm], **batch_norm_params) as sc:
|
||||
return sc
|
||||
|
||||
|
||||
def nasnet_mobile_arg_scope(weight_decay=4e-5,
|
||||
batch_norm_decay=0.9997,
|
||||
batch_norm_epsilon=1e-3):
|
||||
"""Defines the default arg scope for the NASNet-A Mobile ImageNet model.
|
||||
|
||||
Args:
|
||||
weight_decay: The weight decay to use for regularizing the model.
|
||||
batch_norm_decay: Decay for batch norm moving average.
|
||||
batch_norm_epsilon: Small float added to variance to avoid dividing by zero
|
||||
in batch norm.
|
||||
|
||||
Returns:
|
||||
An `arg_scope` to use for the NASNet Mobile Model.
|
||||
"""
|
||||
batch_norm_params = {
|
||||
# Decay for the moving averages.
|
||||
'decay': batch_norm_decay,
|
||||
# epsilon to prevent 0s in variance.
|
||||
'epsilon': batch_norm_epsilon,
|
||||
'scale': True,
|
||||
'fused': True,
|
||||
}
|
||||
weights_regularizer = contrib_layers.l2_regularizer(weight_decay)
|
||||
weights_initializer = contrib_layers.variance_scaling_initializer(
|
||||
mode='FAN_OUT')
|
||||
with arg_scope([slim.fully_connected, slim.conv2d, slim.separable_conv2d],
|
||||
weights_regularizer=weights_regularizer,
|
||||
weights_initializer=weights_initializer):
|
||||
with arg_scope([slim.fully_connected],
|
||||
activation_fn=None, scope='FC'):
|
||||
with arg_scope([slim.conv2d, slim.separable_conv2d],
|
||||
activation_fn=None, biases_initializer=None):
|
||||
with arg_scope([slim.batch_norm], **batch_norm_params) as sc:
|
||||
return sc
|
||||
|
||||
|
||||
def nasnet_large_arg_scope(weight_decay=5e-5,
|
||||
batch_norm_decay=0.9997,
|
||||
batch_norm_epsilon=1e-3):
|
||||
"""Defines the default arg scope for the NASNet-A Large ImageNet model.
|
||||
|
||||
Args:
|
||||
weight_decay: The weight decay to use for regularizing the model.
|
||||
batch_norm_decay: Decay for batch norm moving average.
|
||||
batch_norm_epsilon: Small float added to variance to avoid dividing by zero
|
||||
in batch norm.
|
||||
|
||||
Returns:
|
||||
An `arg_scope` to use for the NASNet Large Model.
|
||||
"""
|
||||
batch_norm_params = {
|
||||
# Decay for the moving averages.
|
||||
'decay': batch_norm_decay,
|
||||
# epsilon to prevent 0s in variance.
|
||||
'epsilon': batch_norm_epsilon,
|
||||
'scale': True,
|
||||
'fused': True,
|
||||
}
|
||||
weights_regularizer = contrib_layers.l2_regularizer(weight_decay)
|
||||
weights_initializer = contrib_layers.variance_scaling_initializer(
|
||||
mode='FAN_OUT')
|
||||
with arg_scope([slim.fully_connected, slim.conv2d, slim.separable_conv2d],
|
||||
weights_regularizer=weights_regularizer,
|
||||
weights_initializer=weights_initializer):
|
||||
with arg_scope([slim.fully_connected],
|
||||
activation_fn=None, scope='FC'):
|
||||
with arg_scope([slim.conv2d, slim.separable_conv2d],
|
||||
activation_fn=None, biases_initializer=None):
|
||||
with arg_scope([slim.batch_norm], **batch_norm_params) as sc:
|
||||
return sc
|
||||
|
||||
|
||||
def _build_aux_head(net, end_points, num_classes, hparams, scope):
|
||||
"""Auxiliary head used for all models across all datasets."""
|
||||
activation_fn = tf.nn.relu6 if hparams.use_bounded_activation else tf.nn.relu
|
||||
with tf.compat.v1.variable_scope(scope):
|
||||
aux_logits = tf.identity(net)
|
||||
with tf.compat.v1.variable_scope('aux_logits'):
|
||||
aux_logits = slim.avg_pool2d(
|
||||
aux_logits, [5, 5], stride=3, padding='VALID')
|
||||
aux_logits = slim.conv2d(aux_logits, 128, [1, 1], scope='proj')
|
||||
aux_logits = slim.batch_norm(aux_logits, scope='aux_bn0')
|
||||
aux_logits = activation_fn(aux_logits)
|
||||
# Shape of feature map before the final layer.
|
||||
shape = aux_logits.shape
|
||||
if hparams.data_format == 'NHWC':
|
||||
shape = shape[1:3]
|
||||
else:
|
||||
shape = shape[2:4]
|
||||
aux_logits = slim.conv2d(aux_logits, 768, shape, padding='VALID')
|
||||
aux_logits = slim.batch_norm(aux_logits, scope='aux_bn1')
|
||||
aux_logits = activation_fn(aux_logits)
|
||||
aux_logits = contrib_layers.flatten(aux_logits)
|
||||
aux_logits = slim.fully_connected(aux_logits, num_classes)
|
||||
end_points['AuxLogits'] = aux_logits
|
||||
|
||||
|
||||
def _imagenet_stem(inputs, hparams, stem_cell, current_step=None):
|
||||
"""Stem used for models trained on ImageNet."""
|
||||
num_stem_cells = 2
|
||||
|
||||
# 149 x 149 x 32
|
||||
num_stem_filters = int(32 * hparams.stem_multiplier)
|
||||
net = slim.conv2d(
|
||||
inputs, num_stem_filters, [3, 3], stride=2, scope='conv0',
|
||||
padding='VALID')
|
||||
net = slim.batch_norm(net, scope='conv0_bn')
|
||||
|
||||
# Run the reduction cells
|
||||
cell_outputs = [None, net]
|
||||
filter_scaling = 1.0 / (hparams.filter_scaling_rate**num_stem_cells)
|
||||
for cell_num in range(num_stem_cells):
|
||||
net = stem_cell(
|
||||
net,
|
||||
scope='cell_stem_{}'.format(cell_num),
|
||||
filter_scaling=filter_scaling,
|
||||
stride=2,
|
||||
prev_layer=cell_outputs[-2],
|
||||
cell_num=cell_num,
|
||||
current_step=current_step)
|
||||
cell_outputs.append(net)
|
||||
filter_scaling *= hparams.filter_scaling_rate
|
||||
return net, cell_outputs
|
||||
|
||||
|
||||
def _cifar_stem(inputs, hparams):
|
||||
"""Stem used for models trained on Cifar."""
|
||||
num_stem_filters = int(hparams.num_conv_filters * hparams.stem_multiplier)
|
||||
net = slim.conv2d(
|
||||
inputs,
|
||||
num_stem_filters,
|
||||
3,
|
||||
scope='l1_stem_3x3')
|
||||
net = slim.batch_norm(net, scope='l1_stem_bn')
|
||||
return net, [None, net]
|
||||
|
||||
|
||||
def build_nasnet_cifar(images, num_classes,
|
||||
is_training=True,
|
||||
config=None,
|
||||
current_step=None):
|
||||
"""Build NASNet model for the Cifar Dataset."""
|
||||
hparams = cifar_config() if config is None else copy.deepcopy(config)
|
||||
_update_hparams(hparams, is_training)
|
||||
|
||||
if tf.test.is_gpu_available() and hparams.data_format == 'NHWC':
|
||||
tf.compat.v1.logging.info(
|
||||
'A GPU is available on the machine, consider using NCHW '
|
||||
'data format for increased speed on GPU.')
|
||||
|
||||
if hparams.data_format == 'NCHW':
|
||||
images = tf.transpose(a=images, perm=[0, 3, 1, 2])
|
||||
|
||||
# Calculate the total number of cells in the network
|
||||
# Add 2 for the reduction cells
|
||||
total_num_cells = hparams.num_cells + 2
|
||||
|
||||
normal_cell = nasnet_utils.NasNetANormalCell(
|
||||
hparams.num_conv_filters, hparams.drop_path_keep_prob,
|
||||
total_num_cells, hparams.total_training_steps,
|
||||
hparams.use_bounded_activation)
|
||||
reduction_cell = nasnet_utils.NasNetAReductionCell(
|
||||
hparams.num_conv_filters, hparams.drop_path_keep_prob,
|
||||
total_num_cells, hparams.total_training_steps,
|
||||
hparams.use_bounded_activation)
|
||||
with arg_scope([slim.dropout, nasnet_utils.drop_path, slim.batch_norm],
|
||||
is_training=is_training):
|
||||
with arg_scope([slim.avg_pool2d,
|
||||
slim.max_pool2d,
|
||||
slim.conv2d,
|
||||
slim.batch_norm,
|
||||
slim.separable_conv2d,
|
||||
nasnet_utils.factorized_reduction,
|
||||
nasnet_utils.global_avg_pool,
|
||||
nasnet_utils.get_channel_index,
|
||||
nasnet_utils.get_channel_dim],
|
||||
data_format=hparams.data_format):
|
||||
return _build_nasnet_base(images,
|
||||
normal_cell=normal_cell,
|
||||
reduction_cell=reduction_cell,
|
||||
num_classes=num_classes,
|
||||
hparams=hparams,
|
||||
is_training=is_training,
|
||||
stem_type='cifar',
|
||||
current_step=current_step)
|
||||
build_nasnet_cifar.default_image_size = 32
|
||||
|
||||
|
||||
def build_nasnet_mobile(images, num_classes,
|
||||
is_training=True,
|
||||
final_endpoint=None,
|
||||
config=None,
|
||||
current_step=None):
|
||||
"""Build NASNet Mobile model for the ImageNet Dataset."""
|
||||
hparams = (mobile_imagenet_config() if config is None
|
||||
else copy.deepcopy(config))
|
||||
_update_hparams(hparams, is_training)
|
||||
|
||||
if tf.test.is_gpu_available() and hparams.data_format == 'NHWC':
|
||||
tf.compat.v1.logging.info(
|
||||
'A GPU is available on the machine, consider using NCHW '
|
||||
'data format for increased speed on GPU.')
|
||||
|
||||
if hparams.data_format == 'NCHW':
|
||||
images = tf.transpose(a=images, perm=[0, 3, 1, 2])
|
||||
|
||||
# Calculate the total number of cells in the network
|
||||
# Add 2 for the reduction cells
|
||||
total_num_cells = hparams.num_cells + 2
|
||||
# If ImageNet, then add an additional two for the stem cells
|
||||
total_num_cells += 2
|
||||
|
||||
normal_cell = nasnet_utils.NasNetANormalCell(
|
||||
hparams.num_conv_filters, hparams.drop_path_keep_prob,
|
||||
total_num_cells, hparams.total_training_steps,
|
||||
hparams.use_bounded_activation)
|
||||
reduction_cell = nasnet_utils.NasNetAReductionCell(
|
||||
hparams.num_conv_filters, hparams.drop_path_keep_prob,
|
||||
total_num_cells, hparams.total_training_steps,
|
||||
hparams.use_bounded_activation)
|
||||
with arg_scope([slim.dropout, nasnet_utils.drop_path, slim.batch_norm],
|
||||
is_training=is_training):
|
||||
with arg_scope([slim.avg_pool2d,
|
||||
slim.max_pool2d,
|
||||
slim.conv2d,
|
||||
slim.batch_norm,
|
||||
slim.separable_conv2d,
|
||||
nasnet_utils.factorized_reduction,
|
||||
nasnet_utils.global_avg_pool,
|
||||
nasnet_utils.get_channel_index,
|
||||
nasnet_utils.get_channel_dim],
|
||||
data_format=hparams.data_format):
|
||||
return _build_nasnet_base(images,
|
||||
normal_cell=normal_cell,
|
||||
reduction_cell=reduction_cell,
|
||||
num_classes=num_classes,
|
||||
hparams=hparams,
|
||||
is_training=is_training,
|
||||
stem_type='imagenet',
|
||||
final_endpoint=final_endpoint,
|
||||
current_step=current_step)
|
||||
build_nasnet_mobile.default_image_size = 224
|
||||
|
||||
|
||||
def build_nasnet_large(images, num_classes,
|
||||
is_training=True,
|
||||
final_endpoint=None,
|
||||
config=None,
|
||||
current_step=None):
|
||||
"""Build NASNet Large model for the ImageNet Dataset."""
|
||||
hparams = (large_imagenet_config() if config is None
|
||||
else copy.deepcopy(config))
|
||||
_update_hparams(hparams, is_training)
|
||||
|
||||
if tf.test.is_gpu_available() and hparams.data_format == 'NHWC':
|
||||
tf.compat.v1.logging.info(
|
||||
'A GPU is available on the machine, consider using NCHW '
|
||||
'data format for increased speed on GPU.')
|
||||
|
||||
if hparams.data_format == 'NCHW':
|
||||
images = tf.transpose(a=images, perm=[0, 3, 1, 2])
|
||||
|
||||
# Calculate the total number of cells in the network
|
||||
# Add 2 for the reduction cells
|
||||
total_num_cells = hparams.num_cells + 2
|
||||
# If ImageNet, then add an additional two for the stem cells
|
||||
total_num_cells += 2
|
||||
|
||||
normal_cell = nasnet_utils.NasNetANormalCell(
|
||||
hparams.num_conv_filters, hparams.drop_path_keep_prob,
|
||||
total_num_cells, hparams.total_training_steps,
|
||||
hparams.use_bounded_activation)
|
||||
reduction_cell = nasnet_utils.NasNetAReductionCell(
|
||||
hparams.num_conv_filters, hparams.drop_path_keep_prob,
|
||||
total_num_cells, hparams.total_training_steps,
|
||||
hparams.use_bounded_activation)
|
||||
with arg_scope([slim.dropout, nasnet_utils.drop_path, slim.batch_norm],
|
||||
is_training=is_training):
|
||||
with arg_scope([slim.avg_pool2d,
|
||||
slim.max_pool2d,
|
||||
slim.conv2d,
|
||||
slim.batch_norm,
|
||||
slim.separable_conv2d,
|
||||
nasnet_utils.factorized_reduction,
|
||||
nasnet_utils.global_avg_pool,
|
||||
nasnet_utils.get_channel_index,
|
||||
nasnet_utils.get_channel_dim],
|
||||
data_format=hparams.data_format):
|
||||
return _build_nasnet_base(images,
|
||||
normal_cell=normal_cell,
|
||||
reduction_cell=reduction_cell,
|
||||
num_classes=num_classes,
|
||||
hparams=hparams,
|
||||
is_training=is_training,
|
||||
stem_type='imagenet',
|
||||
final_endpoint=final_endpoint,
|
||||
current_step=current_step)
|
||||
build_nasnet_large.default_image_size = 331
|
||||
|
||||
|
||||
def _build_nasnet_base(images,
|
||||
normal_cell,
|
||||
reduction_cell,
|
||||
num_classes,
|
||||
hparams,
|
||||
is_training,
|
||||
stem_type,
|
||||
final_endpoint=None,
|
||||
current_step=None):
|
||||
"""Constructs a NASNet image model."""
|
||||
|
||||
end_points = {}
|
||||
def add_and_check_endpoint(endpoint_name, net):
|
||||
end_points[endpoint_name] = net
|
||||
return final_endpoint and (endpoint_name == final_endpoint)
|
||||
|
||||
# Find where to place the reduction cells or stride normal cells
|
||||
reduction_indices = nasnet_utils.calc_reduction_layers(
|
||||
hparams.num_cells, hparams.num_reduction_layers)
|
||||
stem_cell = reduction_cell
|
||||
|
||||
if stem_type == 'imagenet':
|
||||
stem = lambda: _imagenet_stem(images, hparams, stem_cell)
|
||||
elif stem_type == 'cifar':
|
||||
stem = lambda: _cifar_stem(images, hparams)
|
||||
else:
|
||||
raise ValueError('Unknown stem_type: ', stem_type)
|
||||
net, cell_outputs = stem()
|
||||
if add_and_check_endpoint('Stem', net): return net, end_points
|
||||
|
||||
# Setup for building in the auxiliary head.
|
||||
aux_head_cell_idxes = []
|
||||
if len(reduction_indices) >= 2:
|
||||
aux_head_cell_idxes.append(reduction_indices[1] - 1)
|
||||
|
||||
# Run the cells
|
||||
filter_scaling = 1.0
|
||||
# true_cell_num accounts for the stem cells
|
||||
true_cell_num = 2 if stem_type == 'imagenet' else 0
|
||||
activation_fn = tf.nn.relu6 if hparams.use_bounded_activation else tf.nn.relu
|
||||
for cell_num in range(hparams.num_cells):
|
||||
stride = 1
|
||||
if hparams.skip_reduction_layer_input:
|
||||
prev_layer = cell_outputs[-2]
|
||||
if cell_num in reduction_indices:
|
||||
filter_scaling *= hparams.filter_scaling_rate
|
||||
net = reduction_cell(
|
||||
net,
|
||||
scope='reduction_cell_{}'.format(reduction_indices.index(cell_num)),
|
||||
filter_scaling=filter_scaling,
|
||||
stride=2,
|
||||
prev_layer=cell_outputs[-2],
|
||||
cell_num=true_cell_num,
|
||||
current_step=current_step)
|
||||
if add_and_check_endpoint(
|
||||
'Reduction_Cell_{}'.format(reduction_indices.index(cell_num)), net):
|
||||
return net, end_points
|
||||
true_cell_num += 1
|
||||
cell_outputs.append(net)
|
||||
if not hparams.skip_reduction_layer_input:
|
||||
prev_layer = cell_outputs[-2]
|
||||
net = normal_cell(
|
||||
net,
|
||||
scope='cell_{}'.format(cell_num),
|
||||
filter_scaling=filter_scaling,
|
||||
stride=stride,
|
||||
prev_layer=prev_layer,
|
||||
cell_num=true_cell_num,
|
||||
current_step=current_step)
|
||||
|
||||
if add_and_check_endpoint('Cell_{}'.format(cell_num), net):
|
||||
return net, end_points
|
||||
true_cell_num += 1
|
||||
if (hparams.use_aux_head and cell_num in aux_head_cell_idxes and
|
||||
num_classes and is_training):
|
||||
aux_net = activation_fn(net)
|
||||
_build_aux_head(aux_net, end_points, num_classes, hparams,
|
||||
scope='aux_{}'.format(cell_num))
|
||||
cell_outputs.append(net)
|
||||
|
||||
# Final softmax layer
|
||||
with tf.compat.v1.variable_scope('final_layer'):
|
||||
net = activation_fn(net)
|
||||
net = nasnet_utils.global_avg_pool(net)
|
||||
if add_and_check_endpoint('global_pool', net) or not num_classes:
|
||||
return net, end_points
|
||||
net = slim.dropout(net, hparams.dense_dropout_keep_prob, scope='dropout')
|
||||
logits = slim.fully_connected(net, num_classes)
|
||||
|
||||
if add_and_check_endpoint('Logits', logits):
|
||||
return net, end_points
|
||||
|
||||
predictions = tf.nn.softmax(logits, name='predictions')
|
||||
if add_and_check_endpoint('Predictions', predictions):
|
||||
return net, end_points
|
||||
return logits, end_points
|
||||
+413
@@ -0,0 +1,413 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for slim.nasnet."""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets.nasnet import nasnet
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
class NASNetTest(tf.test.TestCase):
|
||||
|
||||
def testBuildLogitsCifarModel(self):
|
||||
batch_size = 5
|
||||
height, width = 32, 32
|
||||
num_classes = 10
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(nasnet.nasnet_cifar_arg_scope()):
|
||||
logits, end_points = nasnet.build_nasnet_cifar(inputs, num_classes)
|
||||
auxlogits = end_points['AuxLogits']
|
||||
predictions = end_points['Predictions']
|
||||
self.assertListEqual(auxlogits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(predictions.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildLogitsMobileModel(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
|
||||
logits, end_points = nasnet.build_nasnet_mobile(inputs, num_classes)
|
||||
auxlogits = end_points['AuxLogits']
|
||||
predictions = end_points['Predictions']
|
||||
self.assertListEqual(auxlogits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(predictions.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildLogitsLargeModel(self):
|
||||
batch_size = 5
|
||||
height, width = 331, 331
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(nasnet.nasnet_large_arg_scope()):
|
||||
logits, end_points = nasnet.build_nasnet_large(inputs, num_classes)
|
||||
auxlogits = end_points['AuxLogits']
|
||||
predictions = end_points['Predictions']
|
||||
self.assertListEqual(auxlogits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(predictions.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildPreLogitsCifarModel(self):
|
||||
batch_size = 5
|
||||
height, width = 32, 32
|
||||
num_classes = None
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(nasnet.nasnet_cifar_arg_scope()):
|
||||
net, end_points = nasnet.build_nasnet_cifar(inputs, num_classes)
|
||||
self.assertFalse('AuxLogits' in end_points)
|
||||
self.assertFalse('Predictions' in end_points)
|
||||
self.assertTrue(net.op.name.startswith('final_layer/Mean'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 768])
|
||||
|
||||
def testBuildPreLogitsMobileModel(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = None
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
|
||||
net, end_points = nasnet.build_nasnet_mobile(inputs, num_classes)
|
||||
self.assertFalse('AuxLogits' in end_points)
|
||||
self.assertFalse('Predictions' in end_points)
|
||||
self.assertTrue(net.op.name.startswith('final_layer/Mean'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 1056])
|
||||
|
||||
def testBuildPreLogitsLargeModel(self):
|
||||
batch_size = 5
|
||||
height, width = 331, 331
|
||||
num_classes = None
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(nasnet.nasnet_large_arg_scope()):
|
||||
net, end_points = nasnet.build_nasnet_large(inputs, num_classes)
|
||||
self.assertFalse('AuxLogits' in end_points)
|
||||
self.assertFalse('Predictions' in end_points)
|
||||
self.assertTrue(net.op.name.startswith('final_layer/Mean'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 4032])
|
||||
|
||||
def testAllEndPointsShapesCifarModel(self):
|
||||
batch_size = 5
|
||||
height, width = 32, 32
|
||||
num_classes = 10
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(nasnet.nasnet_cifar_arg_scope()):
|
||||
_, end_points = nasnet.build_nasnet_cifar(inputs, num_classes)
|
||||
endpoints_shapes = {'Stem': [batch_size, 32, 32, 96],
|
||||
'Cell_0': [batch_size, 32, 32, 192],
|
||||
'Cell_1': [batch_size, 32, 32, 192],
|
||||
'Cell_2': [batch_size, 32, 32, 192],
|
||||
'Cell_3': [batch_size, 32, 32, 192],
|
||||
'Cell_4': [batch_size, 32, 32, 192],
|
||||
'Cell_5': [batch_size, 32, 32, 192],
|
||||
'Cell_6': [batch_size, 16, 16, 384],
|
||||
'Cell_7': [batch_size, 16, 16, 384],
|
||||
'Cell_8': [batch_size, 16, 16, 384],
|
||||
'Cell_9': [batch_size, 16, 16, 384],
|
||||
'Cell_10': [batch_size, 16, 16, 384],
|
||||
'Cell_11': [batch_size, 16, 16, 384],
|
||||
'Cell_12': [batch_size, 8, 8, 768],
|
||||
'Cell_13': [batch_size, 8, 8, 768],
|
||||
'Cell_14': [batch_size, 8, 8, 768],
|
||||
'Cell_15': [batch_size, 8, 8, 768],
|
||||
'Cell_16': [batch_size, 8, 8, 768],
|
||||
'Cell_17': [batch_size, 8, 8, 768],
|
||||
'Reduction_Cell_0': [batch_size, 16, 16, 256],
|
||||
'Reduction_Cell_1': [batch_size, 8, 8, 512],
|
||||
'global_pool': [batch_size, 768],
|
||||
# Logits and predictions
|
||||
'AuxLogits': [batch_size, num_classes],
|
||||
'Logits': [batch_size, num_classes],
|
||||
'Predictions': [batch_size, num_classes]}
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
tf.compat.v1.logging.info('Endpoint name: {}'.format(endpoint_name))
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testNoAuxHeadCifarModel(self):
|
||||
batch_size = 5
|
||||
height, width = 32, 32
|
||||
num_classes = 10
|
||||
for use_aux_head in (True, False):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
config = nasnet.cifar_config()
|
||||
config.set_hparam('use_aux_head', int(use_aux_head))
|
||||
with slim.arg_scope(nasnet.nasnet_cifar_arg_scope()):
|
||||
_, end_points = nasnet.build_nasnet_cifar(inputs, num_classes,
|
||||
config=config)
|
||||
self.assertEqual('AuxLogits' in end_points, use_aux_head)
|
||||
|
||||
def testAllEndPointsShapesMobileModel(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
|
||||
_, end_points = nasnet.build_nasnet_mobile(inputs, num_classes)
|
||||
endpoints_shapes = {'Stem': [batch_size, 28, 28, 88],
|
||||
'Cell_0': [batch_size, 28, 28, 264],
|
||||
'Cell_1': [batch_size, 28, 28, 264],
|
||||
'Cell_2': [batch_size, 28, 28, 264],
|
||||
'Cell_3': [batch_size, 28, 28, 264],
|
||||
'Cell_4': [batch_size, 14, 14, 528],
|
||||
'Cell_5': [batch_size, 14, 14, 528],
|
||||
'Cell_6': [batch_size, 14, 14, 528],
|
||||
'Cell_7': [batch_size, 14, 14, 528],
|
||||
'Cell_8': [batch_size, 7, 7, 1056],
|
||||
'Cell_9': [batch_size, 7, 7, 1056],
|
||||
'Cell_10': [batch_size, 7, 7, 1056],
|
||||
'Cell_11': [batch_size, 7, 7, 1056],
|
||||
'Reduction_Cell_0': [batch_size, 14, 14, 352],
|
||||
'Reduction_Cell_1': [batch_size, 7, 7, 704],
|
||||
'global_pool': [batch_size, 1056],
|
||||
# Logits and predictions
|
||||
'AuxLogits': [batch_size, num_classes],
|
||||
'Logits': [batch_size, num_classes],
|
||||
'Predictions': [batch_size, num_classes]}
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
tf.compat.v1.logging.info('Endpoint name: {}'.format(endpoint_name))
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testNoAuxHeadMobileModel(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
for use_aux_head in (True, False):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
config = nasnet.mobile_imagenet_config()
|
||||
config.set_hparam('use_aux_head', int(use_aux_head))
|
||||
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
|
||||
_, end_points = nasnet.build_nasnet_mobile(inputs, num_classes,
|
||||
config=config)
|
||||
self.assertEqual('AuxLogits' in end_points, use_aux_head)
|
||||
|
||||
def testAllEndPointsShapesLargeModel(self):
|
||||
batch_size = 5
|
||||
height, width = 331, 331
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(nasnet.nasnet_large_arg_scope()):
|
||||
_, end_points = nasnet.build_nasnet_large(inputs, num_classes)
|
||||
endpoints_shapes = {'Stem': [batch_size, 42, 42, 336],
|
||||
'Cell_0': [batch_size, 42, 42, 1008],
|
||||
'Cell_1': [batch_size, 42, 42, 1008],
|
||||
'Cell_2': [batch_size, 42, 42, 1008],
|
||||
'Cell_3': [batch_size, 42, 42, 1008],
|
||||
'Cell_4': [batch_size, 42, 42, 1008],
|
||||
'Cell_5': [batch_size, 42, 42, 1008],
|
||||
'Cell_6': [batch_size, 21, 21, 2016],
|
||||
'Cell_7': [batch_size, 21, 21, 2016],
|
||||
'Cell_8': [batch_size, 21, 21, 2016],
|
||||
'Cell_9': [batch_size, 21, 21, 2016],
|
||||
'Cell_10': [batch_size, 21, 21, 2016],
|
||||
'Cell_11': [batch_size, 21, 21, 2016],
|
||||
'Cell_12': [batch_size, 11, 11, 4032],
|
||||
'Cell_13': [batch_size, 11, 11, 4032],
|
||||
'Cell_14': [batch_size, 11, 11, 4032],
|
||||
'Cell_15': [batch_size, 11, 11, 4032],
|
||||
'Cell_16': [batch_size, 11, 11, 4032],
|
||||
'Cell_17': [batch_size, 11, 11, 4032],
|
||||
'Reduction_Cell_0': [batch_size, 21, 21, 1344],
|
||||
'Reduction_Cell_1': [batch_size, 11, 11, 2688],
|
||||
'global_pool': [batch_size, 4032],
|
||||
# Logits and predictions
|
||||
'AuxLogits': [batch_size, num_classes],
|
||||
'Logits': [batch_size, num_classes],
|
||||
'Predictions': [batch_size, num_classes]}
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
tf.compat.v1.logging.info('Endpoint name: {}'.format(endpoint_name))
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertTrue(endpoint_name in end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testNoAuxHeadLargeModel(self):
|
||||
batch_size = 5
|
||||
height, width = 331, 331
|
||||
num_classes = 1000
|
||||
for use_aux_head in (True, False):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
config = nasnet.large_imagenet_config()
|
||||
config.set_hparam('use_aux_head', int(use_aux_head))
|
||||
with slim.arg_scope(nasnet.nasnet_large_arg_scope()):
|
||||
_, end_points = nasnet.build_nasnet_large(inputs, num_classes,
|
||||
config=config)
|
||||
self.assertEqual('AuxLogits' in end_points, use_aux_head)
|
||||
|
||||
def testVariablesSetDeviceMobileModel(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
# Force all Variables to reside on the device.
|
||||
with tf.compat.v1.variable_scope('on_cpu'), tf.device('/cpu:0'):
|
||||
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
|
||||
nasnet.build_nasnet_mobile(inputs, num_classes)
|
||||
with tf.compat.v1.variable_scope('on_gpu'), tf.device('/gpu:0'):
|
||||
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
|
||||
nasnet.build_nasnet_mobile(inputs, num_classes)
|
||||
for v in tf.compat.v1.get_collection(
|
||||
tf.compat.v1.GraphKeys.GLOBAL_VARIABLES, scope='on_cpu'):
|
||||
self.assertDeviceEqual(v.device, '/cpu:0')
|
||||
for v in tf.compat.v1.get_collection(
|
||||
tf.compat.v1.GraphKeys.GLOBAL_VARIABLES, scope='on_gpu'):
|
||||
self.assertDeviceEqual(v.device, '/gpu:0')
|
||||
|
||||
def testUnknownBatchSizeMobileModel(self):
|
||||
batch_size = 1
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (None, height, width, 3))
|
||||
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
|
||||
logits, _ = nasnet.build_nasnet_mobile(inputs, num_classes)
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[None, num_classes])
|
||||
images = tf.random.uniform((batch_size, height, width, 3))
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(logits, {inputs: images.eval()})
|
||||
self.assertEquals(output.shape, (batch_size, num_classes))
|
||||
|
||||
def testEvaluationMobileModel(self):
|
||||
batch_size = 2
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
with self.test_session() as sess:
|
||||
eval_inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
|
||||
logits, _ = nasnet.build_nasnet_mobile(eval_inputs,
|
||||
num_classes,
|
||||
is_training=False)
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(predictions)
|
||||
self.assertEquals(output.shape, (batch_size,))
|
||||
|
||||
def testOverrideHParamsCifarModel(self):
|
||||
batch_size = 5
|
||||
height, width = 32, 32
|
||||
num_classes = 10
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
config = nasnet.cifar_config()
|
||||
config.set_hparam('data_format', 'NCHW')
|
||||
with slim.arg_scope(nasnet.nasnet_cifar_arg_scope()):
|
||||
_, end_points = nasnet.build_nasnet_cifar(
|
||||
inputs, num_classes, config=config)
|
||||
self.assertListEqual(
|
||||
end_points['Stem'].shape.as_list(), [batch_size, 96, 32, 32])
|
||||
|
||||
def testOverrideHParamsMobileModel(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
config = nasnet.mobile_imagenet_config()
|
||||
config.set_hparam('data_format', 'NCHW')
|
||||
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
|
||||
_, end_points = nasnet.build_nasnet_mobile(
|
||||
inputs, num_classes, config=config)
|
||||
self.assertListEqual(
|
||||
end_points['Stem'].shape.as_list(), [batch_size, 88, 28, 28])
|
||||
|
||||
def testOverrideHParamsLargeModel(self):
|
||||
batch_size = 5
|
||||
height, width = 331, 331
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
config = nasnet.large_imagenet_config()
|
||||
config.set_hparam('data_format', 'NCHW')
|
||||
with slim.arg_scope(nasnet.nasnet_large_arg_scope()):
|
||||
_, end_points = nasnet.build_nasnet_large(
|
||||
inputs, num_classes, config=config)
|
||||
self.assertListEqual(
|
||||
end_points['Stem'].shape.as_list(), [batch_size, 336, 42, 42])
|
||||
|
||||
def testCurrentStepCifarModel(self):
|
||||
batch_size = 5
|
||||
height, width = 32, 32
|
||||
num_classes = 10
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
global_step = tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(nasnet.nasnet_cifar_arg_scope()):
|
||||
logits, end_points = nasnet.build_nasnet_cifar(inputs,
|
||||
num_classes,
|
||||
current_step=global_step)
|
||||
auxlogits = end_points['AuxLogits']
|
||||
predictions = end_points['Predictions']
|
||||
self.assertListEqual(auxlogits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(predictions.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testUseBoundedAcitvationCifarModel(self):
|
||||
batch_size = 1
|
||||
height, width = 32, 32
|
||||
num_classes = 10
|
||||
for use_bounded_activation in (True, False):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
config = nasnet.cifar_config()
|
||||
config.set_hparam('use_bounded_activation', use_bounded_activation)
|
||||
with slim.arg_scope(nasnet.nasnet_cifar_arg_scope()):
|
||||
_, _ = nasnet.build_nasnet_cifar(
|
||||
inputs, num_classes, config=config)
|
||||
for node in tf.compat.v1.get_default_graph().as_graph_def().node:
|
||||
if node.op.startswith('Relu'):
|
||||
self.assertEqual(node.op == 'Relu6', use_bounded_activation)
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+534
@@ -0,0 +1,534 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""A custom module for some common operations used by NASNet.
|
||||
|
||||
Functions exposed in this file:
|
||||
- calc_reduction_layers
|
||||
- get_channel_index
|
||||
- get_channel_dim
|
||||
- global_avg_pool
|
||||
- factorized_reduction
|
||||
- drop_path
|
||||
|
||||
Classes exposed in this file:
|
||||
- NasNetABaseCell
|
||||
- NasNetANormalCell
|
||||
- NasNetAReductionCell
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import framework as contrib_framework
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
arg_scope = contrib_framework.arg_scope
|
||||
slim = contrib_slim
|
||||
|
||||
DATA_FORMAT_NCHW = 'NCHW'
|
||||
DATA_FORMAT_NHWC = 'NHWC'
|
||||
INVALID = 'null'
|
||||
# The cap for tf.clip_by_value, it's hinted from the activation distribution
|
||||
# that the majority of activation values are in the range [-6, 6].
|
||||
CLIP_BY_VALUE_CAP = 6
|
||||
|
||||
|
||||
def calc_reduction_layers(num_cells, num_reduction_layers):
|
||||
"""Figure out what layers should have reductions."""
|
||||
reduction_layers = []
|
||||
for pool_num in range(1, num_reduction_layers + 1):
|
||||
layer_num = (float(pool_num) / (num_reduction_layers + 1)) * num_cells
|
||||
layer_num = int(layer_num)
|
||||
reduction_layers.append(layer_num)
|
||||
return reduction_layers
|
||||
|
||||
|
||||
@contrib_framework.add_arg_scope
|
||||
def get_channel_index(data_format=INVALID):
|
||||
assert data_format != INVALID
|
||||
axis = 3 if data_format == 'NHWC' else 1
|
||||
return axis
|
||||
|
||||
|
||||
@contrib_framework.add_arg_scope
|
||||
def get_channel_dim(shape, data_format=INVALID):
|
||||
assert data_format != INVALID
|
||||
assert len(shape) == 4
|
||||
if data_format == 'NHWC':
|
||||
return int(shape[3])
|
||||
elif data_format == 'NCHW':
|
||||
return int(shape[1])
|
||||
else:
|
||||
raise ValueError('Not a valid data_format', data_format)
|
||||
|
||||
|
||||
@contrib_framework.add_arg_scope
|
||||
def global_avg_pool(x, data_format=INVALID):
|
||||
"""Average pool away the height and width spatial dimensions of x."""
|
||||
assert data_format != INVALID
|
||||
assert data_format in ['NHWC', 'NCHW']
|
||||
assert x.shape.ndims == 4
|
||||
if data_format == 'NHWC':
|
||||
return tf.reduce_mean(input_tensor=x, axis=[1, 2])
|
||||
else:
|
||||
return tf.reduce_mean(input_tensor=x, axis=[2, 3])
|
||||
|
||||
|
||||
@contrib_framework.add_arg_scope
|
||||
def factorized_reduction(net, output_filters, stride, data_format=INVALID):
|
||||
"""Reduces the shape of net without information loss due to striding."""
|
||||
assert data_format != INVALID
|
||||
if stride == 1:
|
||||
net = slim.conv2d(net, output_filters, 1, scope='path_conv')
|
||||
net = slim.batch_norm(net, scope='path_bn')
|
||||
return net
|
||||
if data_format == 'NHWC':
|
||||
stride_spec = [1, stride, stride, 1]
|
||||
else:
|
||||
stride_spec = [1, 1, stride, stride]
|
||||
|
||||
# Skip path 1
|
||||
path1 = tf.compat.v2.nn.avg_pool2d(
|
||||
input=net,
|
||||
ksize=[1, 1, 1, 1],
|
||||
strides=stride_spec,
|
||||
padding='VALID',
|
||||
data_format=data_format)
|
||||
path1 = slim.conv2d(path1, int(output_filters / 2), 1, scope='path1_conv')
|
||||
|
||||
# Skip path 2
|
||||
# First pad with 0's on the right and bottom, then shift the filter to
|
||||
# include those 0's that were added.
|
||||
if data_format == 'NHWC':
|
||||
pad_arr = [[0, 0], [0, 1], [0, 1], [0, 0]]
|
||||
path2 = tf.pad(tensor=net, paddings=pad_arr)[:, 1:, 1:, :]
|
||||
concat_axis = 3
|
||||
else:
|
||||
pad_arr = [[0, 0], [0, 0], [0, 1], [0, 1]]
|
||||
path2 = tf.pad(tensor=net, paddings=pad_arr)[:, :, 1:, 1:]
|
||||
concat_axis = 1
|
||||
|
||||
path2 = tf.compat.v2.nn.avg_pool2d(
|
||||
input=path2,
|
||||
ksize=[1, 1, 1, 1],
|
||||
strides=stride_spec,
|
||||
padding='VALID',
|
||||
data_format=data_format)
|
||||
|
||||
# If odd number of filters, add an additional one to the second path.
|
||||
final_filter_size = int(output_filters / 2) + int(output_filters % 2)
|
||||
path2 = slim.conv2d(path2, final_filter_size, 1, scope='path2_conv')
|
||||
|
||||
# Concat and apply BN
|
||||
final_path = tf.concat(values=[path1, path2], axis=concat_axis)
|
||||
final_path = slim.batch_norm(final_path, scope='final_path_bn')
|
||||
return final_path
|
||||
|
||||
|
||||
@contrib_framework.add_arg_scope
|
||||
def drop_path(net, keep_prob, is_training=True):
|
||||
"""Drops out a whole example hiddenstate with the specified probability."""
|
||||
if is_training:
|
||||
batch_size = tf.shape(input=net)[0]
|
||||
noise_shape = [batch_size, 1, 1, 1]
|
||||
random_tensor = keep_prob
|
||||
random_tensor += tf.random.uniform(noise_shape, dtype=tf.float32)
|
||||
binary_tensor = tf.cast(tf.floor(random_tensor), net.dtype)
|
||||
keep_prob_inv = tf.cast(1.0 / keep_prob, net.dtype)
|
||||
net = net * keep_prob_inv * binary_tensor
|
||||
|
||||
return net
|
||||
|
||||
|
||||
def _operation_to_filter_shape(operation):
|
||||
splitted_operation = operation.split('x')
|
||||
filter_shape = int(splitted_operation[0][-1])
|
||||
assert filter_shape == int(
|
||||
splitted_operation[1][0]), 'Rectangular filters not supported.'
|
||||
return filter_shape
|
||||
|
||||
|
||||
def _operation_to_num_layers(operation):
|
||||
splitted_operation = operation.split('_')
|
||||
if 'x' in splitted_operation[-1]:
|
||||
return 1
|
||||
return int(splitted_operation[-1])
|
||||
|
||||
|
||||
def _operation_to_info(operation):
|
||||
"""Takes in operation name and returns meta information.
|
||||
|
||||
An example would be 'separable_3x3_4' -> (3, 4).
|
||||
|
||||
Args:
|
||||
operation: String that corresponds to convolution operation.
|
||||
|
||||
Returns:
|
||||
Tuple of (filter shape, num layers).
|
||||
"""
|
||||
num_layers = _operation_to_num_layers(operation)
|
||||
filter_shape = _operation_to_filter_shape(operation)
|
||||
return num_layers, filter_shape
|
||||
|
||||
|
||||
def _stacked_separable_conv(net, stride, operation, filter_size,
|
||||
use_bounded_activation):
|
||||
"""Takes in an operations and parses it to the correct sep operation."""
|
||||
num_layers, kernel_size = _operation_to_info(operation)
|
||||
activation_fn = tf.nn.relu6 if use_bounded_activation else tf.nn.relu
|
||||
for layer_num in range(num_layers - 1):
|
||||
net = activation_fn(net)
|
||||
net = slim.separable_conv2d(
|
||||
net,
|
||||
filter_size,
|
||||
kernel_size,
|
||||
depth_multiplier=1,
|
||||
scope='separable_{0}x{0}_{1}'.format(kernel_size, layer_num + 1),
|
||||
stride=stride)
|
||||
net = slim.batch_norm(
|
||||
net, scope='bn_sep_{0}x{0}_{1}'.format(kernel_size, layer_num + 1))
|
||||
stride = 1
|
||||
net = activation_fn(net)
|
||||
net = slim.separable_conv2d(
|
||||
net,
|
||||
filter_size,
|
||||
kernel_size,
|
||||
depth_multiplier=1,
|
||||
scope='separable_{0}x{0}_{1}'.format(kernel_size, num_layers),
|
||||
stride=stride)
|
||||
net = slim.batch_norm(
|
||||
net, scope='bn_sep_{0}x{0}_{1}'.format(kernel_size, num_layers))
|
||||
return net
|
||||
|
||||
|
||||
def _operation_to_pooling_type(operation):
|
||||
"""Takes in the operation string and returns the pooling type."""
|
||||
splitted_operation = operation.split('_')
|
||||
return splitted_operation[0]
|
||||
|
||||
|
||||
def _operation_to_pooling_shape(operation):
|
||||
"""Takes in the operation string and returns the pooling kernel shape."""
|
||||
splitted_operation = operation.split('_')
|
||||
shape = splitted_operation[-1]
|
||||
assert 'x' in shape
|
||||
filter_height, filter_width = shape.split('x')
|
||||
assert filter_height == filter_width
|
||||
return int(filter_height)
|
||||
|
||||
|
||||
def _operation_to_pooling_info(operation):
|
||||
"""Parses the pooling operation string to return its type and shape."""
|
||||
pooling_type = _operation_to_pooling_type(operation)
|
||||
pooling_shape = _operation_to_pooling_shape(operation)
|
||||
return pooling_type, pooling_shape
|
||||
|
||||
|
||||
def _pooling(net, stride, operation, use_bounded_activation):
|
||||
"""Parses operation and performs the correct pooling operation on net."""
|
||||
padding = 'SAME'
|
||||
pooling_type, pooling_shape = _operation_to_pooling_info(operation)
|
||||
if use_bounded_activation:
|
||||
net = tf.nn.relu6(net)
|
||||
if pooling_type == 'avg':
|
||||
net = slim.avg_pool2d(net, pooling_shape, stride=stride, padding=padding)
|
||||
elif pooling_type == 'max':
|
||||
net = slim.max_pool2d(net, pooling_shape, stride=stride, padding=padding)
|
||||
else:
|
||||
raise NotImplementedError('Unimplemented pooling type: ', pooling_type)
|
||||
return net
|
||||
|
||||
|
||||
class NasNetABaseCell(object):
|
||||
"""NASNet Cell class that is used as a 'layer' in image architectures.
|
||||
|
||||
Args:
|
||||
num_conv_filters: The number of filters for each convolution operation.
|
||||
operations: List of operations that are performed in the NASNet Cell in
|
||||
order.
|
||||
used_hiddenstates: Binary array that signals if the hiddenstate was used
|
||||
within the cell. This is used to determine what outputs of the cell
|
||||
should be concatenated together.
|
||||
hiddenstate_indices: Determines what hiddenstates should be combined
|
||||
together with the specified operations to create the NASNet cell.
|
||||
use_bounded_activation: Whether or not to use bounded activations. Bounded
|
||||
activations better lend themselves to quantized inference.
|
||||
"""
|
||||
|
||||
def __init__(self, num_conv_filters, operations, used_hiddenstates,
|
||||
hiddenstate_indices, drop_path_keep_prob, total_num_cells,
|
||||
total_training_steps, use_bounded_activation=False):
|
||||
self._num_conv_filters = num_conv_filters
|
||||
self._operations = operations
|
||||
self._used_hiddenstates = used_hiddenstates
|
||||
self._hiddenstate_indices = hiddenstate_indices
|
||||
self._drop_path_keep_prob = drop_path_keep_prob
|
||||
self._total_num_cells = total_num_cells
|
||||
self._total_training_steps = total_training_steps
|
||||
self._use_bounded_activation = use_bounded_activation
|
||||
|
||||
def _reduce_prev_layer(self, prev_layer, curr_layer):
|
||||
"""Matches dimension of prev_layer to the curr_layer."""
|
||||
# Set the prev layer to the current layer if it is none
|
||||
if prev_layer is None:
|
||||
return curr_layer
|
||||
curr_num_filters = self._filter_size
|
||||
prev_num_filters = get_channel_dim(prev_layer.shape)
|
||||
curr_filter_shape = int(curr_layer.shape[2])
|
||||
prev_filter_shape = int(prev_layer.shape[2])
|
||||
activation_fn = tf.nn.relu6 if self._use_bounded_activation else tf.nn.relu
|
||||
if curr_filter_shape != prev_filter_shape:
|
||||
prev_layer = activation_fn(prev_layer)
|
||||
prev_layer = factorized_reduction(
|
||||
prev_layer, curr_num_filters, stride=2)
|
||||
elif curr_num_filters != prev_num_filters:
|
||||
prev_layer = activation_fn(prev_layer)
|
||||
prev_layer = slim.conv2d(
|
||||
prev_layer, curr_num_filters, 1, scope='prev_1x1')
|
||||
prev_layer = slim.batch_norm(prev_layer, scope='prev_bn')
|
||||
return prev_layer
|
||||
|
||||
def _cell_base(self, net, prev_layer):
|
||||
"""Runs the beginning of the conv cell before the predicted ops are run."""
|
||||
num_filters = self._filter_size
|
||||
|
||||
# Check to be sure prev layer stuff is setup correctly
|
||||
prev_layer = self._reduce_prev_layer(prev_layer, net)
|
||||
|
||||
net = tf.nn.relu6(net) if self._use_bounded_activation else tf.nn.relu(net)
|
||||
net = slim.conv2d(net, num_filters, 1, scope='1x1')
|
||||
net = slim.batch_norm(net, scope='beginning_bn')
|
||||
# num_or_size_splits=1
|
||||
net = [net]
|
||||
net.append(prev_layer)
|
||||
return net
|
||||
|
||||
def __call__(self, net, scope=None, filter_scaling=1, stride=1,
|
||||
prev_layer=None, cell_num=-1, current_step=None):
|
||||
"""Runs the conv cell."""
|
||||
self._cell_num = cell_num
|
||||
self._filter_scaling = filter_scaling
|
||||
self._filter_size = int(self._num_conv_filters * filter_scaling)
|
||||
|
||||
i = 0
|
||||
with tf.compat.v1.variable_scope(scope):
|
||||
net = self._cell_base(net, prev_layer)
|
||||
for iteration in range(5):
|
||||
with tf.compat.v1.variable_scope('comb_iter_{}'.format(iteration)):
|
||||
left_hiddenstate_idx, right_hiddenstate_idx = (
|
||||
self._hiddenstate_indices[i],
|
||||
self._hiddenstate_indices[i + 1])
|
||||
original_input_left = left_hiddenstate_idx < 2
|
||||
original_input_right = right_hiddenstate_idx < 2
|
||||
h1 = net[left_hiddenstate_idx]
|
||||
h2 = net[right_hiddenstate_idx]
|
||||
|
||||
operation_left = self._operations[i]
|
||||
operation_right = self._operations[i+1]
|
||||
i += 2
|
||||
# Apply conv operations
|
||||
with tf.compat.v1.variable_scope('left'):
|
||||
h1 = self._apply_conv_operation(h1, operation_left,
|
||||
stride, original_input_left,
|
||||
current_step)
|
||||
with tf.compat.v1.variable_scope('right'):
|
||||
h2 = self._apply_conv_operation(h2, operation_right,
|
||||
stride, original_input_right,
|
||||
current_step)
|
||||
|
||||
# Combine hidden states using 'add'.
|
||||
with tf.compat.v1.variable_scope('combine'):
|
||||
h = h1 + h2
|
||||
if self._use_bounded_activation:
|
||||
h = tf.nn.relu6(h)
|
||||
|
||||
# Add hiddenstate to the list of hiddenstates we can choose from
|
||||
net.append(h)
|
||||
|
||||
with tf.compat.v1.variable_scope('cell_output'):
|
||||
net = self._combine_unused_states(net)
|
||||
|
||||
return net
|
||||
|
||||
def _apply_conv_operation(self, net, operation,
|
||||
stride, is_from_original_input, current_step):
|
||||
"""Applies the predicted conv operation to net."""
|
||||
# Dont stride if this is not one of the original hiddenstates
|
||||
if stride > 1 and not is_from_original_input:
|
||||
stride = 1
|
||||
input_filters = get_channel_dim(net.shape)
|
||||
filter_size = self._filter_size
|
||||
if 'separable' in operation:
|
||||
net = _stacked_separable_conv(net, stride, operation, filter_size,
|
||||
self._use_bounded_activation)
|
||||
if self._use_bounded_activation:
|
||||
net = tf.clip_by_value(net, -CLIP_BY_VALUE_CAP, CLIP_BY_VALUE_CAP)
|
||||
elif operation in ['none']:
|
||||
if self._use_bounded_activation:
|
||||
net = tf.nn.relu6(net)
|
||||
# Check if a stride is needed, then use a strided 1x1 here
|
||||
if stride > 1 or (input_filters != filter_size):
|
||||
if not self._use_bounded_activation:
|
||||
net = tf.nn.relu(net)
|
||||
net = slim.conv2d(net, filter_size, 1, stride=stride, scope='1x1')
|
||||
net = slim.batch_norm(net, scope='bn_1')
|
||||
if self._use_bounded_activation:
|
||||
net = tf.clip_by_value(net, -CLIP_BY_VALUE_CAP, CLIP_BY_VALUE_CAP)
|
||||
elif 'pool' in operation:
|
||||
net = _pooling(net, stride, operation, self._use_bounded_activation)
|
||||
if input_filters != filter_size:
|
||||
net = slim.conv2d(net, filter_size, 1, stride=1, scope='1x1')
|
||||
net = slim.batch_norm(net, scope='bn_1')
|
||||
if self._use_bounded_activation:
|
||||
net = tf.clip_by_value(net, -CLIP_BY_VALUE_CAP, CLIP_BY_VALUE_CAP)
|
||||
else:
|
||||
raise ValueError('Unimplemented operation', operation)
|
||||
|
||||
if operation != 'none':
|
||||
net = self._apply_drop_path(net, current_step=current_step)
|
||||
return net
|
||||
|
||||
def _combine_unused_states(self, net):
|
||||
"""Concatenate the unused hidden states of the cell."""
|
||||
used_hiddenstates = self._used_hiddenstates
|
||||
|
||||
final_height = int(net[-1].shape[2])
|
||||
final_num_filters = get_channel_dim(net[-1].shape)
|
||||
assert len(used_hiddenstates) == len(net)
|
||||
for idx, used_h in enumerate(used_hiddenstates):
|
||||
curr_height = int(net[idx].shape[2])
|
||||
curr_num_filters = get_channel_dim(net[idx].shape)
|
||||
|
||||
# Determine if a reduction should be applied to make the number of
|
||||
# filters match.
|
||||
should_reduce = final_num_filters != curr_num_filters
|
||||
should_reduce = (final_height != curr_height) or should_reduce
|
||||
should_reduce = should_reduce and not used_h
|
||||
if should_reduce:
|
||||
stride = 2 if final_height != curr_height else 1
|
||||
with tf.compat.v1.variable_scope('reduction_{}'.format(idx)):
|
||||
net[idx] = factorized_reduction(
|
||||
net[idx], final_num_filters, stride)
|
||||
|
||||
states_to_combine = (
|
||||
[h for h, is_used in zip(net, used_hiddenstates) if not is_used])
|
||||
|
||||
# Return the concat of all the states
|
||||
concat_axis = get_channel_index()
|
||||
net = tf.concat(values=states_to_combine, axis=concat_axis)
|
||||
return net
|
||||
|
||||
@contrib_framework.add_arg_scope # No public API. For internal use only.
|
||||
def _apply_drop_path(self, net, current_step=None,
|
||||
use_summaries=False, drop_connect_version='v3'):
|
||||
"""Apply drop_path regularization.
|
||||
|
||||
Args:
|
||||
net: the Tensor that gets drop_path regularization applied.
|
||||
current_step: a float32 Tensor with the current global_step value,
|
||||
to be divided by hparams.total_training_steps. Usually None, which
|
||||
defaults to tf.train.get_or_create_global_step() properly casted.
|
||||
use_summaries: a Python boolean. If set to False, no summaries are output.
|
||||
drop_connect_version: one of 'v1', 'v2', 'v3', controlling whether
|
||||
the dropout rate is scaled by current_step (v1), layer (v2), or
|
||||
both (v3, the default).
|
||||
|
||||
Returns:
|
||||
The dropped-out value of `net`.
|
||||
"""
|
||||
drop_path_keep_prob = self._drop_path_keep_prob
|
||||
if drop_path_keep_prob < 1.0:
|
||||
assert drop_connect_version in ['v1', 'v2', 'v3']
|
||||
if drop_connect_version in ['v2', 'v3']:
|
||||
# Scale keep prob by layer number
|
||||
assert self._cell_num != -1
|
||||
# The added 2 is for the reduction cells
|
||||
num_cells = self._total_num_cells
|
||||
layer_ratio = (self._cell_num + 1)/float(num_cells)
|
||||
if use_summaries:
|
||||
with tf.device('/cpu:0'):
|
||||
tf.compat.v1.summary.scalar('layer_ratio', layer_ratio)
|
||||
drop_path_keep_prob = 1 - layer_ratio * (1 - drop_path_keep_prob)
|
||||
if drop_connect_version in ['v1', 'v3']:
|
||||
# Decrease the keep probability over time
|
||||
if current_step is None:
|
||||
current_step = tf.compat.v1.train.get_or_create_global_step()
|
||||
current_step = tf.cast(current_step, tf.float32)
|
||||
drop_path_burn_in_steps = self._total_training_steps
|
||||
current_ratio = current_step / drop_path_burn_in_steps
|
||||
current_ratio = tf.minimum(1.0, current_ratio)
|
||||
if use_summaries:
|
||||
with tf.device('/cpu:0'):
|
||||
tf.compat.v1.summary.scalar('current_ratio', current_ratio)
|
||||
drop_path_keep_prob = (1 - current_ratio * (1 - drop_path_keep_prob))
|
||||
if use_summaries:
|
||||
with tf.device('/cpu:0'):
|
||||
tf.compat.v1.summary.scalar('drop_path_keep_prob',
|
||||
drop_path_keep_prob)
|
||||
net = drop_path(net, drop_path_keep_prob)
|
||||
return net
|
||||
|
||||
|
||||
class NasNetANormalCell(NasNetABaseCell):
|
||||
"""NASNetA Normal Cell."""
|
||||
|
||||
def __init__(self, num_conv_filters, drop_path_keep_prob, total_num_cells,
|
||||
total_training_steps, use_bounded_activation=False):
|
||||
operations = ['separable_5x5_2',
|
||||
'separable_3x3_2',
|
||||
'separable_5x5_2',
|
||||
'separable_3x3_2',
|
||||
'avg_pool_3x3',
|
||||
'none',
|
||||
'avg_pool_3x3',
|
||||
'avg_pool_3x3',
|
||||
'separable_3x3_2',
|
||||
'none']
|
||||
used_hiddenstates = [1, 0, 0, 0, 0, 0, 0]
|
||||
hiddenstate_indices = [0, 1, 1, 1, 0, 1, 1, 1, 0, 0]
|
||||
super(NasNetANormalCell, self).__init__(num_conv_filters, operations,
|
||||
used_hiddenstates,
|
||||
hiddenstate_indices,
|
||||
drop_path_keep_prob,
|
||||
total_num_cells,
|
||||
total_training_steps,
|
||||
use_bounded_activation)
|
||||
|
||||
|
||||
class NasNetAReductionCell(NasNetABaseCell):
|
||||
"""NASNetA Reduction Cell."""
|
||||
|
||||
def __init__(self, num_conv_filters, drop_path_keep_prob, total_num_cells,
|
||||
total_training_steps, use_bounded_activation=False):
|
||||
operations = ['separable_5x5_2',
|
||||
'separable_7x7_2',
|
||||
'max_pool_3x3',
|
||||
'separable_7x7_2',
|
||||
'avg_pool_3x3',
|
||||
'separable_5x5_2',
|
||||
'none',
|
||||
'avg_pool_3x3',
|
||||
'separable_3x3_2',
|
||||
'max_pool_3x3']
|
||||
used_hiddenstates = [1, 1, 1, 0, 0, 0, 0]
|
||||
hiddenstate_indices = [0, 1, 0, 1, 0, 1, 3, 2, 2, 0]
|
||||
super(NasNetAReductionCell, self).__init__(num_conv_filters, operations,
|
||||
used_hiddenstates,
|
||||
hiddenstate_indices,
|
||||
drop_path_keep_prob,
|
||||
total_num_cells,
|
||||
total_training_steps,
|
||||
use_bounded_activation)
|
||||
+62
@@ -0,0 +1,62 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for slim.nets.nasnet.nasnet_utils."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from nets.nasnet import nasnet_utils
|
||||
|
||||
|
||||
class NasnetUtilsTest(tf.test.TestCase):
|
||||
|
||||
def testCalcReductionLayers(self):
|
||||
num_cells = 18
|
||||
num_reduction_layers = 2
|
||||
reduction_layers = nasnet_utils.calc_reduction_layers(
|
||||
num_cells, num_reduction_layers)
|
||||
self.assertEqual(len(reduction_layers), 2)
|
||||
self.assertEqual(reduction_layers[0], 6)
|
||||
self.assertEqual(reduction_layers[1], 12)
|
||||
|
||||
def testGetChannelIndex(self):
|
||||
data_formats = ['NHWC', 'NCHW']
|
||||
for data_format in data_formats:
|
||||
index = nasnet_utils.get_channel_index(data_format)
|
||||
correct_index = 3 if data_format == 'NHWC' else 1
|
||||
self.assertEqual(index, correct_index)
|
||||
|
||||
def testGetChannelDim(self):
|
||||
data_formats = ['NHWC', 'NCHW']
|
||||
shape = [10, 20, 30, 40]
|
||||
for data_format in data_formats:
|
||||
dim = nasnet_utils.get_channel_dim(shape, data_format)
|
||||
correct_dim = shape[3] if data_format == 'NHWC' else shape[1]
|
||||
self.assertEqual(dim, correct_dim)
|
||||
|
||||
def testGlobalAvgPool(self):
|
||||
data_formats = ['NHWC', 'NCHW']
|
||||
inputs = tf.compat.v1.placeholder(tf.float32, (5, 10, 20, 10))
|
||||
for data_format in data_formats:
|
||||
output = nasnet_utils.global_avg_pool(
|
||||
inputs, data_format)
|
||||
self.assertEqual(output.shape, [5, 10])
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+285
@@ -0,0 +1,285 @@
|
||||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains the definition for the PNASNet classification networks.
|
||||
|
||||
Paper: https://arxiv.org/abs/1712.00559
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import copy
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import framework as contrib_framework
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
from tensorflow.contrib import training as contrib_training
|
||||
|
||||
from nets.nasnet import nasnet
|
||||
from nets.nasnet import nasnet_utils
|
||||
|
||||
arg_scope = contrib_framework.arg_scope
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
def large_imagenet_config():
|
||||
"""Large ImageNet configuration based on PNASNet-5."""
|
||||
return contrib_training.HParams(
|
||||
stem_multiplier=3.0,
|
||||
dense_dropout_keep_prob=0.5,
|
||||
num_cells=12,
|
||||
filter_scaling_rate=2.0,
|
||||
num_conv_filters=216,
|
||||
drop_path_keep_prob=0.6,
|
||||
use_aux_head=1,
|
||||
num_reduction_layers=2,
|
||||
data_format='NHWC',
|
||||
skip_reduction_layer_input=1,
|
||||
total_training_steps=250000,
|
||||
use_bounded_activation=False,
|
||||
)
|
||||
|
||||
|
||||
def mobile_imagenet_config():
|
||||
"""Mobile ImageNet configuration based on PNASNet-5."""
|
||||
return contrib_training.HParams(
|
||||
stem_multiplier=1.0,
|
||||
dense_dropout_keep_prob=0.5,
|
||||
num_cells=9,
|
||||
filter_scaling_rate=2.0,
|
||||
num_conv_filters=54,
|
||||
drop_path_keep_prob=1.0,
|
||||
use_aux_head=1,
|
||||
num_reduction_layers=2,
|
||||
data_format='NHWC',
|
||||
skip_reduction_layer_input=1,
|
||||
total_training_steps=250000,
|
||||
use_bounded_activation=False,
|
||||
)
|
||||
|
||||
|
||||
def pnasnet_large_arg_scope(weight_decay=4e-5, batch_norm_decay=0.9997,
|
||||
batch_norm_epsilon=0.001):
|
||||
"""Default arg scope for the PNASNet Large ImageNet model."""
|
||||
return nasnet.nasnet_large_arg_scope(
|
||||
weight_decay, batch_norm_decay, batch_norm_epsilon)
|
||||
|
||||
|
||||
def pnasnet_mobile_arg_scope(weight_decay=4e-5,
|
||||
batch_norm_decay=0.9997,
|
||||
batch_norm_epsilon=0.001):
|
||||
"""Default arg scope for the PNASNet Mobile ImageNet model."""
|
||||
return nasnet.nasnet_mobile_arg_scope(weight_decay, batch_norm_decay,
|
||||
batch_norm_epsilon)
|
||||
|
||||
|
||||
def _build_pnasnet_base(images,
|
||||
normal_cell,
|
||||
num_classes,
|
||||
hparams,
|
||||
is_training,
|
||||
final_endpoint=None):
|
||||
"""Constructs a PNASNet image model."""
|
||||
|
||||
end_points = {}
|
||||
|
||||
def add_and_check_endpoint(endpoint_name, net):
|
||||
end_points[endpoint_name] = net
|
||||
return final_endpoint and (endpoint_name == final_endpoint)
|
||||
|
||||
# Find where to place the reduction cells or stride normal cells
|
||||
reduction_indices = nasnet_utils.calc_reduction_layers(
|
||||
hparams.num_cells, hparams.num_reduction_layers)
|
||||
|
||||
# pylint: disable=protected-access
|
||||
stem = lambda: nasnet._imagenet_stem(images, hparams, normal_cell)
|
||||
# pylint: enable=protected-access
|
||||
net, cell_outputs = stem()
|
||||
if add_and_check_endpoint('Stem', net):
|
||||
return net, end_points
|
||||
|
||||
# Setup for building in the auxiliary head.
|
||||
aux_head_cell_idxes = []
|
||||
if len(reduction_indices) >= 2:
|
||||
aux_head_cell_idxes.append(reduction_indices[1] - 1)
|
||||
|
||||
# Run the cells
|
||||
filter_scaling = 1.0
|
||||
# true_cell_num accounts for the stem cells
|
||||
true_cell_num = 2
|
||||
activation_fn = tf.nn.relu6 if hparams.use_bounded_activation else tf.nn.relu
|
||||
for cell_num in range(hparams.num_cells):
|
||||
is_reduction = cell_num in reduction_indices
|
||||
stride = 2 if is_reduction else 1
|
||||
if is_reduction: filter_scaling *= hparams.filter_scaling_rate
|
||||
if hparams.skip_reduction_layer_input or not is_reduction:
|
||||
prev_layer = cell_outputs[-2]
|
||||
net = normal_cell(
|
||||
net,
|
||||
scope='cell_{}'.format(cell_num),
|
||||
filter_scaling=filter_scaling,
|
||||
stride=stride,
|
||||
prev_layer=prev_layer,
|
||||
cell_num=true_cell_num)
|
||||
if add_and_check_endpoint('Cell_{}'.format(cell_num), net):
|
||||
return net, end_points
|
||||
true_cell_num += 1
|
||||
cell_outputs.append(net)
|
||||
|
||||
if (hparams.use_aux_head and cell_num in aux_head_cell_idxes and
|
||||
num_classes and is_training):
|
||||
aux_net = activation_fn(net)
|
||||
# pylint: disable=protected-access
|
||||
nasnet._build_aux_head(aux_net, end_points, num_classes, hparams,
|
||||
scope='aux_{}'.format(cell_num))
|
||||
# pylint: enable=protected-access
|
||||
|
||||
# Final softmax layer
|
||||
with tf.compat.v1.variable_scope('final_layer'):
|
||||
net = activation_fn(net)
|
||||
net = nasnet_utils.global_avg_pool(net)
|
||||
if add_and_check_endpoint('global_pool', net) or not num_classes:
|
||||
return net, end_points
|
||||
net = slim.dropout(net, hparams.dense_dropout_keep_prob, scope='dropout')
|
||||
logits = slim.fully_connected(net, num_classes)
|
||||
|
||||
if add_and_check_endpoint('Logits', logits):
|
||||
return net, end_points
|
||||
|
||||
predictions = tf.nn.softmax(logits, name='predictions')
|
||||
if add_and_check_endpoint('Predictions', predictions):
|
||||
return net, end_points
|
||||
return logits, end_points
|
||||
|
||||
|
||||
def build_pnasnet_large(images,
|
||||
num_classes,
|
||||
is_training=True,
|
||||
final_endpoint=None,
|
||||
config=None):
|
||||
"""Build PNASNet Large model for the ImageNet Dataset."""
|
||||
hparams = copy.deepcopy(config) if config else large_imagenet_config()
|
||||
# pylint: disable=protected-access
|
||||
nasnet._update_hparams(hparams, is_training)
|
||||
# pylint: enable=protected-access
|
||||
|
||||
if tf.test.is_gpu_available() and hparams.data_format == 'NHWC':
|
||||
tf.compat.v1.logging.info(
|
||||
'A GPU is available on the machine, consider using NCHW '
|
||||
'data format for increased speed on GPU.')
|
||||
|
||||
if hparams.data_format == 'NCHW':
|
||||
images = tf.transpose(a=images, perm=[0, 3, 1, 2])
|
||||
|
||||
# Calculate the total number of cells in the network.
|
||||
# There is no distinction between reduction and normal cells in PNAS so the
|
||||
# total number of cells is equal to the number normal cells plus the number
|
||||
# of stem cells (two by default).
|
||||
total_num_cells = hparams.num_cells + 2
|
||||
|
||||
normal_cell = PNasNetNormalCell(hparams.num_conv_filters,
|
||||
hparams.drop_path_keep_prob, total_num_cells,
|
||||
hparams.total_training_steps,
|
||||
hparams.use_bounded_activation)
|
||||
with arg_scope(
|
||||
[slim.dropout, nasnet_utils.drop_path, slim.batch_norm],
|
||||
is_training=is_training):
|
||||
with arg_scope([slim.avg_pool2d, slim.max_pool2d, slim.conv2d,
|
||||
slim.batch_norm, slim.separable_conv2d,
|
||||
nasnet_utils.factorized_reduction,
|
||||
nasnet_utils.global_avg_pool,
|
||||
nasnet_utils.get_channel_index,
|
||||
nasnet_utils.get_channel_dim],
|
||||
data_format=hparams.data_format):
|
||||
return _build_pnasnet_base(
|
||||
images,
|
||||
normal_cell=normal_cell,
|
||||
num_classes=num_classes,
|
||||
hparams=hparams,
|
||||
is_training=is_training,
|
||||
final_endpoint=final_endpoint)
|
||||
build_pnasnet_large.default_image_size = 331
|
||||
|
||||
|
||||
def build_pnasnet_mobile(images,
|
||||
num_classes,
|
||||
is_training=True,
|
||||
final_endpoint=None,
|
||||
config=None):
|
||||
"""Build PNASNet Mobile model for the ImageNet Dataset."""
|
||||
hparams = copy.deepcopy(config) if config else mobile_imagenet_config()
|
||||
# pylint: disable=protected-access
|
||||
nasnet._update_hparams(hparams, is_training)
|
||||
# pylint: enable=protected-access
|
||||
|
||||
if tf.test.is_gpu_available() and hparams.data_format == 'NHWC':
|
||||
tf.compat.v1.logging.info(
|
||||
'A GPU is available on the machine, consider using NCHW '
|
||||
'data format for increased speed on GPU.')
|
||||
|
||||
if hparams.data_format == 'NCHW':
|
||||
images = tf.transpose(a=images, perm=[0, 3, 1, 2])
|
||||
|
||||
# Calculate the total number of cells in the network.
|
||||
# There is no distinction between reduction and normal cells in PNAS so the
|
||||
# total number of cells is equal to the number normal cells plus the number
|
||||
# of stem cells (two by default).
|
||||
total_num_cells = hparams.num_cells + 2
|
||||
|
||||
normal_cell = PNasNetNormalCell(hparams.num_conv_filters,
|
||||
hparams.drop_path_keep_prob, total_num_cells,
|
||||
hparams.total_training_steps,
|
||||
hparams.use_bounded_activation)
|
||||
with arg_scope(
|
||||
[slim.dropout, nasnet_utils.drop_path, slim.batch_norm],
|
||||
is_training=is_training):
|
||||
with arg_scope(
|
||||
[
|
||||
slim.avg_pool2d, slim.max_pool2d, slim.conv2d, slim.batch_norm,
|
||||
slim.separable_conv2d, nasnet_utils.factorized_reduction,
|
||||
nasnet_utils.global_avg_pool, nasnet_utils.get_channel_index,
|
||||
nasnet_utils.get_channel_dim
|
||||
],
|
||||
data_format=hparams.data_format):
|
||||
return _build_pnasnet_base(
|
||||
images,
|
||||
normal_cell=normal_cell,
|
||||
num_classes=num_classes,
|
||||
hparams=hparams,
|
||||
is_training=is_training,
|
||||
final_endpoint=final_endpoint)
|
||||
|
||||
|
||||
build_pnasnet_mobile.default_image_size = 224
|
||||
|
||||
|
||||
class PNasNetNormalCell(nasnet_utils.NasNetABaseCell):
|
||||
"""PNASNet Normal Cell."""
|
||||
|
||||
def __init__(self, num_conv_filters, drop_path_keep_prob, total_num_cells,
|
||||
total_training_steps, use_bounded_activation=False):
|
||||
# Configuration for the PNASNet-5 model.
|
||||
operations = [
|
||||
'separable_5x5_2', 'max_pool_3x3', 'separable_7x7_2', 'max_pool_3x3',
|
||||
'separable_5x5_2', 'separable_3x3_2', 'separable_3x3_2', 'max_pool_3x3',
|
||||
'separable_3x3_2', 'none'
|
||||
]
|
||||
used_hiddenstates = [1, 1, 0, 0, 0, 0, 0]
|
||||
hiddenstate_indices = [1, 1, 0, 0, 0, 0, 4, 0, 1, 0]
|
||||
|
||||
super(PNasNetNormalCell, self).__init__(
|
||||
num_conv_filters, operations, used_hiddenstates, hiddenstate_indices,
|
||||
drop_path_keep_prob, total_num_cells, total_training_steps,
|
||||
use_bounded_activation)
|
||||
+257
@@ -0,0 +1,257 @@
|
||||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for slim.pnasnet."""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets.nasnet import pnasnet
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
class PNASNetTest(tf.test.TestCase):
|
||||
|
||||
def testBuildLogitsLargeModel(self):
|
||||
batch_size = 5
|
||||
height, width = 331, 331
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(pnasnet.pnasnet_large_arg_scope()):
|
||||
logits, end_points = pnasnet.build_pnasnet_large(inputs, num_classes)
|
||||
auxlogits = end_points['AuxLogits']
|
||||
predictions = end_points['Predictions']
|
||||
self.assertListEqual(auxlogits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(predictions.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildLogitsMobileModel(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(pnasnet.pnasnet_mobile_arg_scope()):
|
||||
logits, end_points = pnasnet.build_pnasnet_mobile(inputs, num_classes)
|
||||
auxlogits = end_points['AuxLogits']
|
||||
predictions = end_points['Predictions']
|
||||
self.assertListEqual(auxlogits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
self.assertListEqual(predictions.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testBuildNonExistingLayerLargeModel(self):
|
||||
"""Tests that the model is built correctly without unnecessary layers."""
|
||||
inputs = tf.random.uniform((5, 331, 331, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(pnasnet.pnasnet_large_arg_scope()):
|
||||
pnasnet.build_pnasnet_large(inputs, 1000)
|
||||
vars_names = [x.op.name for x in tf.compat.v1.trainable_variables()]
|
||||
self.assertIn('cell_stem_0/1x1/weights', vars_names)
|
||||
self.assertNotIn('cell_stem_1/comb_iter_0/right/1x1/weights', vars_names)
|
||||
|
||||
def testBuildNonExistingLayerMobileModel(self):
|
||||
"""Tests that the model is built correctly without unnecessary layers."""
|
||||
inputs = tf.random.uniform((5, 224, 224, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(pnasnet.pnasnet_mobile_arg_scope()):
|
||||
pnasnet.build_pnasnet_mobile(inputs, 1000)
|
||||
vars_names = [x.op.name for x in tf.compat.v1.trainable_variables()]
|
||||
self.assertIn('cell_stem_0/1x1/weights', vars_names)
|
||||
self.assertNotIn('cell_stem_1/comb_iter_0/right/1x1/weights', vars_names)
|
||||
|
||||
def testBuildPreLogitsLargeModel(self):
|
||||
batch_size = 5
|
||||
height, width = 331, 331
|
||||
num_classes = None
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(pnasnet.pnasnet_large_arg_scope()):
|
||||
net, end_points = pnasnet.build_pnasnet_large(inputs, num_classes)
|
||||
self.assertFalse('AuxLogits' in end_points)
|
||||
self.assertFalse('Predictions' in end_points)
|
||||
self.assertTrue(net.op.name.startswith('final_layer/Mean'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 4320])
|
||||
|
||||
def testBuildPreLogitsMobileModel(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = None
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(pnasnet.pnasnet_mobile_arg_scope()):
|
||||
net, end_points = pnasnet.build_pnasnet_mobile(inputs, num_classes)
|
||||
self.assertFalse('AuxLogits' in end_points)
|
||||
self.assertFalse('Predictions' in end_points)
|
||||
self.assertTrue(net.op.name.startswith('final_layer/Mean'))
|
||||
self.assertListEqual(net.get_shape().as_list(), [batch_size, 1080])
|
||||
|
||||
def testAllEndPointsShapesLargeModel(self):
|
||||
batch_size = 5
|
||||
height, width = 331, 331
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(pnasnet.pnasnet_large_arg_scope()):
|
||||
_, end_points = pnasnet.build_pnasnet_large(inputs, num_classes)
|
||||
|
||||
endpoints_shapes = {'Stem': [batch_size, 42, 42, 540],
|
||||
'Cell_0': [batch_size, 42, 42, 1080],
|
||||
'Cell_1': [batch_size, 42, 42, 1080],
|
||||
'Cell_2': [batch_size, 42, 42, 1080],
|
||||
'Cell_3': [batch_size, 42, 42, 1080],
|
||||
'Cell_4': [batch_size, 21, 21, 2160],
|
||||
'Cell_5': [batch_size, 21, 21, 2160],
|
||||
'Cell_6': [batch_size, 21, 21, 2160],
|
||||
'Cell_7': [batch_size, 21, 21, 2160],
|
||||
'Cell_8': [batch_size, 11, 11, 4320],
|
||||
'Cell_9': [batch_size, 11, 11, 4320],
|
||||
'Cell_10': [batch_size, 11, 11, 4320],
|
||||
'Cell_11': [batch_size, 11, 11, 4320],
|
||||
'global_pool': [batch_size, 4320],
|
||||
# Logits and predictions
|
||||
'AuxLogits': [batch_size, 1000],
|
||||
'Predictions': [batch_size, 1000],
|
||||
'Logits': [batch_size, 1000],
|
||||
}
|
||||
self.assertEqual(len(end_points), 17)
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
tf.compat.v1.logging.info('Endpoint name: {}'.format(endpoint_name))
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertIn(endpoint_name, end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testAllEndPointsShapesMobileModel(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
with slim.arg_scope(pnasnet.pnasnet_mobile_arg_scope()):
|
||||
_, end_points = pnasnet.build_pnasnet_mobile(inputs, num_classes)
|
||||
|
||||
endpoints_shapes = {
|
||||
'Stem': [batch_size, 28, 28, 135],
|
||||
'Cell_0': [batch_size, 28, 28, 270],
|
||||
'Cell_1': [batch_size, 28, 28, 270],
|
||||
'Cell_2': [batch_size, 28, 28, 270],
|
||||
'Cell_3': [batch_size, 14, 14, 540],
|
||||
'Cell_4': [batch_size, 14, 14, 540],
|
||||
'Cell_5': [batch_size, 14, 14, 540],
|
||||
'Cell_6': [batch_size, 7, 7, 1080],
|
||||
'Cell_7': [batch_size, 7, 7, 1080],
|
||||
'Cell_8': [batch_size, 7, 7, 1080],
|
||||
'global_pool': [batch_size, 1080],
|
||||
# Logits and predictions
|
||||
'AuxLogits': [batch_size, num_classes],
|
||||
'Predictions': [batch_size, num_classes],
|
||||
'Logits': [batch_size, num_classes],
|
||||
}
|
||||
self.assertEqual(len(end_points), 14)
|
||||
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
|
||||
for endpoint_name in endpoints_shapes:
|
||||
tf.compat.v1.logging.info('Endpoint name: {}'.format(endpoint_name))
|
||||
expected_shape = endpoints_shapes[endpoint_name]
|
||||
self.assertIn(endpoint_name, end_points)
|
||||
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
|
||||
expected_shape)
|
||||
|
||||
def testNoAuxHeadLargeModel(self):
|
||||
batch_size = 5
|
||||
height, width = 331, 331
|
||||
num_classes = 1000
|
||||
for use_aux_head in (True, False):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
config = pnasnet.large_imagenet_config()
|
||||
config.set_hparam('use_aux_head', int(use_aux_head))
|
||||
with slim.arg_scope(pnasnet.pnasnet_large_arg_scope()):
|
||||
_, end_points = pnasnet.build_pnasnet_large(inputs, num_classes,
|
||||
config=config)
|
||||
self.assertEqual('AuxLogits' in end_points, use_aux_head)
|
||||
|
||||
def testNoAuxHeadMobileModel(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
for use_aux_head in (True, False):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
config = pnasnet.mobile_imagenet_config()
|
||||
config.set_hparam('use_aux_head', int(use_aux_head))
|
||||
with slim.arg_scope(pnasnet.pnasnet_mobile_arg_scope()):
|
||||
_, end_points = pnasnet.build_pnasnet_mobile(
|
||||
inputs, num_classes, config=config)
|
||||
self.assertEqual('AuxLogits' in end_points, use_aux_head)
|
||||
|
||||
def testOverrideHParamsLargeModel(self):
|
||||
batch_size = 5
|
||||
height, width = 331, 331
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
config = pnasnet.large_imagenet_config()
|
||||
config.set_hparam('data_format', 'NCHW')
|
||||
with slim.arg_scope(pnasnet.pnasnet_large_arg_scope()):
|
||||
_, end_points = pnasnet.build_pnasnet_large(
|
||||
inputs, num_classes, config=config)
|
||||
self.assertListEqual(
|
||||
end_points['Stem'].shape.as_list(), [batch_size, 540, 42, 42])
|
||||
|
||||
def testOverrideHParamsMobileModel(self):
|
||||
batch_size = 5
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
tf.compat.v1.train.create_global_step()
|
||||
config = pnasnet.mobile_imagenet_config()
|
||||
config.set_hparam('data_format', 'NCHW')
|
||||
with slim.arg_scope(pnasnet.pnasnet_mobile_arg_scope()):
|
||||
_, end_points = pnasnet.build_pnasnet_mobile(
|
||||
inputs, num_classes, config=config)
|
||||
self.assertListEqual(end_points['Stem'].shape.as_list(),
|
||||
[batch_size, 135, 28, 28])
|
||||
|
||||
def testUseBoundedAcitvationMobileModel(self):
|
||||
batch_size = 1
|
||||
height, width = 224, 224
|
||||
num_classes = 1000
|
||||
for use_bounded_activation in (True, False):
|
||||
tf.compat.v1.reset_default_graph()
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
config = pnasnet.mobile_imagenet_config()
|
||||
config.set_hparam('use_bounded_activation', use_bounded_activation)
|
||||
with slim.arg_scope(pnasnet.pnasnet_mobile_arg_scope()):
|
||||
_, _ = pnasnet.build_pnasnet_mobile(
|
||||
inputs, num_classes, config=config)
|
||||
for node in tf.compat.v1.get_default_graph().as_graph_def().node:
|
||||
if node.op.startswith('Relu'):
|
||||
self.assertEqual(node.op == 'Relu6', use_bounded_activation)
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+172
@@ -0,0 +1,172 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains a factory for building various models."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
import functools
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import alexnet
|
||||
from nets import cifarnet
|
||||
from nets import i3d
|
||||
from nets import inception
|
||||
from nets import lenet
|
||||
from nets import mobilenet_v1
|
||||
from nets import overfeat
|
||||
from nets import resnet_v1
|
||||
from nets import resnet_v2
|
||||
from nets import s3dg
|
||||
from nets import vgg
|
||||
from nets.mobilenet import mobilenet_v2
|
||||
from nets.mobilenet import mobilenet_v3
|
||||
from nets.nasnet import nasnet
|
||||
from nets.nasnet import pnasnet
|
||||
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
networks_map = {
|
||||
'alexnet_v2': alexnet.alexnet_v2,
|
||||
'cifarnet': cifarnet.cifarnet,
|
||||
'overfeat': overfeat.overfeat,
|
||||
'vgg_a': vgg.vgg_a,
|
||||
'vgg_16': vgg.vgg_16,
|
||||
'vgg_19': vgg.vgg_19,
|
||||
'inception_v1': inception.inception_v1,
|
||||
'inception_v2': inception.inception_v2,
|
||||
'inception_v3': inception.inception_v3,
|
||||
'inception_v4': inception.inception_v4,
|
||||
'inception_resnet_v2': inception.inception_resnet_v2,
|
||||
'i3d': i3d.i3d,
|
||||
's3dg': s3dg.s3dg,
|
||||
'lenet': lenet.lenet,
|
||||
'resnet_v1_50': resnet_v1.resnet_v1_50,
|
||||
'resnet_v1_101': resnet_v1.resnet_v1_101,
|
||||
'resnet_v1_152': resnet_v1.resnet_v1_152,
|
||||
'resnet_v1_200': resnet_v1.resnet_v1_200,
|
||||
'resnet_v2_50': resnet_v2.resnet_v2_50,
|
||||
'resnet_v2_101': resnet_v2.resnet_v2_101,
|
||||
'resnet_v2_152': resnet_v2.resnet_v2_152,
|
||||
'resnet_v2_200': resnet_v2.resnet_v2_200,
|
||||
'mobilenet_v1': mobilenet_v1.mobilenet_v1,
|
||||
'mobilenet_v1_075': mobilenet_v1.mobilenet_v1_075,
|
||||
'mobilenet_v1_050': mobilenet_v1.mobilenet_v1_050,
|
||||
'mobilenet_v1_025': mobilenet_v1.mobilenet_v1_025,
|
||||
'mobilenet_v2': mobilenet_v2.mobilenet,
|
||||
'mobilenet_v2_140': mobilenet_v2.mobilenet_v2_140,
|
||||
'mobilenet_v2_035': mobilenet_v2.mobilenet_v2_035,
|
||||
'mobilenet_v3_small': mobilenet_v3.small,
|
||||
'mobilenet_v3_large': mobilenet_v3.large,
|
||||
'mobilenet_v3_small_minimalistic': mobilenet_v3.small_minimalistic,
|
||||
'mobilenet_v3_large_minimalistic': mobilenet_v3.large_minimalistic,
|
||||
'mobilenet_edgetpu': mobilenet_v3.edge_tpu,
|
||||
'mobilenet_edgetpu_075': mobilenet_v3.edge_tpu_075,
|
||||
'nasnet_cifar': nasnet.build_nasnet_cifar,
|
||||
'nasnet_mobile': nasnet.build_nasnet_mobile,
|
||||
'nasnet_large': nasnet.build_nasnet_large,
|
||||
'pnasnet_large': pnasnet.build_pnasnet_large,
|
||||
'pnasnet_mobile': pnasnet.build_pnasnet_mobile,
|
||||
}
|
||||
|
||||
arg_scopes_map = {
|
||||
'alexnet_v2': alexnet.alexnet_v2_arg_scope,
|
||||
'cifarnet': cifarnet.cifarnet_arg_scope,
|
||||
'overfeat': overfeat.overfeat_arg_scope,
|
||||
'vgg_a': vgg.vgg_arg_scope,
|
||||
'vgg_16': vgg.vgg_arg_scope,
|
||||
'vgg_19': vgg.vgg_arg_scope,
|
||||
'inception_v1': inception.inception_v3_arg_scope,
|
||||
'inception_v2': inception.inception_v3_arg_scope,
|
||||
'inception_v3': inception.inception_v3_arg_scope,
|
||||
'inception_v4': inception.inception_v4_arg_scope,
|
||||
'inception_resnet_v2': inception.inception_resnet_v2_arg_scope,
|
||||
'i3d': i3d.i3d_arg_scope,
|
||||
's3dg': s3dg.s3dg_arg_scope,
|
||||
'lenet': lenet.lenet_arg_scope,
|
||||
'resnet_v1_50': resnet_v1.resnet_arg_scope,
|
||||
'resnet_v1_101': resnet_v1.resnet_arg_scope,
|
||||
'resnet_v1_152': resnet_v1.resnet_arg_scope,
|
||||
'resnet_v1_200': resnet_v1.resnet_arg_scope,
|
||||
'resnet_v2_50': resnet_v2.resnet_arg_scope,
|
||||
'resnet_v2_101': resnet_v2.resnet_arg_scope,
|
||||
'resnet_v2_152': resnet_v2.resnet_arg_scope,
|
||||
'resnet_v2_200': resnet_v2.resnet_arg_scope,
|
||||
'mobilenet_v1': mobilenet_v1.mobilenet_v1_arg_scope,
|
||||
'mobilenet_v1_075': mobilenet_v1.mobilenet_v1_arg_scope,
|
||||
'mobilenet_v1_050': mobilenet_v1.mobilenet_v1_arg_scope,
|
||||
'mobilenet_v1_025': mobilenet_v1.mobilenet_v1_arg_scope,
|
||||
'mobilenet_v2': mobilenet_v2.training_scope,
|
||||
'mobilenet_v2_035': mobilenet_v2.training_scope,
|
||||
'mobilenet_v2_140': mobilenet_v2.training_scope,
|
||||
'mobilenet_v3_small': mobilenet_v3.training_scope,
|
||||
'mobilenet_v3_large': mobilenet_v3.training_scope,
|
||||
'mobilenet_v3_small_minimalistic': mobilenet_v3.training_scope,
|
||||
'mobilenet_v3_large_minimalistic': mobilenet_v3.training_scope,
|
||||
'mobilenet_edgetpu': mobilenet_v3.training_scope,
|
||||
'mobilenet_edgetpu_075': mobilenet_v3.training_scope,
|
||||
'nasnet_cifar': nasnet.nasnet_cifar_arg_scope,
|
||||
'nasnet_mobile': nasnet.nasnet_mobile_arg_scope,
|
||||
'nasnet_large': nasnet.nasnet_large_arg_scope,
|
||||
'pnasnet_large': pnasnet.pnasnet_large_arg_scope,
|
||||
'pnasnet_mobile': pnasnet.pnasnet_mobile_arg_scope,
|
||||
}
|
||||
|
||||
|
||||
def get_network_fn(name, num_classes, weight_decay=0.0, is_training=False):
|
||||
"""Returns a network_fn such as `logits, end_points = network_fn(images)`.
|
||||
|
||||
Args:
|
||||
name: The name of the network.
|
||||
num_classes: The number of classes to use for classification. If 0 or None,
|
||||
the logits layer is omitted and its input features are returned instead.
|
||||
weight_decay: The l2 coefficient for the model weights.
|
||||
is_training: `True` if the model is being used for training and `False`
|
||||
otherwise.
|
||||
|
||||
Returns:
|
||||
network_fn: A function that applies the model to a batch of images. It has
|
||||
the following signature:
|
||||
net, end_points = network_fn(images)
|
||||
The `images` input is a tensor of shape [batch_size, height, width, 3 or
|
||||
1] with height = width = network_fn.default_image_size. (The
|
||||
permissibility and treatment of other sizes depends on the network_fn.)
|
||||
The returned `end_points` are a dictionary of intermediate activations.
|
||||
The returned `net` is the topmost layer, depending on `num_classes`:
|
||||
If `num_classes` was a non-zero integer, `net` is a logits tensor
|
||||
of shape [batch_size, num_classes].
|
||||
If `num_classes` was 0 or `None`, `net` is a tensor with the input
|
||||
to the logits layer of shape [batch_size, 1, 1, num_features] or
|
||||
[batch_size, num_features]. Dropout has not been applied to this
|
||||
(even if the network's original classification does); it remains for
|
||||
the caller to do this or not.
|
||||
|
||||
Raises:
|
||||
ValueError: If network `name` is not recognized.
|
||||
"""
|
||||
if name not in networks_map:
|
||||
raise ValueError('Name of network unknown %s' % name)
|
||||
func = networks_map[name]
|
||||
@functools.wraps(func)
|
||||
def network_fn(images, **kwargs):
|
||||
arg_scope = arg_scopes_map[name](weight_decay=weight_decay)
|
||||
with slim.arg_scope(arg_scope):
|
||||
return func(images, num_classes=num_classes, is_training=is_training,
|
||||
**kwargs)
|
||||
if hasattr(func, 'default_image_size'):
|
||||
network_fn.default_image_size = func.default_image_size
|
||||
|
||||
return network_fn
|
||||
+78
@@ -0,0 +1,78 @@
|
||||
# Copyright 2016 Google Inc. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
|
||||
"""Tests for slim.inception."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from nets import nets_factory
|
||||
|
||||
|
||||
class NetworksTest(tf.test.TestCase):
|
||||
|
||||
def testGetNetworkFnFirstHalf(self):
|
||||
batch_size = 5
|
||||
num_classes = 1000
|
||||
for net in list(nets_factory.networks_map.keys())[:10]:
|
||||
with tf.Graph().as_default() as g, self.test_session(g):
|
||||
net_fn = nets_factory.get_network_fn(net, num_classes=num_classes)
|
||||
# Most networks use 224 as their default_image_size
|
||||
image_size = getattr(net_fn, 'default_image_size', 224)
|
||||
if net not in ['i3d', 's3dg']:
|
||||
inputs = tf.random.uniform((batch_size, image_size, image_size, 3))
|
||||
logits, end_points = net_fn(inputs)
|
||||
self.assertTrue(isinstance(logits, tf.Tensor))
|
||||
self.assertTrue(isinstance(end_points, dict))
|
||||
self.assertEqual(logits.get_shape().as_list()[0], batch_size)
|
||||
self.assertEqual(logits.get_shape().as_list()[-1], num_classes)
|
||||
|
||||
def testGetNetworkFnSecondHalf(self):
|
||||
batch_size = 5
|
||||
num_classes = 1000
|
||||
for net in list(nets_factory.networks_map.keys())[10:]:
|
||||
with tf.Graph().as_default() as g, self.test_session(g):
|
||||
net_fn = nets_factory.get_network_fn(net, num_classes=num_classes)
|
||||
# Most networks use 224 as their default_image_size
|
||||
image_size = getattr(net_fn, 'default_image_size', 224)
|
||||
if net not in ['i3d', 's3dg']:
|
||||
inputs = tf.random.uniform((batch_size, image_size, image_size, 3))
|
||||
logits, end_points = net_fn(inputs)
|
||||
self.assertTrue(isinstance(logits, tf.Tensor))
|
||||
self.assertTrue(isinstance(end_points, dict))
|
||||
self.assertEqual(logits.get_shape().as_list()[0], batch_size)
|
||||
self.assertEqual(logits.get_shape().as_list()[-1], num_classes)
|
||||
|
||||
def testGetNetworkFnVideoModels(self):
|
||||
batch_size = 5
|
||||
num_classes = 400
|
||||
for net in ['i3d', 's3dg']:
|
||||
with tf.Graph().as_default() as g, self.test_session(g):
|
||||
net_fn = nets_factory.get_network_fn(net, num_classes=num_classes)
|
||||
# Most networks use 224 as their default_image_size
|
||||
image_size = getattr(net_fn, 'default_image_size', 224) // 2
|
||||
inputs = tf.random.uniform((batch_size, 10, image_size, image_size, 3))
|
||||
logits, end_points = net_fn(inputs)
|
||||
self.assertTrue(isinstance(logits, tf.Tensor))
|
||||
self.assertTrue(isinstance(end_points, dict))
|
||||
self.assertEqual(logits.get_shape().as_list()[0], batch_size)
|
||||
self.assertEqual(logits.get_shape().as_list()[-1], num_classes)
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+139
@@ -0,0 +1,139 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains the model definition for the OverFeat network.
|
||||
|
||||
The definition for the network was obtained from:
|
||||
OverFeat: Integrated Recognition, Localization and Detection using
|
||||
Convolutional Networks
|
||||
Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus and
|
||||
Yann LeCun, 2014
|
||||
http://arxiv.org/abs/1312.6229
|
||||
|
||||
Usage:
|
||||
with slim.arg_scope(overfeat.overfeat_arg_scope()):
|
||||
outputs, end_points = overfeat.overfeat(inputs)
|
||||
|
||||
@@overfeat
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
# pylint: disable=g-long-lambda
|
||||
trunc_normal = lambda stddev: tf.compat.v1.truncated_normal_initializer(
|
||||
0.0, stddev)
|
||||
|
||||
|
||||
def overfeat_arg_scope(weight_decay=0.0005):
|
||||
with slim.arg_scope([slim.conv2d, slim.fully_connected],
|
||||
activation_fn=tf.nn.relu,
|
||||
weights_regularizer=slim.l2_regularizer(weight_decay),
|
||||
biases_initializer=tf.compat.v1.zeros_initializer()):
|
||||
with slim.arg_scope([slim.conv2d], padding='SAME'):
|
||||
with slim.arg_scope([slim.max_pool2d], padding='VALID') as arg_sc:
|
||||
return arg_sc
|
||||
|
||||
|
||||
def overfeat(inputs,
|
||||
num_classes=1000,
|
||||
is_training=True,
|
||||
dropout_keep_prob=0.5,
|
||||
spatial_squeeze=True,
|
||||
scope='overfeat',
|
||||
global_pool=False):
|
||||
"""Contains the model definition for the OverFeat network.
|
||||
|
||||
The definition for the network was obtained from:
|
||||
OverFeat: Integrated Recognition, Localization and Detection using
|
||||
Convolutional Networks
|
||||
Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus and
|
||||
Yann LeCun, 2014
|
||||
http://arxiv.org/abs/1312.6229
|
||||
|
||||
Note: All the fully_connected layers have been transformed to conv2d layers.
|
||||
To use in classification mode, resize input to 231x231. To use in fully
|
||||
convolutional mode, set spatial_squeeze to false.
|
||||
|
||||
Args:
|
||||
inputs: a tensor of size [batch_size, height, width, channels].
|
||||
num_classes: number of predicted classes. If 0 or None, the logits layer is
|
||||
omitted and the input features to the logits layer are returned instead.
|
||||
is_training: whether or not the model is being trained.
|
||||
dropout_keep_prob: the probability that activations are kept in the dropout
|
||||
layers during training.
|
||||
spatial_squeeze: whether or not should squeeze the spatial dimensions of the
|
||||
outputs. Useful to remove unnecessary dimensions for classification.
|
||||
scope: Optional scope for the variables.
|
||||
global_pool: Optional boolean flag. If True, the input to the classification
|
||||
layer is avgpooled to size 1x1, for any input size. (This is not part
|
||||
of the original OverFeat.)
|
||||
|
||||
Returns:
|
||||
net: the output of the logits layer (if num_classes is a non-zero integer),
|
||||
or the non-dropped-out input to the logits layer (if num_classes is 0 or
|
||||
None).
|
||||
end_points: a dict of tensors with intermediate activations.
|
||||
"""
|
||||
with tf.compat.v1.variable_scope(scope, 'overfeat', [inputs]) as sc:
|
||||
end_points_collection = sc.original_name_scope + '_end_points'
|
||||
# Collect outputs for conv2d, fully_connected and max_pool2d
|
||||
with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
|
||||
outputs_collections=end_points_collection):
|
||||
net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID',
|
||||
scope='conv1')
|
||||
net = slim.max_pool2d(net, [2, 2], scope='pool1')
|
||||
net = slim.conv2d(net, 256, [5, 5], padding='VALID', scope='conv2')
|
||||
net = slim.max_pool2d(net, [2, 2], scope='pool2')
|
||||
net = slim.conv2d(net, 512, [3, 3], scope='conv3')
|
||||
net = slim.conv2d(net, 1024, [3, 3], scope='conv4')
|
||||
net = slim.conv2d(net, 1024, [3, 3], scope='conv5')
|
||||
net = slim.max_pool2d(net, [2, 2], scope='pool5')
|
||||
|
||||
# Use conv2d instead of fully_connected layers.
|
||||
with slim.arg_scope(
|
||||
[slim.conv2d],
|
||||
weights_initializer=trunc_normal(0.005),
|
||||
biases_initializer=tf.compat.v1.constant_initializer(0.1)):
|
||||
net = slim.conv2d(net, 3072, [6, 6], padding='VALID', scope='fc6')
|
||||
net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
|
||||
scope='dropout6')
|
||||
net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
|
||||
# Convert end_points_collection into a end_point dict.
|
||||
end_points = slim.utils.convert_collection_to_dict(
|
||||
end_points_collection)
|
||||
if global_pool:
|
||||
net = tf.reduce_mean(
|
||||
input_tensor=net, axis=[1, 2], keepdims=True, name='global_pool')
|
||||
end_points['global_pool'] = net
|
||||
if num_classes:
|
||||
net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
|
||||
scope='dropout7')
|
||||
net = slim.conv2d(
|
||||
net,
|
||||
num_classes, [1, 1],
|
||||
activation_fn=None,
|
||||
normalizer_fn=None,
|
||||
biases_initializer=tf.compat.v1.zeros_initializer(),
|
||||
scope='fc8')
|
||||
if spatial_squeeze:
|
||||
net = tf.squeeze(net, [1, 2], name='fc8/squeezed')
|
||||
end_points[sc.name + '/fc8'] = net
|
||||
return net, end_points
|
||||
overfeat.default_image_size = 231
|
||||
+179
@@ -0,0 +1,179 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Tests for slim.nets.overfeat."""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import overfeat
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
class OverFeatTest(tf.test.TestCase):
|
||||
|
||||
def testBuild(self):
|
||||
batch_size = 5
|
||||
height, width = 231, 231
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = overfeat.overfeat(inputs, num_classes)
|
||||
self.assertEquals(logits.op.name, 'overfeat/fc8/squeezed')
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
|
||||
def testFullyConvolutional(self):
|
||||
batch_size = 1
|
||||
height, width = 281, 281
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = overfeat.overfeat(inputs, num_classes, spatial_squeeze=False)
|
||||
self.assertEquals(logits.op.name, 'overfeat/fc8/BiasAdd')
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, 2, 2, num_classes])
|
||||
|
||||
def testGlobalPool(self):
|
||||
batch_size = 1
|
||||
height, width = 281, 281
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = overfeat.overfeat(inputs, num_classes, spatial_squeeze=False,
|
||||
global_pool=True)
|
||||
self.assertEquals(logits.op.name, 'overfeat/fc8/BiasAdd')
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, 1, 1, num_classes])
|
||||
|
||||
def testEndPoints(self):
|
||||
batch_size = 5
|
||||
height, width = 231, 231
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
_, end_points = overfeat.overfeat(inputs, num_classes)
|
||||
expected_names = ['overfeat/conv1',
|
||||
'overfeat/pool1',
|
||||
'overfeat/conv2',
|
||||
'overfeat/pool2',
|
||||
'overfeat/conv3',
|
||||
'overfeat/conv4',
|
||||
'overfeat/conv5',
|
||||
'overfeat/pool5',
|
||||
'overfeat/fc6',
|
||||
'overfeat/fc7',
|
||||
'overfeat/fc8'
|
||||
]
|
||||
self.assertSetEqual(set(end_points.keys()), set(expected_names))
|
||||
|
||||
def testNoClasses(self):
|
||||
batch_size = 5
|
||||
height, width = 231, 231
|
||||
num_classes = None
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
net, end_points = overfeat.overfeat(inputs, num_classes)
|
||||
expected_names = ['overfeat/conv1',
|
||||
'overfeat/pool1',
|
||||
'overfeat/conv2',
|
||||
'overfeat/pool2',
|
||||
'overfeat/conv3',
|
||||
'overfeat/conv4',
|
||||
'overfeat/conv5',
|
||||
'overfeat/pool5',
|
||||
'overfeat/fc6',
|
||||
'overfeat/fc7'
|
||||
]
|
||||
self.assertSetEqual(set(end_points.keys()), set(expected_names))
|
||||
self.assertTrue(net.op.name.startswith('overfeat/fc7'))
|
||||
|
||||
def testModelVariables(self):
|
||||
batch_size = 5
|
||||
height, width = 231, 231
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
overfeat.overfeat(inputs, num_classes)
|
||||
expected_names = ['overfeat/conv1/weights',
|
||||
'overfeat/conv1/biases',
|
||||
'overfeat/conv2/weights',
|
||||
'overfeat/conv2/biases',
|
||||
'overfeat/conv3/weights',
|
||||
'overfeat/conv3/biases',
|
||||
'overfeat/conv4/weights',
|
||||
'overfeat/conv4/biases',
|
||||
'overfeat/conv5/weights',
|
||||
'overfeat/conv5/biases',
|
||||
'overfeat/fc6/weights',
|
||||
'overfeat/fc6/biases',
|
||||
'overfeat/fc7/weights',
|
||||
'overfeat/fc7/biases',
|
||||
'overfeat/fc8/weights',
|
||||
'overfeat/fc8/biases',
|
||||
]
|
||||
model_variables = [v.op.name for v in slim.get_model_variables()]
|
||||
self.assertSetEqual(set(model_variables), set(expected_names))
|
||||
|
||||
def testEvaluation(self):
|
||||
batch_size = 2
|
||||
height, width = 231, 231
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
eval_inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = overfeat.overfeat(eval_inputs, is_training=False)
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[batch_size, num_classes])
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
self.assertListEqual(predictions.get_shape().as_list(), [batch_size])
|
||||
|
||||
def testTrainEvalWithReuse(self):
|
||||
train_batch_size = 2
|
||||
eval_batch_size = 1
|
||||
train_height, train_width = 231, 231
|
||||
eval_height, eval_width = 281, 281
|
||||
num_classes = 1000
|
||||
with self.test_session():
|
||||
train_inputs = tf.random.uniform(
|
||||
(train_batch_size, train_height, train_width, 3))
|
||||
logits, _ = overfeat.overfeat(train_inputs)
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[train_batch_size, num_classes])
|
||||
tf.compat.v1.get_variable_scope().reuse_variables()
|
||||
eval_inputs = tf.random.uniform(
|
||||
(eval_batch_size, eval_height, eval_width, 3))
|
||||
logits, _ = overfeat.overfeat(eval_inputs, is_training=False,
|
||||
spatial_squeeze=False)
|
||||
self.assertListEqual(logits.get_shape().as_list(),
|
||||
[eval_batch_size, 2, 2, num_classes])
|
||||
logits = tf.reduce_mean(input_tensor=logits, axis=[1, 2])
|
||||
predictions = tf.argmax(input=logits, axis=1)
|
||||
self.assertEquals(predictions.get_shape().as_list(), [eval_batch_size])
|
||||
|
||||
def testForward(self):
|
||||
batch_size = 1
|
||||
height, width = 231, 231
|
||||
with self.test_session() as sess:
|
||||
inputs = tf.random.uniform((batch_size, height, width, 3))
|
||||
logits, _ = overfeat.overfeat(inputs)
|
||||
sess.run(tf.compat.v1.global_variables_initializer())
|
||||
output = sess.run(logits)
|
||||
self.assertTrue(output.any())
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+297
@@ -0,0 +1,297 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# =============================================================================
|
||||
"""Implementation of the Image-to-Image Translation model.
|
||||
|
||||
This network represents a port of the following work:
|
||||
|
||||
Image-to-Image Translation with Conditional Adversarial Networks
|
||||
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou and Alexei A. Efros
|
||||
Arxiv, 2017
|
||||
https://phillipi.github.io/pix2pix/
|
||||
|
||||
A reference implementation written in Lua can be found at:
|
||||
https://github.com/phillipi/pix2pix/blob/master/models.lua
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import collections
|
||||
import functools
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import framework as contrib_framework
|
||||
from tensorflow.contrib import layers as contrib_layers
|
||||
|
||||
layers = contrib_layers
|
||||
|
||||
|
||||
def pix2pix_arg_scope():
|
||||
"""Returns a default argument scope for isola_net.
|
||||
|
||||
Returns:
|
||||
An arg scope.
|
||||
"""
|
||||
# These parameters come from the online port, which don't necessarily match
|
||||
# those in the paper.
|
||||
# TODO(nsilberman): confirm these values with Philip.
|
||||
instance_norm_params = {
|
||||
'center': True,
|
||||
'scale': True,
|
||||
'epsilon': 0.00001,
|
||||
}
|
||||
|
||||
with contrib_framework.arg_scope(
|
||||
[layers.conv2d, layers.conv2d_transpose],
|
||||
normalizer_fn=layers.instance_norm,
|
||||
normalizer_params=instance_norm_params,
|
||||
weights_initializer=tf.compat.v1.random_normal_initializer(0,
|
||||
0.02)) as sc:
|
||||
return sc
|
||||
|
||||
|
||||
def upsample(net, num_outputs, kernel_size, method='nn_upsample_conv'):
|
||||
"""Upsamples the given inputs.
|
||||
|
||||
Args:
|
||||
net: A `Tensor` of size [batch_size, height, width, filters].
|
||||
num_outputs: The number of output filters.
|
||||
kernel_size: A list of 2 scalars or a 1x2 `Tensor` indicating the scale,
|
||||
relative to the inputs, of the output dimensions. For example, if kernel
|
||||
size is [2, 3], then the output height and width will be twice and three
|
||||
times the input size.
|
||||
method: The upsampling method.
|
||||
|
||||
Returns:
|
||||
An `Tensor` which was upsampled using the specified method.
|
||||
|
||||
Raises:
|
||||
ValueError: if `method` is not recognized.
|
||||
"""
|
||||
net_shape = tf.shape(input=net)
|
||||
height = net_shape[1]
|
||||
width = net_shape[2]
|
||||
|
||||
if method == 'nn_upsample_conv':
|
||||
net = tf.image.resize(
|
||||
net, [kernel_size[0] * height, kernel_size[1] * width],
|
||||
method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
|
||||
net = layers.conv2d(net, num_outputs, [4, 4], activation_fn=None)
|
||||
elif method == 'conv2d_transpose':
|
||||
net = layers.conv2d_transpose(
|
||||
net, num_outputs, [4, 4], stride=kernel_size, activation_fn=None)
|
||||
else:
|
||||
raise ValueError('Unknown method: [%s]' % method)
|
||||
|
||||
return net
|
||||
|
||||
|
||||
class Block(
|
||||
collections.namedtuple('Block', ['num_filters', 'decoder_keep_prob'])):
|
||||
"""Represents a single block of encoder and decoder processing.
|
||||
|
||||
The Image-to-Image translation paper works a bit differently than the original
|
||||
U-Net model. In particular, each block represents a single operation in the
|
||||
encoder which is concatenated with the corresponding decoder representation.
|
||||
A dropout layer follows the concatenation and convolution of the concatenated
|
||||
features.
|
||||
"""
|
||||
pass
|
||||
|
||||
|
||||
def _default_generator_blocks():
|
||||
"""Returns the default generator block definitions.
|
||||
|
||||
Returns:
|
||||
A list of generator blocks.
|
||||
"""
|
||||
return [
|
||||
Block(64, 0.5),
|
||||
Block(128, 0.5),
|
||||
Block(256, 0.5),
|
||||
Block(512, 0),
|
||||
Block(512, 0),
|
||||
Block(512, 0),
|
||||
Block(512, 0),
|
||||
]
|
||||
|
||||
|
||||
def pix2pix_generator(net,
|
||||
num_outputs,
|
||||
blocks=None,
|
||||
upsample_method='nn_upsample_conv',
|
||||
is_training=False): # pylint: disable=unused-argument
|
||||
"""Defines the network architecture.
|
||||
|
||||
Args:
|
||||
net: A `Tensor` of size [batch, height, width, channels]. Note that the
|
||||
generator currently requires square inputs (e.g. height=width).
|
||||
num_outputs: The number of (per-pixel) outputs.
|
||||
blocks: A list of generator blocks or `None` to use the default generator
|
||||
definition.
|
||||
upsample_method: The method of upsampling images, one of 'nn_upsample_conv'
|
||||
or 'conv2d_transpose'
|
||||
is_training: Whether or not we're in training or testing mode.
|
||||
|
||||
Returns:
|
||||
A `Tensor` representing the model output and a dictionary of model end
|
||||
points.
|
||||
|
||||
Raises:
|
||||
ValueError: if the input heights do not match their widths.
|
||||
"""
|
||||
end_points = {}
|
||||
|
||||
blocks = blocks or _default_generator_blocks()
|
||||
|
||||
input_size = net.get_shape().as_list()
|
||||
|
||||
input_size[3] = num_outputs
|
||||
|
||||
upsample_fn = functools.partial(upsample, method=upsample_method)
|
||||
|
||||
encoder_activations = []
|
||||
|
||||
###########
|
||||
# Encoder #
|
||||
###########
|
||||
with tf.compat.v1.variable_scope('encoder'):
|
||||
with contrib_framework.arg_scope([layers.conv2d],
|
||||
kernel_size=[4, 4],
|
||||
stride=2,
|
||||
activation_fn=tf.nn.leaky_relu):
|
||||
|
||||
for block_id, block in enumerate(blocks):
|
||||
# No normalizer for the first encoder layers as per 'Image-to-Image',
|
||||
# Section 5.1.1
|
||||
if block_id == 0:
|
||||
# First layer doesn't use normalizer_fn
|
||||
net = layers.conv2d(net, block.num_filters, normalizer_fn=None)
|
||||
elif block_id < len(blocks) - 1:
|
||||
net = layers.conv2d(net, block.num_filters)
|
||||
else:
|
||||
# Last layer doesn't use activation_fn nor normalizer_fn
|
||||
net = layers.conv2d(
|
||||
net, block.num_filters, activation_fn=None, normalizer_fn=None)
|
||||
|
||||
encoder_activations.append(net)
|
||||
end_points['encoder%d' % block_id] = net
|
||||
|
||||
###########
|
||||
# Decoder #
|
||||
###########
|
||||
reversed_blocks = list(blocks)
|
||||
reversed_blocks.reverse()
|
||||
|
||||
with tf.compat.v1.variable_scope('decoder'):
|
||||
# Dropout is used at both train and test time as per 'Image-to-Image',
|
||||
# Section 2.1 (last paragraph).
|
||||
with contrib_framework.arg_scope([layers.dropout], is_training=True):
|
||||
|
||||
for block_id, block in enumerate(reversed_blocks):
|
||||
if block_id > 0:
|
||||
net = tf.concat([net, encoder_activations[-block_id - 1]], axis=3)
|
||||
|
||||
# The Relu comes BEFORE the upsample op:
|
||||
net = tf.nn.relu(net)
|
||||
net = upsample_fn(net, block.num_filters, [2, 2])
|
||||
if block.decoder_keep_prob > 0:
|
||||
net = layers.dropout(net, keep_prob=block.decoder_keep_prob)
|
||||
end_points['decoder%d' % block_id] = net
|
||||
|
||||
with tf.compat.v1.variable_scope('output'):
|
||||
# Explicitly set the normalizer_fn to None to override any default value
|
||||
# that may come from an arg_scope, such as pix2pix_arg_scope.
|
||||
logits = layers.conv2d(
|
||||
net, num_outputs, [4, 4], activation_fn=None, normalizer_fn=None)
|
||||
logits = tf.reshape(logits, input_size)
|
||||
|
||||
end_points['logits'] = logits
|
||||
end_points['predictions'] = tf.tanh(logits)
|
||||
|
||||
return logits, end_points
|
||||
|
||||
|
||||
def pix2pix_discriminator(net, num_filters, padding=2, pad_mode='REFLECT',
|
||||
activation_fn=tf.nn.leaky_relu, is_training=False):
|
||||
"""Creates the Image2Image Translation Discriminator.
|
||||
|
||||
Args:
|
||||
net: A `Tensor` of size [batch_size, height, width, channels] representing
|
||||
the input.
|
||||
num_filters: A list of the filters in the discriminator. The length of the
|
||||
list determines the number of layers in the discriminator.
|
||||
padding: Amount of reflection padding applied before each convolution.
|
||||
pad_mode: mode for tf.pad, one of "CONSTANT", "REFLECT", or "SYMMETRIC".
|
||||
activation_fn: activation fn for layers.conv2d.
|
||||
is_training: Whether or not the model is training or testing.
|
||||
|
||||
Returns:
|
||||
A logits `Tensor` of size [batch_size, N, N, 1] where N is the number of
|
||||
'patches' we're attempting to discriminate and a dictionary of model end
|
||||
points.
|
||||
"""
|
||||
del is_training
|
||||
end_points = {}
|
||||
|
||||
num_layers = len(num_filters)
|
||||
|
||||
def padded(net, scope):
|
||||
if padding:
|
||||
with tf.compat.v1.variable_scope(scope):
|
||||
spatial_pad = tf.constant(
|
||||
[[0, 0], [padding, padding], [padding, padding], [0, 0]],
|
||||
dtype=tf.int32)
|
||||
return tf.pad(tensor=net, paddings=spatial_pad, mode=pad_mode)
|
||||
else:
|
||||
return net
|
||||
|
||||
with contrib_framework.arg_scope([layers.conv2d],
|
||||
kernel_size=[4, 4],
|
||||
stride=2,
|
||||
padding='valid',
|
||||
activation_fn=activation_fn):
|
||||
|
||||
# No normalization on the input layer.
|
||||
net = layers.conv2d(
|
||||
padded(net, 'conv0'), num_filters[0], normalizer_fn=None, scope='conv0')
|
||||
|
||||
end_points['conv0'] = net
|
||||
|
||||
for i in range(1, num_layers - 1):
|
||||
net = layers.conv2d(
|
||||
padded(net, 'conv%d' % i), num_filters[i], scope='conv%d' % i)
|
||||
end_points['conv%d' % i] = net
|
||||
|
||||
# Stride 1 on the last layer.
|
||||
net = layers.conv2d(
|
||||
padded(net, 'conv%d' % (num_layers - 1)),
|
||||
num_filters[-1],
|
||||
stride=1,
|
||||
scope='conv%d' % (num_layers - 1))
|
||||
end_points['conv%d' % (num_layers - 1)] = net
|
||||
|
||||
# 1-dim logits, stride 1, no activation, no normalization.
|
||||
logits = layers.conv2d(
|
||||
padded(net, 'conv%d' % num_layers),
|
||||
1,
|
||||
stride=1,
|
||||
activation_fn=None,
|
||||
normalizer_fn=None,
|
||||
scope='conv%d' % num_layers)
|
||||
end_points['logits'] = logits
|
||||
end_points['predictions'] = tf.sigmoid(logits)
|
||||
return logits, end_points
|
||||
+157
@@ -0,0 +1,157 @@
|
||||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# =============================================================================
|
||||
"""Tests for pix2pix."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import framework as contrib_framework
|
||||
from nets import pix2pix
|
||||
|
||||
|
||||
class GeneratorTest(tf.test.TestCase):
|
||||
|
||||
def _reduced_default_blocks(self):
|
||||
"""Returns the default blocks, scaled down to make test run faster."""
|
||||
return [pix2pix.Block(b.num_filters // 32, b.decoder_keep_prob)
|
||||
for b in pix2pix._default_generator_blocks()]
|
||||
|
||||
def test_output_size_nn_upsample_conv(self):
|
||||
batch_size = 2
|
||||
height, width = 256, 256
|
||||
num_outputs = 4
|
||||
|
||||
images = tf.ones((batch_size, height, width, 3))
|
||||
with contrib_framework.arg_scope(pix2pix.pix2pix_arg_scope()):
|
||||
logits, _ = pix2pix.pix2pix_generator(
|
||||
images, num_outputs, blocks=self._reduced_default_blocks(),
|
||||
upsample_method='nn_upsample_conv')
|
||||
|
||||
with self.test_session() as session:
|
||||
session.run(tf.compat.v1.global_variables_initializer())
|
||||
np_outputs = session.run(logits)
|
||||
self.assertListEqual([batch_size, height, width, num_outputs],
|
||||
list(np_outputs.shape))
|
||||
|
||||
def test_output_size_conv2d_transpose(self):
|
||||
batch_size = 2
|
||||
height, width = 256, 256
|
||||
num_outputs = 4
|
||||
|
||||
images = tf.ones((batch_size, height, width, 3))
|
||||
with contrib_framework.arg_scope(pix2pix.pix2pix_arg_scope()):
|
||||
logits, _ = pix2pix.pix2pix_generator(
|
||||
images, num_outputs, blocks=self._reduced_default_blocks(),
|
||||
upsample_method='conv2d_transpose')
|
||||
|
||||
with self.test_session() as session:
|
||||
session.run(tf.compat.v1.global_variables_initializer())
|
||||
np_outputs = session.run(logits)
|
||||
self.assertListEqual([batch_size, height, width, num_outputs],
|
||||
list(np_outputs.shape))
|
||||
|
||||
def test_block_number_dictates_number_of_layers(self):
|
||||
batch_size = 2
|
||||
height, width = 256, 256
|
||||
num_outputs = 4
|
||||
|
||||
images = tf.ones((batch_size, height, width, 3))
|
||||
blocks = [
|
||||
pix2pix.Block(64, 0.5),
|
||||
pix2pix.Block(128, 0),
|
||||
]
|
||||
with contrib_framework.arg_scope(pix2pix.pix2pix_arg_scope()):
|
||||
_, end_points = pix2pix.pix2pix_generator(
|
||||
images, num_outputs, blocks)
|
||||
|
||||
num_encoder_layers = 0
|
||||
num_decoder_layers = 0
|
||||
for end_point in end_points:
|
||||
if end_point.startswith('encoder'):
|
||||
num_encoder_layers += 1
|
||||
elif end_point.startswith('decoder'):
|
||||
num_decoder_layers += 1
|
||||
|
||||
self.assertEqual(num_encoder_layers, len(blocks))
|
||||
self.assertEqual(num_decoder_layers, len(blocks))
|
||||
|
||||
|
||||
class DiscriminatorTest(tf.test.TestCase):
|
||||
|
||||
def _layer_output_size(self, input_size, kernel_size=4, stride=2, pad=2):
|
||||
return (input_size + pad * 2 - kernel_size) // stride + 1
|
||||
|
||||
def test_four_layers(self):
|
||||
batch_size = 2
|
||||
input_size = 256
|
||||
|
||||
output_size = self._layer_output_size(input_size)
|
||||
output_size = self._layer_output_size(output_size)
|
||||
output_size = self._layer_output_size(output_size)
|
||||
output_size = self._layer_output_size(output_size, stride=1)
|
||||
output_size = self._layer_output_size(output_size, stride=1)
|
||||
|
||||
images = tf.ones((batch_size, input_size, input_size, 3))
|
||||
with contrib_framework.arg_scope(pix2pix.pix2pix_arg_scope()):
|
||||
logits, end_points = pix2pix.pix2pix_discriminator(
|
||||
images, num_filters=[64, 128, 256, 512])
|
||||
self.assertListEqual([batch_size, output_size, output_size, 1],
|
||||
logits.shape.as_list())
|
||||
self.assertListEqual([batch_size, output_size, output_size, 1],
|
||||
end_points['predictions'].shape.as_list())
|
||||
|
||||
def test_four_layers_no_padding(self):
|
||||
batch_size = 2
|
||||
input_size = 256
|
||||
|
||||
output_size = self._layer_output_size(input_size, pad=0)
|
||||
output_size = self._layer_output_size(output_size, pad=0)
|
||||
output_size = self._layer_output_size(output_size, pad=0)
|
||||
output_size = self._layer_output_size(output_size, stride=1, pad=0)
|
||||
output_size = self._layer_output_size(output_size, stride=1, pad=0)
|
||||
|
||||
images = tf.ones((batch_size, input_size, input_size, 3))
|
||||
with contrib_framework.arg_scope(pix2pix.pix2pix_arg_scope()):
|
||||
logits, end_points = pix2pix.pix2pix_discriminator(
|
||||
images, num_filters=[64, 128, 256, 512], padding=0)
|
||||
self.assertListEqual([batch_size, output_size, output_size, 1],
|
||||
logits.shape.as_list())
|
||||
self.assertListEqual([batch_size, output_size, output_size, 1],
|
||||
end_points['predictions'].shape.as_list())
|
||||
|
||||
def test_four_layers_wrog_paddig(self):
|
||||
batch_size = 2
|
||||
input_size = 256
|
||||
|
||||
images = tf.ones((batch_size, input_size, input_size, 3))
|
||||
with contrib_framework.arg_scope(pix2pix.pix2pix_arg_scope()):
|
||||
with self.assertRaises(TypeError):
|
||||
pix2pix.pix2pix_discriminator(
|
||||
images, num_filters=[64, 128, 256, 512], padding=1.5)
|
||||
|
||||
def test_four_layers_negative_padding(self):
|
||||
batch_size = 2
|
||||
input_size = 256
|
||||
|
||||
images = tf.ones((batch_size, input_size, input_size, 3))
|
||||
with contrib_framework.arg_scope(pix2pix.pix2pix_arg_scope()):
|
||||
with self.assertRaises(ValueError):
|
||||
pix2pix.pix2pix_discriminator(
|
||||
images, num_filters=[64, 128, 256, 512], padding=-1)
|
||||
|
||||
if __name__ == '__main__':
|
||||
tf.test.main()
|
||||
+181
@@ -0,0 +1,181 @@
|
||||
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Export quantized tflite model from a trained checkpoint."""
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import functools
|
||||
from absl import app
|
||||
from absl import flags
|
||||
import tensorflow as tf
|
||||
import tensorflow_datasets as tfds
|
||||
from nets import nets_factory
|
||||
from preprocessing import preprocessing_factory
|
||||
|
||||
flags.DEFINE_string("model_name", None,
|
||||
"The name of the architecture to quantize.")
|
||||
flags.DEFINE_string("checkpoint_path", None, "Path to the training checkpoint.")
|
||||
flags.DEFINE_string("dataset_name", "imagenet2012",
|
||||
"Name of the dataset to use for quantization calibration.")
|
||||
flags.DEFINE_string("dataset_dir", None, "Dataset location.")
|
||||
flags.DEFINE_string(
|
||||
"dataset_split", "train",
|
||||
"The dataset split (train, validation etc.) to use for calibration.")
|
||||
flags.DEFINE_string("output_tflite", None, "Path to output tflite file.")
|
||||
flags.DEFINE_boolean(
|
||||
"use_model_specific_preprocessing", False,
|
||||
"When true, uses the preprocessing corresponding to the model as specified "
|
||||
"in preprocessing factory.")
|
||||
flags.DEFINE_boolean("enable_ema", True,
|
||||
"Load exponential moving average version of variables.")
|
||||
flags.DEFINE_integer(
|
||||
"num_steps", 1000,
|
||||
"Number of post-training quantization calibration steps to run.")
|
||||
flags.DEFINE_integer("image_size", 224, "Size of the input image.")
|
||||
flags.DEFINE_integer("num_classes", 1001,
|
||||
"Number of output classes for the model.")
|
||||
|
||||
FLAGS = flags.FLAGS
|
||||
|
||||
# Mean and standard deviation used for normalizing the image tensor.
|
||||
_MEAN_RGB = 127.5
|
||||
_STD_RGB = 127.5
|
||||
|
||||
|
||||
def _preprocess_for_quantization(image_data, image_size, crop_padding=32):
|
||||
"""Crops to center of image with padding then scales, normalizes image_size.
|
||||
|
||||
Args:
|
||||
image_data: A 3D Tensor representing the RGB image data. Image can be of
|
||||
arbitrary height and width.
|
||||
image_size: image height/width dimension.
|
||||
crop_padding: the padding size to use when centering the crop.
|
||||
|
||||
Returns:
|
||||
A decoded and cropped image Tensor. Image is normalized to [-1,1].
|
||||
|
||||
"""
|
||||
|
||||
shape = tf.shape(image_data)
|
||||
image_height = shape[0]
|
||||
image_width = shape[1]
|
||||
|
||||
padded_center_crop_size = tf.cast(
|
||||
(image_size * 1.0 / (image_size + crop_padding)) *
|
||||
tf.cast(tf.minimum(image_height, image_width), tf.float32), tf.int32)
|
||||
|
||||
offset_height = ((image_height - padded_center_crop_size) + 1) // 2
|
||||
offset_width = ((image_width - padded_center_crop_size) + 1) // 2
|
||||
|
||||
image = tf.image.crop_to_bounding_box(
|
||||
image_data,
|
||||
offset_height=offset_height,
|
||||
offset_width=offset_width,
|
||||
target_height=padded_center_crop_size,
|
||||
target_width=padded_center_crop_size)
|
||||
|
||||
image = tf.image.resize([image], [image_size, image_size],
|
||||
method=tf.image.ResizeMethod.BICUBIC)[0]
|
||||
image = tf.cast(image, tf.float32)
|
||||
image -= tf.constant(_MEAN_RGB)
|
||||
image /= tf.constant(_STD_RGB)
|
||||
return image
|
||||
|
||||
|
||||
def restore_model(sess, checkpoint_path, enable_ema=True):
|
||||
"""Restore variables from the checkpoint into the provided session.
|
||||
|
||||
Args:
|
||||
sess: A tensorflow session where the checkpoint will be loaded.
|
||||
checkpoint_path: Path to the trained checkpoint.
|
||||
enable_ema: (optional) Whether to load the exponential moving average (ema)
|
||||
version of the tensorflow variables. Defaults to True.
|
||||
"""
|
||||
if enable_ema:
|
||||
ema = tf.train.ExponentialMovingAverage(decay=0.0)
|
||||
ema_vars = tf.trainable_variables() + tf.get_collection("moving_vars")
|
||||
for v in tf.global_variables():
|
||||
if "moving_mean" in v.name or "moving_variance" in v.name:
|
||||
ema_vars.append(v)
|
||||
ema_vars = list(set(ema_vars))
|
||||
var_dict = ema.variables_to_restore(ema_vars)
|
||||
else:
|
||||
var_dict = None
|
||||
|
||||
sess.run(tf.global_variables_initializer())
|
||||
saver = tf.train.Saver(var_dict, max_to_keep=1)
|
||||
saver.restore(sess, checkpoint_path)
|
||||
|
||||
|
||||
def _representative_dataset_gen():
|
||||
"""Gets a python generator of numpy arrays for the given dataset."""
|
||||
image_size = FLAGS.image_size
|
||||
dataset = tfds.builder(FLAGS.dataset_name, data_dir=FLAGS.dataset_dir)
|
||||
dataset.download_and_prepare()
|
||||
data = dataset.as_dataset()[FLAGS.dataset_split]
|
||||
iterator = tf.compat.v1.data.make_one_shot_iterator(data)
|
||||
if FLAGS.use_model_specific_preprocessing:
|
||||
preprocess_fn = functools.partial(
|
||||
preprocessing_factory.get_preprocessing(name=FLAGS.model_name),
|
||||
output_height=image_size,
|
||||
output_width=image_size)
|
||||
else:
|
||||
preprocess_fn = functools.partial(
|
||||
_preprocess_for_quantization, image_size=image_size)
|
||||
features = iterator.get_next()
|
||||
image = features["image"]
|
||||
image = preprocess_fn(image)
|
||||
image = tf.reshape(image, [1, image_size, image_size, 3])
|
||||
for _ in range(FLAGS.num_steps):
|
||||
yield [image.eval()]
|
||||
|
||||
|
||||
def main(_):
|
||||
with tf.Graph().as_default(), tf.Session() as sess:
|
||||
network_fn = nets_factory.get_network_fn(
|
||||
FLAGS.model_name, num_classes=FLAGS.num_classes, is_training=False)
|
||||
image_size = FLAGS.image_size
|
||||
images = tf.placeholder(
|
||||
tf.float32, shape=(1, image_size, image_size, 3), name="images")
|
||||
|
||||
logits, _ = network_fn(images)
|
||||
|
||||
output_tensor = tf.nn.softmax(logits)
|
||||
restore_model(sess, FLAGS.checkpoint_path, enable_ema=FLAGS.enable_ema)
|
||||
|
||||
converter = tf.lite.TFLiteConverter.from_session(sess, [images],
|
||||
[output_tensor])
|
||||
|
||||
converter.representative_dataset = tf.lite.RepresentativeDataset(
|
||||
_representative_dataset_gen)
|
||||
converter.optimizations = [tf.lite.Optimize.DEFAULT]
|
||||
converter.inference_input_type = tf.int8
|
||||
converter.inference_output_type = tf.int8
|
||||
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
|
||||
|
||||
tflite_buffer = converter.convert()
|
||||
with tf.gfile.GFile(FLAGS.output_tflite, "wb") as output_tflite:
|
||||
output_tflite.write(tflite_buffer)
|
||||
print("tflite model written to %s" % FLAGS.output_tflite)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
flags.mark_flag_as_required("model_name")
|
||||
flags.mark_flag_as_required("checkpoint_path")
|
||||
flags.mark_flag_as_required("dataset_dir")
|
||||
flags.mark_flag_as_required("output_tflite")
|
||||
app.run(main)
|
||||
+278
@@ -0,0 +1,278 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains building blocks for various versions of Residual Networks.
|
||||
|
||||
Residual networks (ResNets) were proposed in:
|
||||
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
|
||||
Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2015
|
||||
|
||||
More variants were introduced in:
|
||||
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
|
||||
Identity Mappings in Deep Residual Networks. arXiv: 1603.05027, 2016
|
||||
|
||||
We can obtain different ResNet variants by changing the network depth, width,
|
||||
and form of residual unit. This module implements the infrastructure for
|
||||
building them. Concrete ResNet units and full ResNet networks are implemented in
|
||||
the accompanying resnet_v1.py and resnet_v2.py modules.
|
||||
|
||||
Compared to https://github.com/KaimingHe/deep-residual-networks, in the current
|
||||
implementation we subsample the output activations in the last residual unit of
|
||||
each block, instead of subsampling the input activations in the first residual
|
||||
unit of each block. The two implementations give identical results but our
|
||||
implementation is more memory efficient.
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import collections
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
|
||||
"""A named tuple describing a ResNet block.
|
||||
|
||||
Its parts are:
|
||||
scope: The scope of the `Block`.
|
||||
unit_fn: The ResNet unit function which takes as input a `Tensor` and
|
||||
returns another `Tensor` with the output of the ResNet unit.
|
||||
args: A list of length equal to the number of units in the `Block`. The list
|
||||
contains one (depth, depth_bottleneck, stride) tuple for each unit in the
|
||||
block to serve as argument to unit_fn.
|
||||
"""
|
||||
|
||||
|
||||
def subsample(inputs, factor, scope=None):
|
||||
"""Subsamples the input along the spatial dimensions.
|
||||
|
||||
Args:
|
||||
inputs: A `Tensor` of size [batch, height_in, width_in, channels].
|
||||
factor: The subsampling factor.
|
||||
scope: Optional variable_scope.
|
||||
|
||||
Returns:
|
||||
output: A `Tensor` of size [batch, height_out, width_out, channels] with the
|
||||
input, either intact (if factor == 1) or subsampled (if factor > 1).
|
||||
"""
|
||||
if factor == 1:
|
||||
return inputs
|
||||
else:
|
||||
return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)
|
||||
|
||||
|
||||
def conv2d_same(inputs, num_outputs, kernel_size, stride, rate=1, scope=None):
|
||||
"""Strided 2-D convolution with 'SAME' padding.
|
||||
|
||||
When stride > 1, then we do explicit zero-padding, followed by conv2d with
|
||||
'VALID' padding.
|
||||
|
||||
Note that
|
||||
|
||||
net = conv2d_same(inputs, num_outputs, 3, stride=stride)
|
||||
|
||||
is equivalent to
|
||||
|
||||
net = slim.conv2d(inputs, num_outputs, 3, stride=1, padding='SAME')
|
||||
net = subsample(net, factor=stride)
|
||||
|
||||
whereas
|
||||
|
||||
net = slim.conv2d(inputs, num_outputs, 3, stride=stride, padding='SAME')
|
||||
|
||||
is different when the input's height or width is even, which is why we add the
|
||||
current function. For more details, see ResnetUtilsTest.testConv2DSameEven().
|
||||
|
||||
Args:
|
||||
inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
|
||||
num_outputs: An integer, the number of output filters.
|
||||
kernel_size: An int with the kernel_size of the filters.
|
||||
stride: An integer, the output stride.
|
||||
rate: An integer, rate for atrous convolution.
|
||||
scope: Scope.
|
||||
|
||||
Returns:
|
||||
output: A 4-D tensor of size [batch, height_out, width_out, channels] with
|
||||
the convolution output.
|
||||
"""
|
||||
if stride == 1:
|
||||
return slim.conv2d(inputs, num_outputs, kernel_size, stride=1, rate=rate,
|
||||
padding='SAME', scope=scope)
|
||||
else:
|
||||
kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
|
||||
pad_total = kernel_size_effective - 1
|
||||
pad_beg = pad_total // 2
|
||||
pad_end = pad_total - pad_beg
|
||||
inputs = tf.pad(
|
||||
tensor=inputs,
|
||||
paddings=[[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
|
||||
return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride,
|
||||
rate=rate, padding='VALID', scope=scope)
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def stack_blocks_dense(net, blocks, output_stride=None,
|
||||
store_non_strided_activations=False,
|
||||
outputs_collections=None):
|
||||
"""Stacks ResNet `Blocks` and controls output feature density.
|
||||
|
||||
First, this function creates scopes for the ResNet in the form of
|
||||
'block_name/unit_1', 'block_name/unit_2', etc.
|
||||
|
||||
Second, this function allows the user to explicitly control the ResNet
|
||||
output_stride, which is the ratio of the input to output spatial resolution.
|
||||
This is useful for dense prediction tasks such as semantic segmentation or
|
||||
object detection.
|
||||
|
||||
Most ResNets consist of 4 ResNet blocks and subsample the activations by a
|
||||
factor of 2 when transitioning between consecutive ResNet blocks. This results
|
||||
to a nominal ResNet output_stride equal to 8. If we set the output_stride to
|
||||
half the nominal network stride (e.g., output_stride=4), then we compute
|
||||
responses twice.
|
||||
|
||||
Control of the output feature density is implemented by atrous convolution.
|
||||
|
||||
Args:
|
||||
net: A `Tensor` of size [batch, height, width, channels].
|
||||
blocks: A list of length equal to the number of ResNet `Blocks`. Each
|
||||
element is a ResNet `Block` object describing the units in the `Block`.
|
||||
output_stride: If `None`, then the output will be computed at the nominal
|
||||
network stride. If output_stride is not `None`, it specifies the requested
|
||||
ratio of input to output spatial resolution, which needs to be equal to
|
||||
the product of unit strides from the start up to some level of the ResNet.
|
||||
For example, if the ResNet employs units with strides 1, 2, 1, 3, 4, 1,
|
||||
then valid values for the output_stride are 1, 2, 6, 24 or None (which
|
||||
is equivalent to output_stride=24).
|
||||
store_non_strided_activations: If True, we compute non-strided (undecimated)
|
||||
activations at the last unit of each block and store them in the
|
||||
`outputs_collections` before subsampling them. This gives us access to
|
||||
higher resolution intermediate activations which are useful in some
|
||||
dense prediction problems but increases 4x the computation and memory cost
|
||||
at the last unit of each block.
|
||||
outputs_collections: Collection to add the ResNet block outputs.
|
||||
|
||||
Returns:
|
||||
net: Output tensor with stride equal to the specified output_stride.
|
||||
|
||||
Raises:
|
||||
ValueError: If the target output_stride is not valid.
|
||||
"""
|
||||
# The current_stride variable keeps track of the effective stride of the
|
||||
# activations. This allows us to invoke atrous convolution whenever applying
|
||||
# the next residual unit would result in the activations having stride larger
|
||||
# than the target output_stride.
|
||||
current_stride = 1
|
||||
|
||||
# The atrous convolution rate parameter.
|
||||
rate = 1
|
||||
|
||||
for block in blocks:
|
||||
with tf.compat.v1.variable_scope(block.scope, 'block', [net]) as sc:
|
||||
block_stride = 1
|
||||
for i, unit in enumerate(block.args):
|
||||
if store_non_strided_activations and i == len(block.args) - 1:
|
||||
# Move stride from the block's last unit to the end of the block.
|
||||
block_stride = unit.get('stride', 1)
|
||||
unit = dict(unit, stride=1)
|
||||
|
||||
with tf.compat.v1.variable_scope('unit_%d' % (i + 1), values=[net]):
|
||||
# If we have reached the target output_stride, then we need to employ
|
||||
# atrous convolution with stride=1 and multiply the atrous rate by the
|
||||
# current unit's stride for use in subsequent layers.
|
||||
if output_stride is not None and current_stride == output_stride:
|
||||
net = block.unit_fn(net, rate=rate, **dict(unit, stride=1))
|
||||
rate *= unit.get('stride', 1)
|
||||
|
||||
else:
|
||||
net = block.unit_fn(net, rate=1, **unit)
|
||||
current_stride *= unit.get('stride', 1)
|
||||
if output_stride is not None and current_stride > output_stride:
|
||||
raise ValueError('The target output_stride cannot be reached.')
|
||||
|
||||
# Collect activations at the block's end before performing subsampling.
|
||||
net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)
|
||||
|
||||
# Subsampling of the block's output activations.
|
||||
if output_stride is not None and current_stride == output_stride:
|
||||
rate *= block_stride
|
||||
else:
|
||||
net = subsample(net, block_stride)
|
||||
current_stride *= block_stride
|
||||
if output_stride is not None and current_stride > output_stride:
|
||||
raise ValueError('The target output_stride cannot be reached.')
|
||||
|
||||
if output_stride is not None and current_stride != output_stride:
|
||||
raise ValueError('The target output_stride cannot be reached.')
|
||||
|
||||
return net
|
||||
|
||||
|
||||
def resnet_arg_scope(
|
||||
weight_decay=0.0001,
|
||||
batch_norm_decay=0.997,
|
||||
batch_norm_epsilon=1e-5,
|
||||
batch_norm_scale=True,
|
||||
activation_fn=tf.nn.relu,
|
||||
use_batch_norm=True,
|
||||
batch_norm_updates_collections=tf.compat.v1.GraphKeys.UPDATE_OPS):
|
||||
"""Defines the default ResNet arg scope.
|
||||
|
||||
TODO(gpapan): The batch-normalization related default values above are
|
||||
appropriate for use in conjunction with the reference ResNet models
|
||||
released at https://github.com/KaimingHe/deep-residual-networks. When
|
||||
training ResNets from scratch, they might need to be tuned.
|
||||
|
||||
Args:
|
||||
weight_decay: The weight decay to use for regularizing the model.
|
||||
batch_norm_decay: The moving average decay when estimating layer activation
|
||||
statistics in batch normalization.
|
||||
batch_norm_epsilon: Small constant to prevent division by zero when
|
||||
normalizing activations by their variance in batch normalization.
|
||||
batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the
|
||||
activations in the batch normalization layer.
|
||||
activation_fn: The activation function which is used in ResNet.
|
||||
use_batch_norm: Whether or not to use batch normalization.
|
||||
batch_norm_updates_collections: Collection for the update ops for
|
||||
batch norm.
|
||||
|
||||
Returns:
|
||||
An `arg_scope` to use for the resnet models.
|
||||
"""
|
||||
batch_norm_params = {
|
||||
'decay': batch_norm_decay,
|
||||
'epsilon': batch_norm_epsilon,
|
||||
'scale': batch_norm_scale,
|
||||
'updates_collections': batch_norm_updates_collections,
|
||||
'fused': None, # Use fused batch norm if possible.
|
||||
}
|
||||
|
||||
with slim.arg_scope(
|
||||
[slim.conv2d],
|
||||
weights_regularizer=slim.l2_regularizer(weight_decay),
|
||||
weights_initializer=slim.variance_scaling_initializer(),
|
||||
activation_fn=activation_fn,
|
||||
normalizer_fn=slim.batch_norm if use_batch_norm else None,
|
||||
normalizer_params=batch_norm_params):
|
||||
with slim.arg_scope([slim.batch_norm], **batch_norm_params):
|
||||
# The following implies padding='SAME' for pool1, which makes feature
|
||||
# alignment easier for dense prediction tasks. This is also used in
|
||||
# https://github.com/facebook/fb.resnet.torch. However the accompanying
|
||||
# code of 'Deep Residual Learning for Image Recognition' uses
|
||||
# padding='VALID' for pool1. You can switch to that choice by setting
|
||||
# slim.arg_scope([slim.max_pool2d], padding='VALID').
|
||||
with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
|
||||
return arg_sc
|
||||
+406
@@ -0,0 +1,406 @@
|
||||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ==============================================================================
|
||||
"""Contains definitions for the original form of Residual Networks.
|
||||
|
||||
The 'v1' residual networks (ResNets) implemented in this module were proposed
|
||||
by:
|
||||
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
|
||||
Deep Residual Learning for Image Recognition. arXiv:1512.03385
|
||||
|
||||
Other variants were introduced in:
|
||||
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
|
||||
Identity Mappings in Deep Residual Networks. arXiv: 1603.05027
|
||||
|
||||
The networks defined in this module utilize the bottleneck building block of
|
||||
[1] with projection shortcuts only for increasing depths. They employ batch
|
||||
normalization *after* every weight layer. This is the architecture used by
|
||||
MSRA in the Imagenet and MSCOCO 2016 competition models ResNet-101 and
|
||||
ResNet-152. See [2; Fig. 1a] for a comparison between the current 'v1'
|
||||
architecture and the alternative 'v2' architecture of [2] which uses batch
|
||||
normalization *before* every weight layer in the so-called full pre-activation
|
||||
units.
|
||||
|
||||
Typical use:
|
||||
|
||||
from tensorflow.contrib.slim.nets import resnet_v1
|
||||
|
||||
ResNet-101 for image classification into 1000 classes:
|
||||
|
||||
# inputs has shape [batch, 224, 224, 3]
|
||||
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
|
||||
net, end_points = resnet_v1.resnet_v1_101(inputs, 1000, is_training=False)
|
||||
|
||||
ResNet-101 for semantic segmentation into 21 classes:
|
||||
|
||||
# inputs has shape [batch, 513, 513, 3]
|
||||
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
|
||||
net, end_points = resnet_v1.resnet_v1_101(inputs,
|
||||
21,
|
||||
is_training=False,
|
||||
global_pool=False,
|
||||
output_stride=16)
|
||||
"""
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib import slim as contrib_slim
|
||||
|
||||
from nets import resnet_utils
|
||||
|
||||
|
||||
resnet_arg_scope = resnet_utils.resnet_arg_scope
|
||||
slim = contrib_slim
|
||||
|
||||
|
||||
class NoOpScope(object):
|
||||
"""No-op context manager."""
|
||||
|
||||
def __enter__(self):
|
||||
return None
|
||||
|
||||
def __exit__(self, exc_type, exc_value, traceback):
|
||||
return False
|
||||
|
||||
|
||||
@slim.add_arg_scope
|
||||
def bottleneck(inputs,
|
||||
depth,
|
||||
depth_bottleneck,
|
||||
stride,
|
||||
rate=1,
|
||||
outputs_collections=None,
|
||||
scope=None,
|
||||
use_bounded_activations=False):
|
||||
"""Bottleneck residual unit variant with BN after convolutions.
|
||||
|
||||
This is the original residual unit proposed in [1]. See Fig. 1(a) of [2] for
|
||||
its definition. Note that we use here the bottleneck variant which has an
|
||||
extra bottleneck layer.
|
||||
|
||||
When putting together two consecutive ResNet blocks that use this unit, one
|
||||
should use stride = 2 in the last unit of the first block.
|
||||
|
||||
Args:
|
||||
inputs: A tensor of size [batch, height, width, channels].
|
||||
depth: The depth of the ResNet unit output.
|
||||
depth_bottleneck: The depth of the bottleneck layers.
|
||||
stride: The ResNet unit's stride. Determines the amount of downsampling of
|
||||
the units output compared to its input.
|
||||
rate: An integer, rate for atrous convolution.
|
||||
outputs_collections: Collection to add the ResNet unit output.
|
||||
scope: Optional variable_scope.
|
||||
use_bounded_activations: Whether or not to use bounded activations. Bounded
|
||||
activations better lend themselves to quantized inference.
|
||||
|
||||
Returns:
|
||||
The ResNet unit's output.
|
||||
"""
|
||||
with tf.compat.v1.variable_scope(scope, 'bottleneck_v1', [inputs]) as sc:
|
||||
depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
|
||||
if depth == depth_in:
|
||||
shortcut = resnet_utils.subsample(inputs, stride, 'shortcut')
|
||||
else:
|
||||
shortcut = slim.conv2d(
|
||||
inputs,
|
||||
depth, [1, 1],
|
||||
stride=stride,
|
||||
activation_fn=tf.nn.relu6 if use_bounded_activations else None,
|
||||
scope='shortcut')
|
||||
|
||||
residual = slim.conv2d(inputs, depth_bottleneck, [1, 1], stride=1,
|
||||
scope='conv1')
|
||||
residual = resnet_utils.conv2d_same(residual, depth_bottleneck, 3, stride,
|
||||
rate=rate, scope='conv2')
|
||||
residual = slim.conv2d(residual, depth, [1, 1], stride=1,
|
||||
activation_fn=None, scope='conv3')
|
||||
|
||||
if use_bounded_activations:
|
||||
# Use clip_by_value to simulate bandpass activation.
|
||||
residual = tf.clip_by_value(residual, -6.0, 6.0)
|
||||
output = tf.nn.relu6(shortcut + residual)
|
||||
else:
|
||||
output = tf.nn.relu(shortcut + residual)
|
||||
|
||||
return slim.utils.collect_named_outputs(outputs_collections,
|
||||
sc.name,
|
||||
output)
|
||||
|
||||
|
||||
def resnet_v1(inputs,
|
||||
blocks,
|
||||
num_classes=None,
|
||||
is_training=True,
|
||||
global_pool=True,
|
||||
output_stride=None,
|
||||
include_root_block=True,
|
||||
spatial_squeeze=True,
|
||||
store_non_strided_activations=False,
|
||||
reuse=None,
|
||||
scope=None):
|
||||
"""Generator for v1 ResNet models.
|
||||
|
||||
This function generates a family of ResNet v1 models. See the resnet_v1_*()
|
||||
methods for specific model instantiations, obtained by selecting different
|
||||
block instantiations that produce ResNets of various depths.
|
||||
|
||||
Training for image classification on Imagenet is usually done with [224, 224]
|
||||
inputs, resulting in [7, 7] feature maps at the output of the last ResNet
|
||||
block for the ResNets defined in [1] that have nominal stride equal to 32.
|
||||
However, for dense prediction tasks we advise that one uses inputs with
|
||||
spatial dimensions that are multiples of 32 plus 1, e.g., [321, 321]. In
|
||||
this case the feature maps at the ResNet output will have spatial shape
|
||||
[(height - 1) / output_stride + 1, (width - 1) / output_stride + 1]
|
||||
and corners exactly aligned with the input image corners, which greatly
|
||||
facilitates alignment of the features to the image. Using as input [225, 225]
|
||||
images results in [8, 8] feature maps at the output of the last ResNet block.
|
||||
|
||||
For dense prediction tasks, the ResNet needs to run in fully-convolutional
|
||||
(FCN) mode and global_pool needs to be set to False. The ResNets in [1, 2] all
|
||||
have nominal stride equal to 32 and a good choice in FCN mode is to use
|
||||
output_stride=16 in order to increase the density of the computed features at
|
||||
small computational and memory overhead, cf. http://arxiv.org/abs/1606.00915.
|
||||
|
||||
Args:
|
||||
inputs: A tensor of size [batch, height_in, width_in, channels].
|
||||
blocks: A list of length equal to the number of ResNet blocks. Each element
|
||||
is a resnet_utils.Block object describing the units in the block.
|
||||
num_classes: Number of predicted classes for classification tasks.
|
||||
If 0 or None, we return the features before the logit layer.
|
||||
is_training: whether batch_norm layers are in training mode. If this is set
|
||||
to None, the callers can specify slim.batch_norm's is_training parameter
|
||||
from an outer slim.arg_scope.
|
||||
global_pool: If True, we perform global average pooling before computing the
|
||||
logits. Set to True for image classification, False for dense prediction.
|
||||
output_stride: If None, then the output will be computed at the nominal
|
||||
network stride. If output_stride is not None, it specifies the requested
|
||||
ratio of input to output spatial resolution.
|
||||
include_root_block: If True, include the initial convolution followed by
|
||||
max-pooling, if False excludes it.
|
||||
spatial_squeeze: if True, logits is of shape [B, C], if false logits is
|
||||
of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
|
||||
To use this parameter, the input images must be smaller than 300x300
|
||||
pixels, in which case the output logit layer does not contain spatial
|
||||
information and can be removed.
|
||||
store_non_strided_activations: If True, we compute non-strided (undecimated)
|
||||
activations at the last unit of each block and store them in the
|
||||
`outputs_collections` before subsampling them. This gives us access to
|
||||
higher resolution intermediate activations which are useful in some
|
||||
dense prediction problems but increases 4x the computation and memory cost
|
||||
at the last unit of each block.
|
||||
reuse: whether or not the network and its variables should be reused. To be
|
||||
able to reuse 'scope' must be given.
|
||||
scope: Optional variable_scope.
|
||||
|
||||
Returns:
|
||||
net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
|
||||
If global_pool is False, then height_out and width_out are reduced by a
|
||||
factor of output_stride compared to the respective height_in and width_in,
|
||||
else both height_out and width_out equal one. If num_classes is 0 or None,
|
||||
then net is the output of the last ResNet block, potentially after global
|
||||
average pooling. If num_classes a non-zero integer, net contains the
|
||||
pre-softmax activations.
|
||||
end_points: A dictionary from components of the network to the corresponding
|
||||
activation.
|
||||
|
||||
Raises:
|
||||
ValueError: If the target output_stride is not valid.
|
||||
"""
|
||||
with tf.compat.v1.variable_scope(
|
||||
scope, 'resnet_v1', [inputs], reuse=reuse) as sc:
|
||||
end_points_collection = sc.original_name_scope + '_end_points'
|
||||
with slim.arg_scope([slim.conv2d, bottleneck,
|
||||
resnet_utils.stack_blocks_dense],
|
||||
outputs_collections=end_points_collection):
|
||||
with (slim.arg_scope([slim.batch_norm], is_training=is_training)
|
||||
if is_training is not None else NoOpScope()):
|
||||
net = inputs
|
||||
if include_root_block:
|
||||
if output_stride is not None:
|
||||
if output_stride % 4 != 0:
|
||||
raise ValueError('The output_stride needs to be a multiple of 4.')
|
||||
output_stride /= 4
|
||||
net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
|
||||
net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
|
||||
net = resnet_utils.stack_blocks_dense(net, blocks, output_stride,
|
||||
store_non_strided_activations)
|
||||
# Convert end_points_collection into a dictionary of end_points.
|
||||
end_points = slim.utils.convert_collection_to_dict(
|
||||
end_points_collection)
|
||||
|
||||
if global_pool:
|
||||
# Global average pooling.
|
||||
net = tf.reduce_mean(
|
||||
input_tensor=net, axis=[1, 2], name='pool5', keepdims=True)
|
||||
end_points['global_pool'] = net
|
||||
if num_classes:
|
||||
net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
|
||||
normalizer_fn=None, scope='logits')
|
||||
end_points[sc.name + '/logits'] = net
|
||||
if spatial_squeeze:
|
||||
net = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
|
||||
end_points[sc.name + '/spatial_squeeze'] = net
|
||||
end_points['predictions'] = slim.softmax(net, scope='predictions')
|
||||
return net, end_points
|
||||
resnet_v1.default_image_size = 224
|
||||
|
||||
|
||||
def resnet_v1_block(scope, base_depth, num_units, stride):
|
||||
"""Helper function for creating a resnet_v1 bottleneck block.
|
||||
|
||||
Args:
|
||||
scope: The scope of the block.
|
||||
base_depth: The depth of the bottleneck layer for each unit.
|
||||
num_units: The number of units in the block.
|
||||
stride: The stride of the block, implemented as a stride in the last unit.
|
||||
All other units have stride=1.
|
||||
|
||||
Returns:
|
||||
A resnet_v1 bottleneck block.
|
||||
"""
|
||||
return resnet_utils.Block(scope, bottleneck, [{
|
||||
'depth': base_depth * 4,
|
||||
'depth_bottleneck': base_depth,
|
||||
'stride': 1
|
||||
}] * (num_units - 1) + [{
|
||||
'depth': base_depth * 4,
|
||||
'depth_bottleneck': base_depth,
|
||||
'stride': stride
|
||||
}])
|
||||
|
||||
|
||||
def resnet_v1_50(inputs,
|
||||
num_classes=None,
|
||||
is_training=True,
|
||||
global_pool=True,
|
||||
output_stride=None,
|
||||
spatial_squeeze=True,
|
||||
store_non_strided_activations=False,
|
||||
min_base_depth=8,
|
||||
depth_multiplier=1,
|
||||
reuse=None,
|
||||
scope='resnet_v1_50'):
|
||||
"""ResNet-50 model of [1]. See resnet_v1() for arg and return description."""
|
||||
depth_func = lambda d: max(int(d * depth_multiplier), min_base_depth)
|
||||
blocks = [
|
||||
resnet_v1_block('block1', base_depth=depth_func(64), num_units=3,
|
||||
stride=2),
|
||||
resnet_v1_block('block2', base_depth=depth_func(128), num_units=4,
|
||||
stride=2),
|
||||
resnet_v1_block('block3', base_depth=depth_func(256), num_units=6,
|
||||
stride=2),
|
||||
resnet_v1_block('block4', base_depth=depth_func(512), num_units=3,
|
||||
stride=1),
|
||||
]
|
||||
return resnet_v1(inputs, blocks, num_classes, is_training,
|
||||
global_pool=global_pool, output_stride=output_stride,
|
||||
include_root_block=True, spatial_squeeze=spatial_squeeze,
|
||||
store_non_strided_activations=store_non_strided_activations,
|
||||
reuse=reuse, scope=scope)
|
||||
resnet_v1_50.default_image_size = resnet_v1.default_image_size
|
||||
|
||||
|
||||
def resnet_v1_101(inputs,
|
||||
num_classes=None,
|
||||
is_training=True,
|
||||
global_pool=True,
|
||||
output_stride=None,
|
||||
spatial_squeeze=True,
|
||||
store_non_strided_activations=False,
|
||||
min_base_depth=8,
|
||||
depth_multiplier=1,
|
||||
reuse=None,
|
||||
scope='resnet_v1_101'):
|
||||
"""ResNet-101 model of [1]. See resnet_v1() for arg and return description."""
|
||||
depth_func = lambda d: max(int(d * depth_multiplier), min_base_depth)
|
||||
blocks = [
|
||||
resnet_v1_block('block1', base_depth=depth_func(64), num_units=3,
|
||||
stride=2),
|
||||
resnet_v1_block('block2', base_depth=depth_func(128), num_units=4,
|
||||
stride=2),
|
||||
resnet_v1_block('block3', base_depth=depth_func(256), num_units=23,
|
||||
stride=2),
|
||||
resnet_v1_block('block4', base_depth=depth_func(512), num_units=3,
|
||||
stride=1),
|
||||
]
|
||||
return resnet_v1(inputs, blocks, num_classes, is_training,
|
||||
global_pool=global_pool, output_stride=output_stride,
|
||||
include_root_block=True, spatial_squeeze=spatial_squeeze,
|
||||
store_non_strided_activations=store_non_strided_activations,
|
||||
reuse=reuse, scope=scope)
|
||||
resnet_v1_101.default_image_size = resnet_v1.default_image_size
|
||||
|
||||
|
||||
def resnet_v1_152(inputs,
|
||||
num_classes=None,
|
||||
is_training=True,
|
||||
global_pool=True,
|
||||
output_stride=None,
|
||||
store_non_strided_activations=False,
|
||||
spatial_squeeze=True,
|
||||
min_base_depth=8,
|
||||
depth_multiplier=1,
|
||||
reuse=None,
|
||||
scope='resnet_v1_152'):
|
||||
"""ResNet-152 model of [1]. See resnet_v1() for arg and return description."""
|
||||
depth_func = lambda d: max(int(d * depth_multiplier), min_base_depth)
|
||||
blocks = [
|
||||
resnet_v1_block('block1', base_depth=depth_func(64), num_units=3,
|
||||
stride=2),
|
||||
resnet_v1_block('block2', base_depth=depth_func(128), num_units=8,
|
||||
stride=2),
|
||||
resnet_v1_block('block3', base_depth=depth_func(256), num_units=36,
|
||||
stride=2),
|
||||
resnet_v1_block('block4', base_depth=depth_func(512), num_units=3,
|
||||
stride=1),
|
||||
]
|
||||
return resnet_v1(inputs, blocks, num_classes, is_training,
|
||||
global_pool=global_pool, output_stride=output_stride,
|
||||
include_root_block=True, spatial_squeeze=spatial_squeeze,
|
||||
store_non_strided_activations=store_non_strided_activations,
|
||||
reuse=reuse, scope=scope)
|
||||
resnet_v1_152.default_image_size = resnet_v1.default_image_size
|
||||
|
||||
|
||||
def resnet_v1_200(inputs,
|
||||
num_classes=None,
|
||||
is_training=True,
|
||||
global_pool=True,
|
||||
output_stride=None,
|
||||
store_non_strided_activations=False,
|
||||
spatial_squeeze=True,
|
||||
min_base_depth=8,
|
||||
depth_multiplier=1,
|
||||
reuse=None,
|
||||
scope='resnet_v1_200'):
|
||||
"""ResNet-200 model of [2]. See resnet_v1() for arg and return description."""
|
||||
depth_func = lambda d: max(int(d * depth_multiplier), min_base_depth)
|
||||
blocks = [
|
||||
resnet_v1_block('block1', base_depth=depth_func(64), num_units=3,
|
||||
stride=2),
|
||||
resnet_v1_block('block2', base_depth=depth_func(128), num_units=24,
|
||||
stride=2),
|
||||
resnet_v1_block('block3', base_depth=depth_func(256), num_units=36,
|
||||
stride=2),
|
||||
resnet_v1_block('block4', base_depth=depth_func(512), num_units=3,
|
||||
stride=1),
|
||||
]
|
||||
return resnet_v1(inputs, blocks, num_classes, is_training,
|
||||
global_pool=global_pool, output_stride=output_stride,
|
||||
include_root_block=True, spatial_squeeze=spatial_squeeze,
|
||||
store_non_strided_activations=store_non_strided_activations,
|
||||
reuse=reuse, scope=scope)
|
||||
resnet_v1_200.default_image_size = resnet_v1.default_image_size
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user