Add CNTK backend.

2017-06-06 23:03:04 -07:00
commit 75d9415c82
@@ -17,6 +17,10 @@ matrix:
          env: KERAS_BACKEND=theano
        - python: 3.5
          env: KERAS_BACKEND=theano
        - python: 2.7
          env: KERAS_BACKEND=cntk
        - python: 3.5
          env: KERAS_BACKEND=cntk
 install:
  # code below is taken from http://conda.pydata.org/docs/travis.html
  # We do this conditionally because it saves us some downloading if the
@@ -49,6 +53,22 @@ install:
  # install TensorFlow (CPU version).
  - pip install tensorflow
  # install cntk
  - if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
      pip install https://cntk.ai/PythonWheel/CPU-Only/cntk-2.0-cp27-cp27mu-linux_x86_64.whl;
    elif [[ "$TRAVIS_PYTHON_VERSION" == "3.5" ]]; then
      pip install https://cntk.ai/PythonWheel/CPU-Only/cntk-2.0-cp35-cp35m-linux_x86_64.whl;
    fi
  # install Open MPI
  - rm -rf ~/mpi
  - mkdir ~/mpi
  - pushd ~/mpi
  - wget http://cntk.ai/PythonWheel/ForKeras/depends/openmpi_1.10-3.zip
  - unzip ./openmpi_1.10-3.zip
  - sudo dpkg -i openmpi_1.10-3.deb
  - popd
 # command to run tests
 script:
@@ -8,6 +8,10 @@ All contributions by Google:
 Copyright (c) 2015, Google, Inc.
 All rights reserved.
 All contributions by Microsoft:
 Copyright (c) 2017, Microsoft, Inc.
 All rights reserved.
 All other contributions:
 Copyright (c) 2015 - 2017, the respective contributors.
 All rights reserved.
@@ -1,11 +1,11 @@
-# Keras: Deep Learning library for TensorFlow and Theano
+# Keras: Deep Learning library for TensorFlow, CNTK, and Theano
 [![Build Status](https://travis-ci.org/fchollet/keras.svg?branch=master)](https://travis-ci.org/fchollet/keras)
 [![license](https://img.shields.io/github/license/mashape/apistatus.svg?maxAge=2592000)](https://github.com/fchollet/keras/blob/master/LICENSE)
 ## You have just found Keras.
-Keras is a high-level neural networks API, written in Python and capable of running on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. *Being able to go from idea to result with the least possible delay is key to doing good research.*
+Keras is a high-level neural networks API, written in Python and capable of running on top of [TensorFlow](https://github.com/tensorflow/tensorflow), [CNTK](https://github.com/Microsoft/cntk), or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. *Being able to go from idea to result with the least possible delay is key to doing good research.*
 Use Keras if you need a deep learning library that:
@@ -125,6 +125,11 @@ Keras uses the following dependencies:
 - TensorFlow
    - [See installation instructions](https://www.tensorflow.org/install/).
 *When using the CNTK backend:*
 - CNTK
    - [See installation instructions](https://docs.microsoft.com/en-us/cognitive-toolkit/setup-cntk-on-your-machine).
 *When using the Theano backend:*
 - Theano
@@ -143,7 +148,7 @@ sudo pip install keras
 ------------------
-## Switching from TensorFlow to Theano
+## Switching from TensorFlow to CNTK or Theano
 By default, Keras will use TensorFlow as its tensor manipulation library. [Follow these instructions](http://keras.io/backend/) to configure the Keras backend.
@@ -4,12 +4,13 @@
 Keras is a model-level library, providing high-level building blocks for developing deep learning models. It does not handle itself low-level operations such as tensor products, convolutions and so on. Instead, it relies on a specialized, well-optimized tensor manipulation library to do so, serving as the "backend engine" of Keras. Rather than picking one single tensor library and making the implementation of Keras tied to that library, Keras handles the problem in a modular way, and several different backend engines can be plugged seamlessly into Keras.
-At this time, Keras has two backend implementations available: the **TensorFlow** backend and the **Theano** backend.
+At this time, Keras has three backend implementations available: the **TensorFlow** backend, the **Theano** backend, and the **CNTK** backend.
 - [TensorFlow](http://www.tensorflow.org/) is an open-source symbolic tensor manipulation framework developed by Google, Inc.
 - [Theano](http://deeplearning.net/software/theano/) is an open-source symbolic tensor manipulation framework developed by LISA/MILA Lab at Université de Montréal.
 - [CNTK](https://www.microsoft.com/en-us/cognitive-toolkit/) is an open-source, commercial-grade toolkit for deep learning developed by Microsoft.
-In the future, we are likely to add more backend options. Go ask Microsoft about how their CNTK backend project is doing.
+In the future, we are likely to add more backend options.
 ----
@@ -34,7 +35,7 @@ The default configuration file looks like this:
 }
 ```
-Simply change the field `backend` to either `"theano"` or `"tensorflow"`, and Keras will use the new configuration next time you run any Keras code.
+Simply change the field `backend` to `"theano"`, `"tensorflow"`, or `"cntk"`, and Keras will use the new configuration next time you run any Keras code.
 You can also define the environment variable ``KERAS_BACKEND`` and this will
 override what is defined in your config file :
@@ -65,7 +66,7 @@ You can change these settings by editing `$HOME/.keras/keras.json`.
  - For 3D data, `"channels_last"` assumes `(conv_dim1, conv_dim2, conv_dim3, channels)` while `"channels_first"` assumes `(channels, conv_dim1, conv_dim2, conv_dim3)`.
 * `epsilon`: float, a numeric fuzzing constant used to avoid dividing by zero in some operations.
 * `floatx`: string, `"float16"`, `"float32"`, or `"float64"`. Default float precision.
-* `backend`: string, `"tensorflow"` or `"theano"`.
+* `backend`: string, `"tensorflow"`, `"theano"`, or `"cntk"`.
 ----
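The precedence described in this hunk (the `backend` field of the config file, overridden by the `KERAS_BACKEND` environment variable) can be sketched in plain Python. This is only an illustration, not the code in `keras/backend/__init__.py`: the helper name `resolve_backend` and its parameters are hypothetical, and it raises `ValueError` where the real module uses `assert`.

```python
import json
import os

VALID_BACKENDS = {'theano', 'tensorflow', 'cntk'}


def resolve_backend(config_text=None, environ=None, default='tensorflow'):
    """Pick a backend name from config JSON text and environment variables.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    backend = default
    if config_text is not None:
        # The `backend` field of the config file replaces the default.
        backend = json.loads(config_text).get('backend', default)
    if 'KERAS_BACKEND' in environ:
        # The environment variable overrides the config file.
        backend = environ['KERAS_BACKEND']
    if backend not in VALID_BACKENDS:
        raise ValueError('Unknown backend: ' + str(backend))
    return backend
```

For example, `resolve_backend('{"backend": "theano"}', {'KERAS_BACKEND': 'cntk'})` selects `cntk`, because the environment variable takes priority over the file.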
@@ -38,7 +38,7 @@ Please cite Keras in your publications if it helps your research. Here is an exa
 ### How can I run Keras on GPU?
-If you are running on the TensorFlow backend, your code will automatically run on GPU if any available GPU is detected.
+If you are running on the TensorFlow or CNTK backends, your code will automatically run on GPU if any available GPU is detected.
 If you are running on the Theano backend, you can use one of the following methods:
@@ -1,3 +1,3 @@
-# Keras: Deep Learning library for Theano and TensorFlow
+# Keras: The Python Deep Learning library
 {{autogenerated}}
@@ -32,7 +32,7 @@ if os.path.exists(_config_path):
    _epsilon = _config.get('epsilon', epsilon())
    assert isinstance(_epsilon, float)
    _backend = _config.get('backend', _BACKEND)
-    assert _backend in {'theano', 'tensorflow'}
+    assert _backend in {'theano', 'tensorflow', 'cntk'}
    _image_data_format = _config.get('image_data_format',
                                     image_data_format())
    assert _image_data_format in {'channels_last', 'channels_first'}
@@ -68,11 +68,14 @@ if not os.path.exists(_config_path):
 # Set backend based on KERAS_BACKEND flag, if applicable.
 if 'KERAS_BACKEND' in os.environ:
    _backend = os.environ['KERAS_BACKEND']
-    assert _backend in {'theano', 'tensorflow'}
+    assert _backend in {'theano', 'tensorflow', 'cntk'}
    _BACKEND = _backend
 # Import backend functions.
-if _BACKEND == 'theano':
+if _BACKEND == 'cntk':
    sys.stderr.write('Using CNTK backend.\n')
    from .cntk_backend import *
 elif _BACKEND == 'theano':
    sys.stderr.write('Using Theano backend.\n')
    from .theano_backend import *
 elif _BACKEND == 'tensorflow':
@@ -3346,28 +3346,54 @@ def bias_add(x, bias, data_format=None):
        Output tensor.
    # Raises
-        ValueError: In case of invalid `data_format` argument.
+        ValueError: In one of the two cases below:
                    1. invalid `data_format` argument.
                    2. invalid bias shape.
                       the bias should be either a vector or
                       a tensor with ndim(x) - 1 dimensions
    """
    if data_format is None:
        data_format = image_data_format()
    if data_format not in {'channels_first', 'channels_last'}:
        raise ValueError('Unknown data_format ' + str(data_format))
    bias_shape = int_shape(bias)
    if len(bias_shape) != 1 and len(bias_shape) != ndim(x) - 1:
        raise ValueError('Unexpected bias dimensions %d, expect to be 1 or %d dimensions'
                         % (len(bias_shape), ndim(x) - 1))
    if ndim(x) == 5:
        if data_format == 'channels_first':
-            x += reshape(bias, (1, int_shape(bias)[0], 1, 1, 1))
+            if len(bias_shape) == 1:
                x += reshape(bias, (1, bias_shape[0], 1, 1, 1))
            else:
                x += reshape(bias, (1, bias_shape[3]) + bias_shape[:3])
        elif data_format == 'channels_last':
-            x += reshape(bias, (1, 1, 1, 1, int_shape(bias)[0]))
+            if len(bias_shape) == 1:
                x += reshape(bias, (1, 1, 1, 1, bias_shape[0]))
            else:
                x += reshape(bias, (1,) + bias_shape)
    elif ndim(x) == 4:
        if data_format == 'channels_first':
-            x += reshape(bias, (1, int_shape(bias)[0], 1, 1))
+            if len(bias_shape) == 1:
                x += reshape(bias, (1, bias_shape[0], 1, 1))
            else:
                x += reshape(bias, (1, bias_shape[2]) + bias_shape[:2])
        elif data_format == 'channels_last':
-            x = tf.nn.bias_add(x, bias,
+            if len(bias_shape) == 1:
-                               data_format='NHWC')
+                x = tf.nn.bias_add(x, bias,
                                   data_format='NHWC')
            else:
                x += reshape(bias, (1,) + bias_shape)
    elif ndim(x) == 3:
        if data_format == 'channels_first':
-            x += reshape(bias, (1, int_shape(bias)[0], 1))
+            if len(bias_shape) == 1:
                x += reshape(bias, (1, bias_shape[0], 1))
            else:
                x += reshape(bias, (1, bias_shape[1], bias_shape[0]))
        elif data_format == 'channels_last':
-            x += reshape(bias, (1, 1, int_shape(bias)[0]))
+            if len(bias_shape) == 1:
                x += reshape(bias, (1, 1, bias_shape[0]))
            else:
                x += reshape(bias, (1, ) + bias_shape)
    else:
        x = tf.nn.bias_add(x, bias)
    return x
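The reshape targets chosen by the branches of `bias_add` follow one pattern, which a small shape-only helper can make explicit. This is a sketch under assumptions: `bias_broadcast_shape` is a hypothetical name, it works on plain tuples rather than backend tensors, and it is meant for the 3D to 5D inputs handled by the explicit branches.

```python
def bias_broadcast_shape(x_ndim, bias_shape, data_format='channels_last'):
    """Return the shape a bias is reshaped to before being added to x.

    Hypothetical, shape-only illustration of the branching in `bias_add`.
    """
    if data_format not in {'channels_first', 'channels_last'}:
        raise ValueError('Unknown data_format ' + str(data_format))
    bias_shape = tuple(bias_shape)
    if len(bias_shape) not in (1, x_ndim - 1):
        raise ValueError('Unexpected bias dimensions %d, '
                         'expect to be 1 or %d dimensions'
                         % (len(bias_shape), x_ndim - 1))
    if len(bias_shape) == 1:
        # Vector bias: place the channel axis, pad the rest with ones.
        if data_format == 'channels_first':
            return (1, bias_shape[0]) + (1,) * (x_ndim - 2)
        return (1,) * (x_ndim - 1) + bias_shape
    # Full bias, stored as (spatial..., channels).
    if data_format == 'channels_first':
        # Move the channel axis in front of the spatial axes.
        return (1, bias_shape[-1]) + bias_shape[:-1]
    return (1,) + bias_shape
```

A vector bias of length 8 on a 4D `channels_first` input maps to `(1, 8, 1, 1)`, while a full bias of shape `(5, 6, 8)` on the same input maps to `(1, 8, 5, 6)`.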
@@ -3632,3 +3658,110 @@ def foldr(fn, elems, initializer=None, name=None):
        Same type and shape as initializer
    """
    return tf.foldr(fn, elems, initializer=initializer, name=name)
 def local_conv1d(inputs, kernel, kernel_size, strides, data_format=None):
    """Apply 1D conv with un-shared weights.
    # Arguments
        inputs: 3D tensor with shape: (batch_size, steps, input_dim)
        kernel: the unshared weight for convolution,
                with shape (output_length, feature_dim, filters)
        kernel_size: a tuple of a single integer,
                     specifying the length of the 1D convolution window
        strides: a tuple of a single integer,
                 specifying the stride length of the convolution
        data_format: the data format, channels_first or channels_last
    # Returns
        the tensor after 1d conv with un-shared weights, with shape (batch_size, output_length, filters)
    # Raises
        ValueError: if `data_format` is neither `channels_last` nor `channels_first`.
    """
    if data_format is None:
        data_format = image_data_format()
    if data_format not in {'channels_first', 'channels_last'}:
        raise ValueError('Unknown data_format ' + str(data_format))
    stride = strides[0]
    kernel_shape = int_shape(kernel)
    output_length, feature_dim, filters = kernel_shape
    xs = []
    for i in range(output_length):
        slice_length = slice(i * stride,
                             i * stride + kernel_size[0])
        xs.append(reshape(inputs[:, slice_length, :],
                          (1, -1, feature_dim)))
    x_aggregate = concatenate(xs, axis=0)
    # Shape: `(output_length, batch_size, filters)`.
    output = batch_dot(x_aggregate, kernel)
    return permute_dimensions(output, (1, 0, 2))
 def local_conv2d(inputs, kernel, kernel_size, strides, output_shape, data_format=None):
    """Apply 2D conv with un-shared weights.
    # Arguments
        inputs: 4D tensor with shape:
                (batch_size, filters, new_rows, new_cols)
                if data_format='channels_first'
                or 4D tensor with shape:
                (batch_size, new_rows, new_cols, filters)
                if data_format='channels_last'.
        kernel: the unshared weight for convolution,
                with shape (output_items, feature_dim, filters)
        kernel_size: a tuple of 2 integers, specifying the
                     width and height of the 2D convolution window.
        strides: a tuple of 2 integers, specifying the strides
                 of the convolution along the width and height.
        output_shape: a tuple with (output_row, output_col)
        data_format: the data format, channels_first or channels_last
    # Returns
        A 4d tensor with shape:
        (batch_size, filters, new_rows, new_cols)
        if data_format='channels_first'
        or 4D tensor with shape:
        (batch_size, new_rows, new_cols, filters)
        if data_format='channels_last'.
    # Raises
        ValueError: if `data_format` is neither
                    `channels_last` nor `channels_first`.
    """
    if data_format is None:
        data_format = image_data_format()
    if data_format not in {'channels_first', 'channels_last'}:
        raise ValueError('Unknown data_format ' + str(data_format))
    stride_row, stride_col = strides
    output_row, output_col = output_shape
    kernel_shape = int_shape(kernel)
    _, feature_dim, filters = kernel_shape
    xs = []
    for i in range(output_row):
        for j in range(output_col):
            slice_row = slice(i * stride_row,
                              i * stride_row + kernel_size[0])
            slice_col = slice(j * stride_col,
                              j * stride_col + kernel_size[1])
            if data_format == 'channels_first':
                xs.append(reshape(inputs[:, :, slice_row, slice_col],
                                  (1, -1, feature_dim)))
            else:
                xs.append(reshape(inputs[:, slice_row, slice_col, :],
                                  (1, -1, feature_dim)))
    x_aggregate = concatenate(xs, axis=0)
    output = batch_dot(x_aggregate, kernel)
    output = reshape(output,
                     (output_row, output_col, -1, filters))
    if data_format == 'channels_first':
        output = permute_dimensions(output, (2, 3, 0, 1))
    else:
        output = permute_dimensions(output, (2, 0, 1, 3))
    return output
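To make the un-shared weights in `local_conv1d` concrete, here is a pure-Python reference for a single sample. It is a sketch, not backend code: `local_conv1d_reference` is a hypothetical name, it uses nested lists instead of tensors, and it assumes the row-major flattening that `reshape(..., (1, -1, feature_dim))` performs on each window.

```python
def local_conv1d_reference(sample, kernel, kernel_size, stride):
    """Locally connected 1D convolution for one sample (illustration only).

    sample: list of steps, each a list of `input_dim` floats.
    kernel: list of `output_length` matrices, each of shape
            (kernel_size * input_dim, filters); one matrix per position.
    """
    output = []
    for i, weights in enumerate(kernel):
        # Window of `kernel_size` steps starting at i * stride.
        window = sample[i * stride:i * stride + kernel_size]
        # Row-major flatten: feature index = step * input_dim + channel.
        features = [value for step in window for value in step]
        filters = len(weights[0])
        # Each output position uses its own weight matrix.
        output.append([
            sum(features[f] * weights[f][o] for f in range(len(features)))
            for o in range(filters)
        ])
    return output
```

Unlike an ordinary convolution, the weight matrix changes with the position `i`, which is why the kernel carries a leading `output_length` axis.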
@@ -2059,21 +2059,44 @@ def bias_add(x, bias, data_format=None):
        data_format = image_data_format()
    if data_format not in {'channels_first', 'channels_last'}:
        raise ValueError('Unknown data_format ' + str(data_format))
    if ndim(bias) != 1 and ndim(bias) != ndim(x) - 1:
        raise ValueError('Unexpected bias dimensions %d, '
                         'expect to be 1 or %d dimensions'
                         % (ndim(bias), ndim(x) - 1))
    bias_shape = tuple(bias.shape)
    if ndim(x) == 5:
        if data_format == 'channels_first':
-            x += reshape(bias, (1, bias.shape[0], 1, 1, 1))
+            if ndim(bias) == 1:
                x += reshape(bias, (1, bias_shape[0], 1, 1, 1))
            else:
                x += reshape(bias, (1, bias_shape[3]) + bias_shape[:3])
        elif data_format == 'channels_last':
-            x += reshape(bias, (1, 1, 1, 1, bias.shape[0]))
+            if ndim(bias) == 1:
                x += reshape(bias, (1, 1, 1, 1, bias_shape[0]))
            else:
                x += reshape(bias, (1,) + bias_shape)
    elif ndim(x) == 4:
        if data_format == 'channels_first':
-            x += reshape(bias, (1, bias.shape[0], 1, 1))
+            if ndim(bias) == 1:
                x += reshape(bias, (1, bias_shape[0], 1, 1))
            else:
                x += reshape(bias, (1, bias_shape[2]) + bias_shape[:2])
        elif data_format == 'channels_last':
-            x += reshape(bias, (1, 1, 1, bias.shape[0]))
+            if ndim(bias) == 1:
                x += reshape(bias, (1, 1, 1, bias_shape[0]))
            else:
                x += reshape(bias, (1,) + bias_shape)
    elif ndim(x) == 3:
        if data_format == 'channels_first':
-            x += reshape(bias, (1, bias.shape[0], 1))
+            if ndim(bias) == 1:
                x += reshape(bias, (1, bias_shape[0], 1))
            else:
                x += reshape(bias, (1, bias_shape[1], bias_shape[0]))
        elif data_format == 'channels_last':
-            x += reshape(bias, (1, 1, bias.shape[0]))
+            if ndim(bias) == 1:
                x += reshape(bias, (1, 1, bias_shape[0]))
            else:
                x += reshape(bias, (1,) + bias_shape)
    else:
        x += bias
    return x
@@ -2291,3 +2314,72 @@ def foldr(fn, elems, initializer=None, name=None):
    fn2 = lambda x, acc: fn(acc, x)
    return theano.foldr(fn2, elems, initializer, name=name)[0]
 def local_conv1d(inputs, kernel, kernel_size, strides, data_format=None):
    if data_format is None:
        data_format = image_data_format()
    if data_format not in {'channels_first', 'channels_last'}:
        raise ValueError('Unknown data_format ' + str(data_format))
    stride = strides[0]
    kernel_shape = int_shape(kernel)
    output_length, feature_dim, filters = kernel_shape
    xs = []
    for i in range(output_length):
        slice_length = slice(i * stride,
                             i * stride + kernel_size[0])
        xs.append(reshape(inputs[:, slice_length, :],
                          (1, -1, feature_dim)))
    x_aggregate = concatenate(xs, axis=0)
    # Shape: `(output_length, batch_size, filters)`.
    output = batch_dot(x_aggregate, kernel)
    return permute_dimensions(output, (1, 0, 2))
 def local_conv2d(inputs, kernel, kernel_size, strides, output_shape, data_format=None):
    if data_format is None:
        data_format = image_data_format()
    if data_format not in {'channels_first', 'channels_last'}:
        raise ValueError('Unknown data_format ' + str(data_format))
    stride_row, stride_col = strides
    output_row, output_col = output_shape
    kernel_shape = int_shape(kernel)
    _, feature_dim, filters = kernel_shape
    if data_format == 'channels_first':
        output = []
        for i in range(output_row):
            for j in range(output_col):
                slice_row = slice(i * stride_row,
                                  i * stride_row + kernel_size[0])
                slice_col = slice(j * stride_col,
                                  j * stride_col + kernel_size[1])
                x_flatten = reshape(inputs[:, :, slice_row, slice_col],
                                    (1, -1, feature_dim))
                output.append(dot(x_flatten,
                                  kernel[i * output_col + j, :, :]))
        output = concatenate(output, axis=0)
        output = reshape(output,
                         (output_row, output_col, -1, filters))
        output = permute_dimensions(output, (2, 3, 0, 1))
    else:
        xs = []
        for i in range(output_row):
            for j in range(output_col):
                slice_row = slice(i * stride_row,
                                  i * stride_row + kernel_size[0])
                slice_col = slice(j * stride_col,
                                  j * stride_col + kernel_size[1])
                xs.append(reshape(inputs[:, slice_row, slice_col, :],
                                  (1, -1, feature_dim)))
        x_aggregate = concatenate(xs, axis=0)
        output = batch_dot(x_aggregate, kernel)
        output = reshape(output,
                         (output_row, output_col, -1, filters))
        output = permute_dimensions(output, (2, 0, 1, 3))
    return output
@@ -58,7 +58,7 @@ class NonNeg(Constraint):
    """
    def __call__(self, w):
-        w *= K.cast(w >= 0., K.floatx())
+        w *= K.cast(K.greater_equal(w, 0.), K.floatx())
        return w
@@ -1088,7 +1088,7 @@ class Layer(object):
        if hasattr(self, '_losses'):
            self._losses += losses
        # Update self._per_input_updates
-        if inputs == []:
+        if isinstance(inputs, list) and inputs == []:
            inputs = None
        if inputs is not None:
            inputs_hash = _object_list_uid(inputs)
@@ -1120,7 +1120,7 @@ class Layer(object):
        if hasattr(self, '_updates'):
            self._updates += updates
        # Update self._per_input_updates
-        if inputs == []:
+        if isinstance(inputs, list) and inputs == []:
            inputs = None
        if inputs is not None:
            inputs_hash = _object_list_uid(inputs)
@@ -202,7 +202,7 @@ class ThresholdedReLU(Layer):
        self.theta = K.cast_to_floatx(theta)
    def call(self, inputs, mask=None):
-        return inputs * K.cast(inputs > self.theta, K.floatx())
+        return inputs * K.cast(K.greater(inputs, self.theta), K.floatx())
    def get_config(self):
        config = {'theta': float(self.theta)}
@@ -147,22 +147,11 @@ class LocallyConnected1D(Layer):
        return (input_shape[0], length, self.filters)
    def call(self, inputs):
-        stride = self.strides[0]
+        output_length, _, filters = self.kernel_shape
        output_length, feature_dim, filters = self.kernel_shape
        xs = []
        for i in range(output_length):
            slice_length = slice(i * stride,
                                 i * stride + self.kernel_size[0])
            xs.append(K.reshape(inputs[:, slice_length, :],
                                (1, -1, feature_dim)))
        x_aggregate = K.concatenate(xs, axis=0)
        # Shape: `(output_length, batch_size, filters)`.
        output = K.batch_dot(x_aggregate, self.kernel)
        output = K.permute_dimensions(output, (1, 0, 2))
        output = K.local_conv1d(inputs, self.kernel, self.kernel_size, self.strides)
        if self.use_bias:
-            output += K.reshape(self.bias, (1, output_length, filters))
+            output = K.bias_add(output, self.bias)
        if self.activation is not None:
            output = self.activation(output)
        return output
@@ -363,62 +352,19 @@ class LocallyConnected2D(Layer):
            return (input_shape[0], rows, cols, self.filters)
    def call(self, inputs):
-        stride_row, stride_col = self.strides
+        _, _, filters = self.kernel_shape
        _, feature_dim, filters = self.kernel_shape
-        if self.data_format == 'channels_first':
+        output = K.local_conv2d(inputs,
-            if K.backend() == 'theano':
+                                self.kernel,
-                output = []
+                                self.kernel_size,
-                for i in range(self.output_row):
+                                self.strides,
-                    for j in range(self.output_col):
+                                (self.output_row, self.output_col),
-                        slice_row = slice(i * stride_row,
+                                self.data_format)
                                          i * stride_row + self.kernel_size[0])
                        slice_col = slice(j * stride_col,
                                          j * stride_col + self.kernel_size[1])
                        x_flatten = K.reshape(inputs[:, :, slice_row, slice_col],
                                              (1, -1, feature_dim))
                        output.append(K.dot(x_flatten,
                                      self.kernel[i * self.output_col + j, :, :]))
                output = K.concatenate(output, axis=0)
            else:
                xs = []
                for i in range(self.output_row):
                    for j in range(self.output_col):
                        slice_row = slice(i * stride_row,
                                          i * stride_row + self.kernel_size[0])
                        slice_col = slice(j * stride_col,
                                          j * stride_col + self.kernel_size[1])
                        xs.append(K.reshape(inputs[:, :, slice_row, slice_col],
                                            (1, -1, feature_dim)))
                x_aggregate = K.concatenate(xs, axis=0)
                output = K.batch_dot(x_aggregate, self.kernel)
            output = K.reshape(output,
                               (self.output_row, self.output_col, -1, filters))
            output = K.permute_dimensions(output, (2, 3, 0, 1))
        elif self.data_format == 'channels_last':
            xs = []
            for i in range(self.output_row):
                for j in range(self.output_col):
                    slice_row = slice(i * stride_row,
                                      i * stride_row + self.kernel_size[0])
                    slice_col = slice(j * stride_col,
                                      j * stride_col + self.kernel_size[1])
                    xs.append(K.reshape(inputs[:, slice_row, slice_col, :],
                                        (1, -1, feature_dim)))
            x_aggregate = K.concatenate(xs, axis=0)
            output = K.batch_dot(x_aggregate, self.kernel)
            output = K.reshape(output,
                               (self.output_row, self.output_col, -1, filters))
            output = K.permute_dimensions(output, (2, 0, 1, 3))
        if self.use_bias:
-            if self.data_format == 'channels_first':
+            if self.data_format == 'channels_first' or self.data_format == 'channels_last':
-                output += K.reshape(self.bias,
+                output = K.bias_add(output, self.bias, data_format=self.data_format)
-                                    (1, filters, self.output_row, self.output_col))
+
            elif self.data_format == 'channels_last':
                output += K.reshape(self.bias,
                                    (1, self.output_row, self.output_col, filters))
        output = self.activation(output)
        return output
@@ -198,6 +198,9 @@ class Recurrent(Layer):
        self.return_sequences = return_sequences
        self.return_state = return_state
        self.go_backwards = go_backwards
        if K.backend() == 'cntk' and stateful:
            raise ValueError('Stateful RNN is not currently supported with CNTK.')
        self.stateful = stateful
        self.unroll = unroll
        self.implementation = implementation
@@ -144,8 +144,8 @@ def save_model(model, filepath, overwrite=True, include_optimizer=True):
                weight_values = K.batch_get_value(symbolic_weights)
                weight_names = []
                for i, (w, val) in enumerate(zip(symbolic_weights, weight_values)):
-                    # Default values of symbolic_weights is /variable for theano
+                    # The default name of symbolic_weights is /variable for Theano and CNTK
-                    if K.backend() == 'theano':
+                    if K.backend() == 'theano' or K.backend() == 'cntk':
                        if hasattr(w, 'name') and w.name != "/variable":
                            name = str(w.name)
                        else:
@@ -12,7 +12,7 @@ if K.backend() == 'tensorflow':
 def clip_norm(g, c, n):
    if c > 0:
-        g = K.switch(n >= c, g * c / n, g)
+        g = K.switch(K.greater_equal(n, c), g * c / n, g)
    return g
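The rule that `clip_norm` expresses with `K.switch` can be checked numerically with plain floats. The sketch below is an assumption-laden stand-in: `clip_by_norm` is a hypothetical helper, it works on a flat list, and it computes the L2 norm itself, whereas `clip_norm` receives the norm as its argument `n`.

```python
import math


def clip_by_norm(gradient, clip_value):
    """Rescale `gradient` so its L2 norm does not exceed `clip_value`.

    Hypothetical float-only illustration of the rule used by `clip_norm`.
    """
    norm = math.sqrt(sum(g * g for g in gradient))
    if clip_value > 0 and norm >= clip_value:
        # Same branch as K.switch(K.greater_equal(n, c), g * c / n, g).
        return [g * clip_value / norm for g in gradient]
    # Clipping disabled (clip_value <= 0) or norm already small enough.
    return list(gradient)
```

With a gradient of `[3.0, 4.0]` (norm 5) and a clip value of 1, every component is scaled by 1/5; with a clip value of 10 the gradient is returned unchanged.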
@@ -136,6 +136,10 @@ def test_elu():
    assert_allclose(result, test_values, rtol=1e-05)
    negative_values = np.array([[-1, -2]], dtype=K.floatx())
    # CNTK can't rebind the input shape, so create the model again to test a different batch size
    if (K.backend() == 'cntk'):
        x2 = K.placeholder(ndim=2)
        f = K.function([x2], [activations.elu(x2, 0.5)])
    result = f([negative_values])[0]
    true_result = (np.exp(negative_values) - 1) / 2
@@ -11,12 +11,24 @@ def test_resnet50():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support padding with non-concrete dimension")
 def test_resnet50_notop():
    model = applications.ResNet50(weights=None, include_top=False)
    assert model.output_shape == (None, None, None, 2048)
@keras_test
 def test_resnet50_notop_specified_input_shape():
    input_shape = (3, 300, 300) if K.image_data_format() == 'channels_first' else (300, 300, 3)
    model = applications.ResNet50(weights=None, include_top=False, input_shape=input_shape)
    output_shape = (None, 2048, 1, 1) if K.image_data_format() == 'channels_first' else (None, 1, 1, 2048)
    assert model.output_shape == output_shape
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support padding with non-concrete dimension")
 def test_resnet50_pooling():
    model = applications.ResNet50(weights=None,
                                  include_top=False,
@@ -24,6 +36,16 @@ def test_resnet50_pooling():
    assert model.output_shape == (None, 2048)
@keras_test
 def test_resnet50_pooling_specified_input_shape():
    input_shape = (3, 300, 300) if K.image_data_format() == 'channels_first' else (300, 300, 3)
    model = applications.ResNet50(weights=None,
                                  include_top=False,
                                  pooling='avg',
                                  input_shape=input_shape)
    assert model.output_shape == (None, 2048)
@keras_test
 def test_vgg16():
    model = applications.VGG16(weights=None)
@@ -31,17 +53,36 @@ def test_vgg16():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support padding with non-concrete dimension")
 def test_vgg16_notop():
    model = applications.VGG16(weights=None, include_top=False)
    assert model.output_shape == (None, None, None, 512)
@keras_test
 def test_vgg16_notop_specified_input_shape():
    input_shape = (3, 300, 300) if K.image_data_format() == 'channels_first' else (300, 300, 3)
    model = applications.VGG16(weights=None, include_top=False, input_shape=input_shape)
    output_shape = (None, 512, 9, 9) if K.image_data_format() == 'channels_first' else (None, 9, 9, 512)
    assert model.output_shape == output_shape
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support padding with non-concrete dimension")
 def test_vgg16_pooling():
    model = applications.VGG16(weights=None, include_top=False, pooling='avg')
    assert model.output_shape == (None, 512)
@keras_test
 def test_vgg16_pooling_specified_input_shape():
    input_shape = (3, 300, 300) if K.image_data_format() == 'channels_first' else (300, 300, 3)
    model = applications.VGG16(weights=None, include_top=False, pooling='avg', input_shape=input_shape)
    assert model.output_shape == (None, 512)
@keras_test
 def test_vgg19():
    model = applications.VGG19(weights=None)
@@ -49,17 +90,36 @@ def test_vgg19():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support padding with non-concrete dimension")
 def test_vgg19_notop():
-    model = applications.VGG16(weights=None, include_top=False)
+    model = applications.VGG19(weights=None, include_top=False)
    assert model.output_shape == (None, None, None, 512)
@keras_test
 def test_vgg19_notop_specified_input_shape():
    input_shape = (3, 300, 300) if K.image_data_format() == 'channels_first' else (300, 300, 3)
    model = applications.VGG19(weights=None, include_top=False, input_shape=input_shape)
    output_shape = (None, 512, 9, 9) if K.image_data_format() == 'channels_first' else (None, 9, 9, 512)
    assert model.output_shape == output_shape
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support padding with non-concrete dimension")
 def test_vgg19_pooling():
    model = applications.VGG19(weights=None, include_top=False, pooling='avg')
    assert model.output_shape == (None, 512)
@keras_test
 def test_vgg19_pooling_specified_input_shape():
    input_shape = (3, 300, 300) if K.image_data_format() == 'channels_first' else (300, 300, 3)
    model = applications.VGG19(weights=None, include_top=False, pooling='avg', input_shape=input_shape)
    assert model.output_shape == (None, 512)
@keras_test
@pytest.mark.skipif((K.backend() != 'tensorflow'),
                    reason='Requires tensorflow backend')
@@ -91,12 +151,16 @@ def test_inceptionv3():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support padding with non-concrete dimension")
 def test_inceptionv3_notop():
    model = applications.InceptionV3(weights=None, include_top=False)
    assert model.output_shape == (None, None, None, 2048)
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support padding with non-concrete dimension")
 def test_inceptionv3_pooling():
    model = applications.InceptionV3(weights=None, include_top=False, pooling='avg')
    assert model.output_shape == (None, 2048)
@@ -77,6 +77,8 @@ def test_trainable_weights():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support adding learning_phase() as an input")
 def test_learning_phase():
    a = Input(shape=(32,), name='input_a')
    b = Input(shape=(32,), name='input_b')
@@ -494,7 +496,7 @@ def test_load_layers():
    from keras.models import Model
    from keras.engine.topology import preprocess_weights_for_loading
-    if K.backend() == 'tensorflow':
+    if K.backend() == 'tensorflow' or K.backend() == 'cntk':
        inputs = Input(shape=(10, 20, 20, 1))
    else:
        inputs = Input(shape=(10, 1, 20, 20))
@@ -550,6 +552,7 @@ def test_load_layers():
    assert np.all(K.eval(model.layers[2].weights[5]) == weight_tensor_bi_convlstm_new[5])
@keras_test
 def test_recursion_with_bn_and_loss():
    model1 = Sequential([
        layers.Dense(5, input_dim=5, activity_regularizer='l1'),
@@ -433,6 +433,8 @@ def test_model_with_partial_loss():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support external loss yet")
 def test_model_with_external_loss():
    # None loss, only regularization loss.
    a = Input(shape=(3,), name='input_a')
@@ -43,67 +43,69 @@ def test_convolutional_recurrent():
            if data_format == 'channels_first' or return_sequences:
                continue
-            # Tests for statefulness
+            # cntk doesn't support statefulness on LSTM yet, will enable it on cntk later
-            model = Sequential()
+            if K.backend() != 'cntk':
-            kwargs = {'data_format': data_format,
+                # Tests for statefulness
-                      'return_sequences': return_sequences,
+                model = Sequential()
-                      'filters': filters,
+                kwargs = {'data_format': data_format,
-                      'kernel_size': (num_row, num_col),
+                          'return_sequences': return_sequences,
-                      'stateful': True,
+                          'filters': filters,
-                      'batch_input_shape': inputs.shape,
+                          'kernel_size': (num_row, num_col),
-                      'padding': 'same'}
+                          'stateful': True,
-            layer = convolutional_recurrent.ConvLSTM2D(**kwargs)
+                          'batch_input_shape': inputs.shape,
+                          'padding': 'same'}
+                layer = convolutional_recurrent.ConvLSTM2D(**kwargs)
-            model.add(layer)
+                model.add(layer)
-            model.compile(optimizer='sgd', loss='mse')
+                model.compile(optimizer='sgd', loss='mse')
-            out1 = model.predict(np.ones_like(inputs))
+                out1 = model.predict(np.ones_like(inputs))
-            # train once so that the states change
+                # train once so that the states change
-            model.train_on_batch(np.ones_like(inputs),
+                model.train_on_batch(np.ones_like(inputs),
-                                 np.random.random(out1.shape))
+                                     np.random.random(out1.shape))
-            out2 = model.predict(np.ones_like(inputs))
+                out2 = model.predict(np.ones_like(inputs))
-            # if the state is not reset, output should be different
+                # if the state is not reset, output should be different
-            assert(out1.max() != out2.max())
+                assert(out1.max() != out2.max())
-            # check that output changes after states are reset
+                # check that output changes after states are reset
-            # (even though the model itself didn't change)
+                # (even though the model itself didn't change)
-            layer.reset_states()
+                layer.reset_states()
-            out3 = model.predict(np.ones_like(inputs))
+                out3 = model.predict(np.ones_like(inputs))
-            assert(out2.max() != out3.max())
+                assert(out2.max() != out3.max())
-            # check that container-level reset_states() works
+                # check that container-level reset_states() works
-            model.reset_states()
+                model.reset_states()
-            out4 = model.predict(np.ones_like(inputs))
+                out4 = model.predict(np.ones_like(inputs))
-            assert_allclose(out3, out4, atol=1e-5)
+                assert_allclose(out3, out4, atol=1e-5)
-            # check that the call to `predict` updated the states
+                # check that the call to `predict` updated the states
-            out5 = model.predict(np.ones_like(inputs))
+                out5 = model.predict(np.ones_like(inputs))
-            assert(out4.max() != out5.max())
+                assert(out4.max() != out5.max())
-            # check regularizers
+                # check regularizers
-            kwargs = {'data_format': data_format,
+                kwargs = {'data_format': data_format,
-                      'return_sequences': return_sequences,
+                          'return_sequences': return_sequences,
-                      'kernel_size': (num_row, num_col),
+                          'kernel_size': (num_row, num_col),
-                      'stateful': True,
+                          'stateful': True,
-                      'filters': filters,
+                          'filters': filters,
-                      'batch_input_shape': inputs.shape,
+                          'batch_input_shape': inputs.shape,
-                      'kernel_regularizer': regularizers.L1L2(l1=0.01),
+                          'kernel_regularizer': regularizers.L1L2(l1=0.01),
-                      'recurrent_regularizer': regularizers.L1L2(l1=0.01),
+                          'recurrent_regularizer': regularizers.L1L2(l1=0.01),
-                      'bias_regularizer': 'l2',
+                          'bias_regularizer': 'l2',
-                      'activity_regularizer': 'l2',
+                          'activity_regularizer': 'l2',
-                      'kernel_constraint': 'max_norm',
+                          'kernel_constraint': 'max_norm',
-                      'recurrent_constraint': 'max_norm',
+                          'recurrent_constraint': 'max_norm',
-                      'bias_constraint': 'max_norm',
+                          'bias_constraint': 'max_norm',
-                      'padding': 'same'}
+                          'padding': 'same'}
-            layer = convolutional_recurrent.ConvLSTM2D(**kwargs)
+                layer = convolutional_recurrent.ConvLSTM2D(**kwargs)
-            layer.build(inputs.shape)
+                layer.build(inputs.shape)
-            assert len(layer.losses) == 3
+                assert len(layer.losses) == 3
-            assert layer.activity_regularizer
+                assert layer.activity_regularizer
-            output = layer(K.variable(np.ones(inputs.shape)))
+                output = layer(K.variable(np.ones(inputs.shape)))
-            assert len(layer.losses) == 4
+                assert len(layer.losses) == 4
-            K.eval(output)
+                K.eval(output)
            # check dropout
            layer_test(convolutional_recurrent.ConvLSTM2D,
@@ -17,6 +17,8 @@ else:
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support dilated conv")
 def test_causal_dilated_conv():
    # Causal:
    layer_test(convolutional.Conv1D,
@@ -122,6 +124,8 @@ def test_averagepooling_1d():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support dilated conv")
 def test_convolution_2d():
    num_samples = 2
    filters = 2
@@ -597,6 +601,8 @@ def test_upsampling_2d():
                assert_allclose(np_output, expected_out)
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support it yet")
 def test_upsampling_3d():
    num_samples = 2
    stack_size = 2
@@ -651,6 +657,8 @@ def test_upsampling_3d():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support slice to 0 dimension")
 def test_cropping_1d():
    num_samples = 2
    time_length = 4
@@ -2,9 +2,12 @@ import pytest
 from keras.utils.test_utils import layer_test
 from keras.utils.test_utils import keras_test
 from keras.layers import noise
 from keras import backend as K
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support it yet")
 def test_GaussianNoise():
    layer_test(noise.GaussianNoise,
               kwargs={'stddev': 1.},
@@ -12,6 +15,8 @@ def test_GaussianNoise():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support it yet")
 def test_GaussianDropout():
    layer_test(noise.GaussianDropout,
               kwargs={'rate': 0.5},
@@ -77,6 +77,8 @@ def test_implementation_mode(layer_class):
@rnn_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support stateful RNN yet")
 def test_statefulness(layer_class):
    model = Sequential()
    model.add(embeddings.Embedding(embedding_num, embedding_dim,
@@ -147,6 +149,8 @@ def test_regularizer(layer_class):
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support mask on RNN yet")
 def test_masking_layer():
    ''' This test based on a previously failing issue here:
    https://github.com/fchollet/keras/issues/1567
@@ -170,7 +174,9 @@ def test_masking_layer():
@rnn_test
 def test_from_config(layer_class):
-    for stateful in (False, True):
+    # cntk does not support stateful yet.
+    stateful_flags = (False, True) if K.backend() != 'cntk' else (False,)
+    for stateful in stateful_flags:
        l1 = layer_class(units=1, stateful=stateful)
        l2 = layer_class.from_config(l1.get_config())
        assert l1.get_config() == l2.get_config()
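
The `test_from_config` change above narrows the `stateful` values by backend. A minimal standalone sketch of that selection follows. `stateful_flags_for` is a hypothetical helper name used only for illustration, and it takes the backend name as a plain string instead of calling `K.backend()`, so it runs without Keras installed.

```python
def stateful_flags_for(backend_name):
    """Return the `stateful` values a test should iterate over.

    Hypothetical helper, not part of Keras: it mirrors the inline
    conditional in the diff, where cntk only gets stateful=False.
    """
    if backend_name == 'cntk':
        return (False,)
    return (False, True)


if __name__ == '__main__':
    # Show which flags each backend name would be tested with.
    for name in ('tensorflow', 'theano', 'cntk'):
        print(name, stateful_flags_for(name))
```

Keeping the non-stateful case in the tuple means `from_config` is still exercised on cntk, rather than skipping the whole test.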
@@ -220,6 +226,8 @@ def test_specify_initial_state_non_keras_tensor(layer_class):
@rnn_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support stateful RNN yet")
 def test_reset_states_with_values(layer_class):
    num_states = 2 if layer_class is recurrent.LSTM else 1
@@ -268,6 +276,8 @@ def test_specify_state_with_masking(layer_class):
@rnn_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support stateful RNN yet")
 def test_return_state(layer_class):
    num_states = 2 if layer_class is recurrent.LSTM else 1
@@ -5,6 +5,7 @@ from keras.utils.test_utils import keras_test
 from keras.layers import wrappers, Input
 from keras.layers import core, convolutional, recurrent, embeddings
 from keras.models import Sequential, Model, model_from_json
 from keras import backend as K
@keras_test
@@ -108,6 +109,8 @@ def test_regularizers():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support reverse yet")
 def test_Bidirectional():
    rnn = recurrent.SimpleRNN
    samples = 2
@@ -184,6 +184,8 @@ def test_merge():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support stateful RNN yet")
 def test_merge_mask_2d():
    rand = lambda *shape: np.asarray(np.random.random(shape) > 0.5, dtype='int32')
@@ -217,6 +219,8 @@ def test_merge_mask_2d():
@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="cntk does not support stateful RNN yet")
 def test_merge_mask_3d():
    rand = lambda *shape: np.asarray(np.random.random(shape) > 0.5, dtype='int32')
@@ -42,6 +42,8 @@ def test_sparse_metrics():
        assert K.eval(metric(y_a, y_b)).shape == (6,)
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="keras cntk backend does not support top_k yet")
 def test_top_k_categorical_accuracy():
    y_pred = K.variable(np.array([[0.3, 0.2, 0.1], [0.1, 0.2, 0.7]]))
    y_true = K.variable(np.array([[0, 1, 0], [1, 0, 0]]))
@@ -56,6 +58,8 @@ def test_top_k_categorical_accuracy():
    assert failure_result == 0
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason="keras cntk backend does not support top_k yet")
 def test_sparse_top_k_categorical_accuracy():
    y_pred = K.variable(np.array([[0.3, 0.2, 0.1], [0.1, 0.2, 0.7]]))
    y_true = K.variable(np.array([[1], [0]]))
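
Most hunks in these test files repeat the same backend guard. The sketch below shows one way that guard could be factored into a helper. It is an illustration under stated assumptions, not code from this commit: `current_backend` and `skip_on_backend` are hypothetical names, the backend is read from the `KERAS_BACKEND` environment variable with an assumed fallback of `'tensorflow'`, and the standard library's `unittest.skipIf` stands in for `pytest.mark.skipif` so the example runs without pytest or Keras.

```python
import os
import unittest


def current_backend():
    # Assumption: the backend name comes from KERAS_BACKEND, falling back
    # to 'tensorflow'. Real test code would call K.backend() instead.
    return os.environ.get('KERAS_BACKEND', 'tensorflow')


def skip_on_backend(name, reason):
    # Hypothetical helper: the condition is evaluated once, when the
    # decorator is applied, like the boolean skipif conditions in the diff.
    return unittest.skipIf(current_backend() == name, reason)


@skip_on_backend('cntk', 'cntk does not support stateful RNN yet')
def check_statefulness():
    # Placeholder body standing in for a real test.
    return 'ran'
```

With `KERAS_BACKEND=cntk` set before the module is imported, calling `check_statefulness()` raises `unittest.SkipTest`; on any other backend the function runs normally.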
@@ -1,3 +1,3 @@
-# Keras: Deep Learning library for Theano and TensorFlow
+# Keras: The Python Deep Learning library

 {{autogenerated}}