* conv3d_transpose in tf and th
* fix _preprocess_deconv_output_shape error
* cntk conv3d_transpose
* conv3d_transpose test
* formatting
* cleanup tests
* fix incorrect axis ordering and docs
* deconv3d_output_shape to fix errors
* remove conv2d_transpose reference in theano backend
* remove kernel_size loop from test
* put depth first in test and add dim to invalid use case input
* formatting - removed extra line
* fix pep8
* remove extraneous args from tf conv3d_transpose function
* default val for data_format=None
* Add MobileNet to application
* Add support for 1001 classes in imagenet utils
* Revert a mistake in the tests
* Setup application test for mobilenet to run only when on tensorflow
* Correct pytest.mark.skipif explanation for skipping tests if not on tensorflow
* Corrected mobilenet to support 1000 classes and reverted imagenet_utils to prior state
* Fix tensorflow test
* Restrict mobilenets to data format "channels_last"
* Add review fixes
* PEP8 fix
* Add relu6 to activations.py
* Corrected imports in mobilenet.py
* Rolled back activation relu6 and inlined it to mobilenet.py
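For reference, ReLU6 is the ordinary ReLU capped at 6. A minimal scalar sketch in plain Python, not the backend code that was inlined into mobilenet.py:

```python
def relu6(x):
    """ReLU capped at 6, i.e. min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)
```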
* Refactored DepthwiseConv2D and other corrections
* Fixed tests
* PEP8 correction
* Add docs to private functions and other fixes
* Fix failed test where input shape is None
* Fix value of size for model name
* Add more numpy-style attributes to HDF5Matrix
* Improve docstrings
* Add test coverage
* Add a `# Returns` section to shape, dtype, ndim, size.
* Remove whitespace in blank lines
* Use third-person and close docstrings on a new line.
* Utility function to check if a callable has a given keyword argument
* Added unit tests for the has_arg function
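A rough sketch of what such a check might look like, assuming Python 3 and `inspect.signature`. The `accept_all` flag and the exact semantics are assumptions for illustration, not necessarily what this commit implements:

```python
import inspect


def has_arg(fn, name, accept_all=False):
    """Return True if `fn` can be called with `name` as a keyword argument.

    With accept_all=True, a function that takes **kwargs counts as
    accepting any keyword.
    """
    parameters = inspect.signature(fn).parameters
    parameter = parameters.get(name)
    if parameter is None:
        if accept_all:
            return any(p.kind is inspect.Parameter.VAR_KEYWORD
                       for p in parameters.values())
        return False
    # *args / **kwargs themselves cannot be passed by that name.
    return parameter.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                              inspect.Parameter.KEYWORD_ONLY)


def example_layer(x, weights=None, **kwargs):
    """Hypothetical callable used only to exercise has_arg."""
    return x
```

With this sketch, `has_arg(example_layer, 'weights')` is true, while `has_arg(example_layer, 'batch_size')` is only true when `accept_all=True`.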
* Replace uses of getargspec with the new has_arg function
Not changing keras.backend, because that gives ImportErrors due to
a circular import (conv_utils uses the backend, and is imported
before generic_utils in utils/__init__.py)
Not changing keras.utils.test_utils, because that change exposes
(what looks to me like) a latent bug
* Replace incorrect use of getargspec in test_utils.py
The previous code would always fail to detect the 'weights' argument.
Simply replacing getargspec would cause the tests for some of the legacy
layers to fail because the passed 'weights' argument is bad.
Instead, I have added a check for whether the passed `weights` array
is empty; this avoids tripping the bug.
* Replacing getargspec with has_arg in the backend modules
This requires reordering imports to avoid errors caused by
conv_utils trying to import the backend, the backend wanting to
import generic_utils, and utils/__init__.py listing conv_utils
before generic_utils.
* Removed getargspec from legacy wrapping function
Instead save the wrapped function in an attribute and call
getargspec on this attribute during documentation generation.
* Initial support for Datasets
* Fix warnings
* Fix for python2
* Fix travis deps
* Fix python2 indexing
* Fix test and docs
* Avoid use of future, use multiprocessing.pool
* Changed warning and better moduling
* fix threading test
* Move Dataset and enqueuers to utils.data_utils
* Skip None input, add seed for generators
* Skip None input fix
* pep8
* Fix example
* Add test in training and changed Dataset to hold item
* Revert to batch handling
* Docs update
* PEP8
* Rename in test
* Better documentation in Sequence
* Typo in sequence warning
* Rename pickle_safe and max_q_size, typos
* Typo in docstring
* Fix tests in training
* add skip_compile option to keras.models.load_model()
* update document
* change name from skip_compile to compile
* fix bug in `preprocess_weights_for_loading` so that layers of type `Model` can be converted correctly
* update codestyle
* updated
* fix indent
* revert changing
* update spacing
* Fixed default header in RemoteMonitor callback.
* Removed default headers from RemoteMonitor
The requests library automatically adds the appropriate headers by default.
* Fixed PEP8 warning in RemoteMonitor constructor
* Warn always about semantic changes if having keras1 args in *_generator calls.
* modified api upgrade warning message to be more detailed
* minor fix to pep8 syntax
* Fix to use floatx as argument in set_floatx
* Add line break
* Change to lower case
* Use 'x' as in moving_average_update description
* Fix to drop duplicate in one_hot Returns
* The foldr Returns convert to foldl itself
* Add back quote
* Add back quote
* Rebase and integrate comment on one-hot
* Fix the issue where shuffling never happened when n is divisible by batch_size
* Ensure generator lock will be process version instead of threading lock
* Add refs and comments of training generator lock
* Update comment
* Use the pytest tmpdir fixture (#6881)
* Run test_data_utils in a temporary directory
* Check output using os.path.isdir or os.path.isfile instead of os.path.exists
* Use the tmpdir fixture instead of mkdtemp
* Use in_tmpdir fixture when writing files in tests
... to avoid leaving files in the repository when tests fail, and also to
avoid the possibility of race conditions when several tests try to access
the same file.
* added parallel counting of sample files when initializing DirectoryIterator
* Updated to actually run in parallel.
* Added parallel generation of the filenames and labels lists
* Added documentation and removed commented-out code
* style fixes
* changes discussed in pull request
* Removed trailing spaces
* Switching to thread pool
* fixed broken import
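A minimal sketch of counting sample files per class directory with a thread pool, in the spirit of the commits above. The function names and the extension whitelist are hypothetical, not the actual DirectoryIterator code:

```python
import os
import tempfile
from multiprocessing.pool import ThreadPool

# Assumed whitelist of image extensions, for illustration only.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")


def count_valid_files(directory):
    """Count image files under `directory`, including subdirectories."""
    count = 0
    for _root, _dirs, files in os.walk(directory):
        count += sum(1 for f in files if f.lower().endswith(IMAGE_EXTENSIONS))
    return count


def count_samples_per_class(root, n_workers=4):
    """Count files of each class subdirectory in parallel threads."""
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    pool = ThreadPool(n_workers)
    try:
        counts = pool.map(count_valid_files,
                          [os.path.join(root, c) for c in classes])
    finally:
        pool.close()
        pool.join()
    return dict(zip(classes, counts))


# Usage on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    layout = {"cats": ["a.jpg", "b.png", "notes.txt"], "dogs": ["c.jpg"]}
    for cls, names in layout.items():
        os.makedirs(os.path.join(root, cls))
        for name in names:
            open(os.path.join(root, cls, name), "w").close()
    counts = count_samples_per_class(root)
```

Threads rather than processes are used here because the work is dominated by filesystem calls, which matches the "Switching to thread pool" entry.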
* Raise a descriptive error if `Model` constructor `inputs` are not inputs.
* Assert that layer attached to input tensors is an InputLayer
* fix pep8
* under-indented
* fix indent
* Update TypeError message
* Fix TypeError message
* Serialize/Deserialize numpy arrays passed as arguments to Lambda layers
* Corrections from fchollet comments
* corrections
* Removes warning and adds a unit test
* pep8 corrections
* Add top_k_sparse_categorical_accuracy and test_top_k_sparse_categorical_accuracy
* Rename top_k_sparse_categorical_accuracy to sparse_top_k_categorical_accuracy
@fchollet Merge this pull request and follow https://github.com/integration/probot-stale to automatically mark the many 3-month-old issues as stale, then close them after an additional 30 days.
I chose 30 additional days for closing because sometimes people go on vacation for a few weeks; this way they'll have time after being notified.
* Fix get_file download progress bar
* Added a comment to clarify the purpose of the "enclosed" dictionary
* pep8
* Fix get_file download progress bar, including no Content-Length header.
* Progbar accepts target None in addition to -1.
* #6670 Remove Progbar implementation details from docstring
Only None should be supported on the Progbar target parameter,
target values of -1 are an unsupported implementation detail
that may be removed in the future.
* Better error message for invalid functional api inputs (#6589)
* raise ValueError if `inputs` is not a Keras tensor
* Move to respective backends
* raise error if is_keras_tensor is called on a non-tensor object
* Fix failing tests
* responding to comments
* Update docstring comments to better explain expected behavior
* Fixed type conversion in neural_doodle example. Shape returns number of channels as int32 however further calculations require it to be float
* Updated neural doodle example to follow Keras2 API. Renamed ‘border_mode’ argument to ‘padding’.
* Fixed apostrophe for consistency.
Two hyperlinks (namely `[here]` and `[details]`) are misrendered in the TensorBoard documentation, see https://keras.io/callbacks/#tensorboard. The fix excludes `(` from argument names, because otherwise `[link](http://` is rendered as a function/class argument.
* add exception handling when attempting to write keras config file to disk to match tf.contrib.keras implementation
* Add reliance on exceptions rather than testing write access to the target directory.
* Add an option to create dot model in different directions
This commit adds an optional argument to functions plot() and model_to_dot() specifying the direction of the dot object
* Rename visualize_util.py to vis_utils.py and add model plot direction
* Format the code in the PEP8 style guide
* Add docstring for plot_model method, format code according to PEP8
pycodestyle and pydocstyle raises no info, warning, or error with this pr.
* Docstring style
* Docstring fixes.
* Visualize weight grad distributions in TensorBoard
* TensorBoard: Add learning_phase if needed and fix fit_generator target dimensions.
* TensorBoard: Fix pep8
* TensorBoard: Add a flag to make grad visualization optional.
* TensorBoard: Test grad visualizations as well.
* TensorBoard: Documentation and further pep8 changes.
* TensorBoard: Add dropout layer to test K.learning_phase()
* Add learning_phase check in fit() to fit_generator().
* Tensorboard: Add test for comparing cbk.validation_data for fit() and fit_generator()
* Tensorboard: Fix cbk.val_data test.
* Tensorboard: Enable grad vis in tb convnet test.
* Tensorboard: No linebreak for more readability
* Tensorboard: Add a convnet test for tensorboard
* Tensorboard: Check weight dimensions better in write_images and make the code more explicit
* Tensorboard: 2 epochs is enough for tb convnet test
* Tensorboard: Fix pep8
* add huber loss function (for robust regression)
* rename huber to logcosh (PR comments were correct), fix PEP8 whitespace checks
* logcosh loss: change from lambda to fn def'n, add test coverage
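For context, the log-cosh loss is the mean of log(cosh(y_pred - y_true)). A plain-Python sketch on lists follows. The numerically stable rewrite, log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2), is a choice made here to avoid overflow and is not necessarily how the backend version is written:

```python
import math


def logcosh(y_true, y_pred):
    """Mean of log(cosh(error)) over paired samples."""
    def _log_cosh(x):
        x = abs(x)  # cosh is even, so the sign of the error is irrelevant
        return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

    errors = [_log_cosh(p - t) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)
```

For small errors this behaves like a squared error (about x**2 / 2), and for large errors like |x| - log(2), which is why it was first submitted under the name "huber".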
* Add test for documentation
* Changes according to review
* Changes according to review
* Fix documentation and add Travis task
* Style fixes.
* Fix line length
* PEP8
* Added support for the new pydot API to fix find_graphviz error
* Simplified pydot installation checking
* Workaround for pydot generic Exception raising
* Removed hacky workaround for pyplot Exception, included comment
* fix specify state
* Added documentation for `reset_states`
* Remove unneeded check
* Update Documentation
* pep8
* Fix when initial_states is a tensor
* modify tests for non-list initial states.
* use initial_state instead of initial_states
* pep8
* change get_initial_states to get_initial_state in ConvLSTM2D
* Check for Keras Tensors in Recurrent
* check if initial_state is passed to call
* pep8
* Move state_spec definition to __init__
* Fix reset states
* fix masking when specifying state
* added masking test for RNNs with specified state
* pep8
* remove unnecessary blank line
* Update layer_utils.py
Model is not sequential if there is a "merge" layer somewhere in the graph. So if a layer has multiple input layers ("inbound_layers"), the whole model is no longer sequential...
* Explanation of changed condition
Added a comment to explain the check for sequentiality in a model:
A model is not sequential if it has multiple nodes or if a layer has multiple inbound_layers
* Update the value of 'steps_per_epoch'
* Update the docstring of fit_generator to steps_per_epoch * batch_size
* Update the value of 'steps_per_epoch'
* Update the docstring of fit_generator: when 'steps_per_epoch' batches have been seen
* Corrected a comment in function "print_layer_summary_with_connections"
Changed line 82 from "# node is node part of the current network" to "# node is not part of the current network"
* Fixed issue #6286
Fixed the issue where the summary of non-sequential models would not display content of "Connected to" column
* Allows preprocess_weights_for_loading() to consider layers wrapped in TimeDistributed or Bidirectional.
* fixed whitespace PEP8 issue
* Refactored preprocess_weights_for_loading() to allow for loading to TimeDistributed and Bidirectional. PEP8 Fixes.
* PEP8 Fixes
* Recursive implementation of preprocess_weights_for_loading to accommodate Bidirectional and TimeDistributed wrappers.
* deindentation and doc-string formatting. method argument formatting.
* Embedding visualization is added to TensorBoard callback.
* CI failure fix.
* Code review fixes
+ None or empty list for embeddings_layer_names implies monitoring
of all layers of type Embedding
+ embeddings_metadata now can contain just a string with metadata
filename if it's common for all the embedding layers.
+ Frequencies now takes 0-th epoch as first.
* Code review is in progress
load_model fails when a model has multiple output layers that have more
than one metric. Solve this problem by adding a clause that checks if
metrics are a list.
For more elaborate description see issue #3958
Include a unit test confirming that model with multiple outputs that
have more than one metric can indeed be saved and reloaded.
* get_file() with tar, tgz, tar.bz, zip and sha256, resolves #5861.
The changes were designed to preserve backwards compatibility while adding support
for .tar.gz, .tgz, .tar.bz, and .zip files.
sha256 hash is now supported in addition to md5.
* get_file() improve large file performance #5861.
* get_file() extract parameter fix (#5861)
* extract_archive() py3 fix (#5861)
* get_file() tarfile fix (#5861)
* data_utils.py and data_utils_test.py updated based on review (#5861)
# This is a combination of 4 commits.
# The first commit's message is:
get_file() with tar, tgz, tar.bz, zip and sha256, resolves #5861.
The changes were designed to preserve backwards compatibility while adding support
for .tar.gz, .tgz, .tar.bz, and .zip files.
Adds extract_archive() and hash_file() functions.
sha256 hash is now supported in addition to md5.
adds data_utils_test.py to test new functionality
# This is the 2nd commit message:
extract_archive() redundant open (#5861)
# This is the 3rd commit message:
data_utils.py and data_utils_test.py updated based on review (#5861)
test creates its own tiny file to download and extract locally.
test covers md5 sha256 zip and tar
_hash_file() now private
_extract_archive() now private
* data_utils.py and data_utils_test.py updated based on review (#5861)
* data_utils.py get_file() cache_dir docs (#5861)
* data_utils.py address docs comments (#5861)
* get_file() comment link, path, & typo fix
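A sketch of chunked file hashing with sha256 or md5, the kind of helper the get_file() entries above describe. The name, signature and chunk size below are assumptions, not the private `_hash_file()` itself:

```python
import hashlib
import os
import tempfile


def hash_file(path, algorithm="sha256", chunk_size=65535):
    """Hash a file in fixed-size chunks so large files are never fully in memory."""
    if algorithm not in ("sha256", "md5"):
        raise ValueError("Unsupported algorithm: %s" % algorithm)
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Usage: hash a temporary file larger than one chunk.
payload = b"keras" * 100000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
digest = hash_file(tmp.name)
os.remove(tmp.name)
```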
- TensorFlow 1
- Theano 0.9 : also use "device=cuda" in theanorc to use new
"gpuarray" backend
- Miniconda 4.2.12 (latest conda installer with python 3.5)
- Simplified pip install for tensorflow and keras test dependencies
* Fix docstring relating to stacked recurrent layers
The docstring did not specify the need to use return_sequences=True when creating a stacked recurrent network. I have replaced the original example with a more descriptive one.
* expand comment on LSTM example
Comment expanded to explicitly state that the input size only needs to be defined for the first layer.
* Update recurrent.py
* Add a dtype parameter to the map_fn backend function
* Update the map test to include the dtype parameter
Also update foldl and foldr to use variables for future proofing.
* Make accuracy metrics work with masked outputs
Several accuracy metrics end in a call to `K.equal()`, which gives a tensor with dtype `bool`. The multiplication with a float mask then crashes. This fixes that crash.
* Move cast to metrics
* fix small inconsistencies in the documentation
* remove backend-specific details
* add line removed by accident (that's what happens when you commit your changes too fast)
* Fix in doc example
* Fix and improvements to the `backend` documentation
Improved Preamble of the `backend.md` template:
- fixed a typo
- Added a few notes that make the documentation more self-explanatory
- Made all code examples runnable by copy and paste
Aligned the format of the `backend()` function
Fixed docstring of `set_image_dim_ordering()` function
* Fixed a typo in the %USERPROFILE% env name for Windows users
* Added `_variable` so not to get a different value every time
* Changed conv kernel size in resid pathway to 1x1, and changed activation from BN+RELU to ELU.
* Added a more informative docstring describing elu argument and its two behaviors.
* Add UpSampling*D API conversion interface. Also modified generate_legacy_interface to allow value conversions in positional args.
* Removed positional keyword_conversion modification
* Add API conversion interface for Dropout layer
* Fixed API-conversion for Dropout
* Fixed warning message
* Added another test case
* Fixed warning message
* Whitespace fix
Currently, `import keras` will fail if pydot is not installed
(for example, in a fresh virtualenv without the `keras[visualize]`
option). This commit delays the check for pydot until it is
actually used, in line with the treatment of a PIL.Image dependency
in keras/preprocessing/image.py
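The deferred check described above can be sketched as follows. `optional_import` and `check_pydot` are hypothetical helper names chosen for illustration:

```python
import importlib


def optional_import(name):
    """Return the module if it is installed, otherwise None (no error at import time)."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


pydot = optional_import("pydot")


def check_pydot():
    """Raise only when a plotting function actually needs pydot."""
    if pydot is None:
        raise ImportError("pydot is required for model visualization.")
```

Importing the surrounding module then succeeds in a fresh environment, and the error only surfaces when a function that calls `check_pydot()` is used.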
* update VAE examples (MLP and ConvNet) to the new API
* renamed objectives to metrics for xent_loss
* std -> stddev for random_normal
* adjusted arguments to Conv2D/Deconv2D
* fixed typo (filterss -> filters)
* change to conv2dtranspose and tuple strides
* Update mnist_transfer_cnn for new API
* Update mnist_siamese_graph.py for new API
* Refactor example a little bit for clarity
* Update mnist_irnn.py for new API
* Fix variable name
* Update mnist_hierarchical_rnn.py for new API
* Fix a few API calls I missed
* Update mnist_acgan.py for new API
* Fix variable name
* Update imdb_cnn for new API
* Update benchmark.py to work with new API
* PEP8 fix
* Change filter_length to kernel_size
* Update imdb_cnn_lstm.py for new API
* PEP8 indentation fix
Tokenizer returns sequence values in the range of [0, nb_words). In this
example, MAX_NB_WORDS is 20000 and the data's max value is 19999. There
is no need to use 'nb_words + 1'.
Reuse of `CSVLogger` object raises `ValueError: I/O operation on closed file.`
because in `on_train_end` method `self.csv_file` is closed
but `self.writer` is not reset to `None`.
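A stripped-down sketch of the failure mode and the reset, using only the stdlib `csv` module. `MiniCSVLogger` is a hypothetical stand-in, not the real `CSVLogger`:

```python
import csv
import os
import tempfile


class MiniCSVLogger:
    """Toy logger that can be reused across several training runs."""

    def __init__(self, filename):
        self.filename = filename
        self.csv_file = None
        self.writer = None

    def on_train_begin(self):
        self.csv_file = open(self.filename, "a", newline="")

    def on_epoch_end(self, epoch, logs):
        if self.writer is None:
            # The writer is bound to the file object that is open right now.
            fields = ["epoch"] + sorted(logs)
            self.writer = csv.DictWriter(self.csv_file, fieldnames=fields)
            self.writer.writeheader()
        row = {"epoch": epoch}
        row.update(logs)
        self.writer.writerow(row)
        self.csv_file.flush()

    def on_train_end(self):
        self.csv_file.close()
        # Without this reset, the next run would reuse a writer that still
        # points at the closed file and fail with
        # "ValueError: I/O operation on closed file."
        self.writer = None


# Usage: two consecutive runs with the same logger object.
path = os.path.join(tempfile.mkdtemp(), "log.csv")
logger = MiniCSVLogger(path)
for _run in range(2):
    logger.on_train_begin()
    logger.on_epoch_end(0, {"loss": 0.5})
    logger.on_train_end()
with open(path) as f:
    lines = f.read().splitlines()
```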
The expression of pictures should be (img_height, img_width, 3) or (3, img_height, img_width), not (img_width, img_height, 3) or (3, img_width, img_height).
* Checking that ndim is >= 3 for TF batch_dot
* Checking error is thrown if ndim < 3 in batch_dot for TF
* Integration test raising error when ndim < 3 on TF
* Using TF backend during shape test
* Fix style issue
* Add reference to slides for RMSprop
I thought as the docs cite other methods, it would be good to provide a citation for this optimiser. Hinton mentions to 'keep citing the slides' for this method.
I chose the title of the lecture in question, if there's a better title I'm sure that can be used instead, but I think the citation should be there.
* Whitespace violation
* Fix custom_objects for regularizers and other issues
* Add custom_object_scope
* Update optimizers and Lambda to respect global_custom_objects
* Add generic_utils to mkdocs and add tests for custom_object_scope
* Fix elif statement in optimizers.py
* Clean up generic_utils.py docstrings
Corrected a typo in line 65. This was giving `TypeError: mel() got an unexpected keyword argument 'hop_lengthgth'` error. Please verify and merge.
Thanks.
* Fixed checking input masks in Layer.compute_mask
* Added dtype parameter to zeros_like and ones_like
* Fix existing docstring for ones_like and zeros_like
Beforehand, slow generators could have caused race conditions and
crashes with 'ValueError: generator already executing', e.g. if
a validation generator filling up the queue took longer than a single
epoch that elapsed meanwhile.
* Fix a warning on python3
In Python3, 50000 / 10 = 5000.0. This will result in a warning from numpy:
VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future.
* Use // instead
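The difference in a nutshell, assuming Python 3 (variable names are illustrative):

```python
n_samples = 50000
n_batches = 10

true_division = n_samples / n_batches    # float in Python 3
floor_division = n_samples // n_batches  # stays an int, safe as an index or shape
```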
* "random_uniform" initializer doesn't exist.
The following line raises `ValueError: Invalid initialization: random`
because "random_uniform" is just "uniform"
```
self.W = self.add_weight(shape=(input_shape[1], self.output_dim),
                         initializer='uniform',
                         trainable=True)
```
* shape parameter missing in build call
super(MyLayer, self).build(input_shape)
* Theano cudnn code now throws Exception when it is not available, need to catch this
* Revert "Theano cudnn code now throws Exception when it is not available, need to catch this"
This reverts commit 2d107a6a9aca469d545d6ee31624c4b530c7ea0a.
* use dnn_available to check if cudnn is available
* Fix pep8 error
* documentation: tf backend, complete.
* few fixes
* pep8
* fixed according to review.
* initial tensorflow 0.12 fixes
see #4805
* fixed indents for pep8
* added tests for clipnorm and clipvalues
* updated travis to tf 0.12.1
* batch_matmul removed
even though the tests don’t fail on travis… they fail locally…
* make changes work with TF 0.11
* move statement outside of if
* Fix for Issue #4851
I didn't catch when my original documentation was changed by @fchollet (overall for the better), but the change introduced this bug: https://github.com/fchollet/keras/issues/4851
My original docs only included the dimensions of the parameters (no batch dim) and were correct, but I think it's better to change the functions to reflect the current docs.
My original docs were:
```Python
shared_axes: the axes along with to share parameters for
- the activation function. For example if the
- incoming feature maps from a 2D convolution
- has dimensions 16x32x32 and you wish to share
- parameters across space so that each feature
- maps only has one set of parameters, set
- shared_axes = [1, 2]
```
* Make PEP8 compliant
Add spaces around subtraction
The kullback_leibler_divergence metric in metrics.py returned an output
with dimensionality N-1 (where N is the dimensionality of the target).
Add mean after sum to fix this, such that always a scalar is returned.
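A list-based sketch of the "sum over classes, then mean over samples" reduction described above. The clipping constant `eps` is an assumption added to keep the logarithm finite:

```python
import math


def kullback_leibler_divergence(y_true, y_pred, eps=1e-7):
    """KL(y_true || y_pred) per sample, averaged into a single scalar."""
    per_sample = []
    for true_row, pred_row in zip(y_true, y_pred):
        total = 0.0
        for t, p in zip(true_row, pred_row):
            t = min(max(t, eps), 1.0)
            p = min(max(p, eps), 1.0)
            total += t * math.log(t / p)
        per_sample.append(total)  # sum over the class axis
    return sum(per_sample) / len(per_sample)  # mean over samples -> scalar
```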
* Added CTC to Theano and Tensorflow backend along with image OCR example
* Fixed python style issues, made data files remote, and made code more idiomatic to Keras
* Fixed a couple more style issues brought up in the original PR
* Reverted wrappers.py
* Fixed potential training-on-validation issue and removed unused imports
* Fixed PEP8 issue
* Remaining PEP8 issues fixed
* Fixed failure to learn issue on image_ocr.py found on newer versions of Keras
* Minor tweaks before submitting fixed image_ocr.py example
* Removed TimeDistributed usage and fixed quote inconsistencies
* Fixed PEP8 issues
* Fixed issue where loading weights does not work for start_epoch < 10
* Switched to using initial_epoch
* fix accuracy computation in MNIST siamese graph example
previous code:
```
def compute_accuracy(predictions, labels):
    '''Compute classification accuracy with a fixed threshold on distances.
    '''
    return labels[predictions.ravel() < 0.5].mean()
```
is not accuracy over all the samples, but over samples with negative prediction.
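To make the difference concrete, here is a plain-list sketch of both computations. It assumes label 1 means "same class" and that a distance below the threshold predicts "same". It is an illustration, not the code committed to the example:

```python
def old_accuracy(labels, distances, threshold=0.5):
    """Mean label over the pairs predicted as 'same' only (the previous behaviour)."""
    selected = [y for y, d in zip(labels, distances) if d < threshold]
    return sum(selected) / len(selected)


def compute_accuracy(labels, distances, threshold=0.5):
    """Fraction of all pairs whose thresholded prediction matches the label."""
    predicted = [1 if d < threshold else 0 for d in distances]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)


labels = [1, 0, 0, 0]
distances = [0.1, 0.9, 0.9, 0.3]
```

On this toy data the previous formula gives 0.5 because it ignores the two pairs predicted as different, while accuracy over all four pairs is 0.75.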
* add space around "=="
follow François's suggestion
* updated the issue template with a bit more information
* Update ISSUE_TEMPLATE.md
De-emphasized the google group, added StackOverflow.
* Update ISSUE_TEMPLATE.md
* allow ability to share activation parameters along specified axes
* add tests
* change to shared_axes and remove TF dummy broadcast function
* update tests to shared_axes
* Update docstrings in advanced activations
* Theano backend consistency
ones_like and zeros_like don't have the name parameter in their signature for theano backend so it can trigger an error if it is used.
* pep8
* documentation - tensorflow_backend.py - first 1/3
* pep8, add periods.
* documentation - tensorflow_backend.py
* pep8
* * fix the confusion of variable/tensor.
* remove `var` as a variable name. `kvar` is used instead.
* new lines between multiple items under # Arguments
* Simplify BatchNormalization code.
* Make Theano's K.batch_normalization similar to TensorFlow.
* Change default batch normalization epsilon to 1e-3.
* Use Theano's new batch normalization interface.
* recurse directories in DirectoryIterator
rebased and squashed, again
* added prose about new behaviour of flow_from_directory
also, about BMP files.
* Update topology.py
* Update topology.py
* Update topology.py
* white space fix
* indentation fix
* add tests
* fix all tests
* add arguments arg to merge
* space after period
* add test with arguments
* add test with arguments for lambda layer too
* pep8 fixes
* fix tf test
* try fixing tf test; again
* bug fix
* finally
* Added SpatialDropout1D
This is a straightforward modification of SpatialDropout2D but for 1D data.
* Added SpatialDropout1D to docs
* SpatialDropout1D test
* Fixed indent issue
* Combined TF and TH dimension conditions
Use the same 1D dimensions for TensorFlow and Theano in SpatialDropout1D.
* trailing whitespace
* Removed dim_ordering variable
* Removing dim_ordering values
removing dim_ordering values as requested
* fix batch_norm when axis!=1
* fix dimshuffle for all backends
* moving cudnn bn fix to theano backend
* fix pep8
* don't use cudnn when bn axis is non broadcastable, i.e. dim=1
* Display wrapped layers in graph visualization
* Check parent class instead of class's module
* Check instance instead for brevity
* More consistent naming
* Fixed weights.sort for Python 3
In Python 3 weights.sort could throw a TypeError exception, if the
names are all None
* Fixed _flattened_layers under Python 3
If self.layers is empty, an IndexError appears when accessing it. So
it’s necessary to check if it’s non-empty first
* Fixed weight sorting for Theano backend
* Added missing import statement
* Improved backend handling for weight calculation
* Simplified weight sorting and backend check
* Changed behavior of weights sorting
* Removed unnecessary import
* manually terminate threads process returned by `generator_queue()`
Recently I wrote a custom video sequence DataGenerator (based on ImageDataGenerator) for an experiment. When I use model.fit_generator as follows:
>history = model.fit_generator(train_data_generator, samples_per_epoch=train_data_generator.nb_sample,
nb_epoch=nb_epoch, verbose=1, callbacks=[early_stopping, model_checkpoint],
validation_data=test_data_generator, nb_val_samples=test_data_generator.nb_sample,
max_q_size=10, nb_worker=8, pickle_safe=True)
I found that validation takes much longer than training even though it uses less data.
I read the code and changed `self.evaluate_generator()` (line 1482) in `fit_generator` to use multiprocessing, as the training process does. However, memory usage then increases quickly and the run only lasts a few epochs.
After analysis, I think this is because the processes are not freed after `evaluate_generator` finishes. I therefore suggest returning `generator_threads` from `generator_queue()` and manually terminating these threads in `fit_generator`, `evaluate_generator`, and `predict_generator`.
* satisfy the PEP style
* correct the PEP8's E128 error
* Switch use of TF cond function to use public function.
Prior to newer TFs, cond was unavailable and thus was being
imported via private module namespaces.
Newer TFs expose tf.cond as the public interface. There
are plans to remove private module namespace access so
this fixes keras to first try accessing through the public
namespace, and then going through the private one for older
versions of TF.
* PEP8 fix
* Make ZeroPadding2D and ZeroPadding1D optionally asymmetric
* Make padding argument polymorphic.
Add test case for asymmetric padding.
Remove excessive imports.
* Fix layer config saving.
* Duck typing (as soon as test passes tuple as a list)
* Doc update
* Set padding value for the missing keys to 0.
Raise exception if unexpected keys are found in the padding dict.
* Add test for ZeroPadding1D
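A sketch of how a polymorphic padding argument might be normalized: missing dict keys default to 0, unexpected keys raise, and lists are accepted like tuples. The key names and the (top, bottom, left, right) return order are assumptions made for this illustration:

```python
# Assumed key names for the dict form.
PAD_KEYS = ("top_pad", "bottom_pad", "left_pad", "right_pad")


def normalize_padding(padding):
    """Return padding as a (top, bottom, left, right) tuple."""
    if isinstance(padding, dict):
        unexpected = set(padding) - set(PAD_KEYS)
        if unexpected:
            raise ValueError("Unexpected keys in padding dict: %s"
                             % sorted(unexpected))
        # Missing keys default to 0.
        return tuple(padding.get(key, 0) for key in PAD_KEYS)
    padding = tuple(padding)  # duck typing: lists work like tuples
    if len(padding) == 2:
        rows, cols = padding
        return (rows, rows, cols, cols)  # symmetric case
    if len(padding) == 4:
        return padding
    raise ValueError("padding must have 2 or 4 entries, got %d" % len(padding))
```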
* add categorical accuracy metric which tracks over top-k predictions
* remove top_k_categorical_accuracy from being tested together with other all_metrics
* fix in_top_k to work with batches. correct metrics.py and test_metrics.py appropriately
* style fixes for documentation on in_top_k function
* default to k=5 for top_k_categorical_accuracy metric
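For illustration, top-k categorical accuracy on plain lists: a sample counts as correct when the true class is among the k highest-scoring predictions. This is a sketch, not the backend `in_top_k` implementation:

```python
def top_k_categorical_accuracy(y_true, y_pred, k=5):
    """Fraction of samples whose true class is within the top-k predictions."""
    hits = 0
    for true_row, pred_row in zip(y_true, y_pred):
        # Index of the one-hot target.
        target = max(range(len(true_row)), key=true_row.__getitem__)
        # Class indices sorted by descending predicted score.
        ranked = sorted(range(len(pred_row)), key=pred_row.__getitem__,
                        reverse=True)
        hits += int(target in ranked[:k])
    return hits / len(y_true)


y_true = [[0, 0, 1], [1, 0, 0]]
y_pred = [[0.5, 0.2, 0.3], [0.1, 0.6, 0.3]]
```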
* Added optional path argument
* Added optional field name argument
* Added LambdaCallback callback
* Fixed on_epoch_begin assignment
* Match default signatures
* Whitespace
* Test LambdaCallback examples
* Only test process termination
* Imports
* Fixed test
* Wait on process to terminate
* Add zero threshold and set F measure to zero if no true samples exist
* Reduce zero threshold
* Flip thresholded non-zero count
* Add F measure test
* Updated test
* Remove lambda, simplify
* Whitespace
* Update docstring
* Update test
* Whitespace
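A hedged sketch of the F-measure guard described in this series: return 0 when there are no true samples (or no true positives) instead of dividing by zero. The thresholding and signature are assumptions:

```python
def fbeta_score(y_true, y_pred, beta=1.0, threshold=0.5):
    """F-beta on binary labels, defined as 0 when no positives exist."""
    actual_positives = sum(1 for t in y_true if t > 0)
    if actual_positives == 0:
        return 0.0
    predicted = [1 if p > threshold else 0 for p in y_pred]
    true_positives = sum(1 for t, p in zip(y_true, predicted)
                         if t > 0 and p == 1)
    if true_positives == 0:
        return 0.0
    precision = true_positives / sum(predicted)
    recall = true_positives / actual_positives
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```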
* bypass shape inference in deconv2d
* * more doc in deconv layer
* more deconv layers in var autoencoder example
* * typo doc
* replicate deconv example with paper's params
* replicate example with paper's params
* typo doc
* + relus in the deconv
* typo in var autoencoder example
* + mult by ndim
* style fixes
* pep8
`os.listdir` to `sorted(os.listdir)` for alphabetical order instead of arbitrary order. Following PR#3751 this allows mask and images with the same name to be read together.
* add audio models: audio_convnet and audio_conv_rnn
* remove white spaces at the end of lines
* add audio_conv_utils.py, update applications.md
* remove useless line in example in application.md
* rename models (MusicTaggerCNN,CRNN), BN mode=0 weights
* pep8
* remove MusicTaggerCNN, add include_top argument
* update to follow pep8
* ReduceLROnPlateau Callback and CSVLogger Callback
* Added documentation and cleanup.
* Added examples.
* Added test for ReduceLROnPlateau()
* Minor changes to naming.
* Added epsilon for lr comparison.
* Fix sensitivity issue
* PEP8
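The plateau logic can be sketched without any framework. The class below is a hypothetical simplification: it treats `epsilon` as the minimum improvement that counts as progress, which is an assumption about how the epsilon in these commits is used:

```python
class PlateauLRReducer:
    """Multiply the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.1, patience=2, epsilon=1e-4, min_lr=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.epsilon = epsilon
        self.min_lr = min_lr
        self.best = float("inf")
        self.wait = 0

    def step(self, loss):
        """Record one epoch's monitored loss and return the learning rate to use next."""
        if loss < self.best - self.epsilon:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


reducer = PlateauLRReducer(lr=0.1, factor=0.5, patience=2)
schedule = [reducer.step(loss) for loss in [1.0, 0.9, 0.9, 0.9, 0.9]]
```

In this run the loss stops improving after the second epoch, so the rate is halved once two stagnant epochs have been seen.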
Some of the variable names in this guide were misleadingly named. The outputs were named as `*_loss` implying that they held loss values, whereas they in fact held the outputs. It rather confused me; I believe my proposed naming is clearer.
* make ImageDataGenerator behaviour fully seedable/repeatable
This makes ImageDataGenerator fully seedable.
- the seed argument in fit is now used
- the seed argument in flow and flow_from_directory now affects
transforms
- added example to docs of transforming images and masks together
- added test of using two seeded streams at once
* implemented requested changes
- PEP8
- explicit names
- classes=None
- remove test
My reading of regularizers is that they cannot be reused, but it doesn't actually fail in any way and seems like it results in only regularizing the last layer. Having an exception prevent this would probably improve the ergonomics.
* Minimal SparseTensor support for TensorFlow
* Basic Theano support for Sparse dot product
* Sparse Input for Both + Sparse Concat for TF
* Fixed issue with _keras_shape for sparse Inputs
* pep8
* Cleanup + Theano concat (untested)
* Bug fix & pep8
* Fix Theano concat
* Bugfix & simplification
* Next step: Unit tests
* Basic unit test for sparse dot; TF works, TH fails
* Fix KTH is_sparse
* pep8
* more tests, sparse KTH.eval, pep8
* sparse model test
* address code review comments
* make sparse boolean in K.placeholder
* skip sparse tests when TH.sparse import fails
* pep8
* pep8
* fixed flakey test, auto-dense in KTH.eval
* fixed some more len/shape issues for fit_generator
* fixed some more len/shape issues for prediction
* Added better exceptions when theano.sparse fails to import
* betterer
* pep8
* Added stacked what where autoencoder.
SWWAE uses residual blocks. Trains fast. Creates very good reconstructions.
* Added newline at end for PEP8
* Went through PEP8 errors and corrected all (except for the imports which follow the numpy seed, but this should be ok). Also, for the pool_size of 2, we halved the number of feature maps and the number of epochs, and it still trains a net that can very nicely reconstruct the input.
* Added spaces around - and + when they are used as binary operators (more PEP8).
* In decoder, the index of the features and pool size and wheres are all equal to nlayers-1-i, so set ind variable to this value and passed it to them.
* With ind variable in decoder, don't need two lines for the upsampling layer.
* Added title to plot, got rid of ticks on plot.
* PEP8 for * binary operator. Corrected some grammar issues in the docstring.
* Add Matthews correlation coefficient to metrics
I needed this for a Kaggle competition and it seemed useful in general so I thought I'd contribute it back.
* Enabled test for matthews metric
* Remove unnecessary cast garbage
* Addresses code review comments
* Renamed to matthews_corrcoef to be consistent with sklearn
* Update test_metrics.py
* pep8
* rename to matthews_correlation
* Update metrics.py
* Fixed typo
* CTC import compatibility with tensorflow 0.10
Try except clause to import ctc_loss in new path on tensorflow 0.10.
* Fixed ctc_decode and added tests for tensorflow.
ctc_decode when using beam search decoder has been fixed to conform with
tensorflow API. Function documentation has been updated to reflect the
changes. Two tests, for greedy and beam search decoding, have also been
added to test_backends.py.
* Fix pep8 styling.
* Fixed styling on long lines on ctc_decode tests.
* Fix Batch Norm compatibility with 3D inputs
the theano backend now uses dnn_batch_normalization which only supports
up to 4-dimensional input. This breaks any 5-d layers such as 3D
convolutions.
* using intermediate variable
By default TensorFlow allocates all gradient matrices on gpu:0, which makes it pretty much impossible to parallelize a large model.
colocate_gradients_with_ops puts these matrices next to the operations, allowing you to split your model across multiple GPUs. I ran into this issue myself and this fixed it for me.
I think it's also meant to set gradient computations to be done on the device where the operations are stored, but my belief about that comes from https://github.com/tensorflow/tensorflow/issues/2441
I'm not sure why this isn't the default in TF, so I'm not sure if this should be behind a flag or something, but having to make my own patches to keras to do multi-GPU training seems like the wrong answer.
* Add support for dynamic RNNs in TensorFlow.
* Fix return states
* Add support for go_backwards in dynamic TF RNNs
* Currently broken: TF RNN dropout, go_backwards
* Finalize dynamic RNNs in TF
* Remove unnecessary comment
* Comment out added test
* Comment out functional guide test
* add cropping1d/2d/3d layers
* fix PEP8 issue, fix incorrect doc strings
* add example code on Cropping2D
* fix init/get_config of crop1d/3d, add test codes for cropping1d/2d/3d
* fix test code - PEP8
It doesn't pass the tests (only in cropping2d and basic_test), but my laptop setup is not correct (some other existing layers fail as well), so I am committing to test it properly.
* change to follow PEP8 again
* update test_convolutional.py for PEP8, test code to use K.image_dim_ordering()
* PEP8 for test_convolutional.py - indentation
* fix typo. add assert to check cropping lengths
* Upload examples/imdb_fasttext.py which implements the fasttext model
* Remove Dropout and unnecessary imports
* Fix an issue when specifying only one dot_axes in the Merge layer
* Updated dataset documentation to reflect removal of test_split argument
from imdb dataset. Added docstring to reuters dataset load_data.
* Updated imdb and reuters examples in dataset docs to reflect all
available arguments with current default values.
* Added CTC to Theano and Tensorflow backend along with image OCR example
* Fixed python style issues, made data files remote, and made code more idiomatic to Keras
* Fixed a couple more style issues brought up in the original PR
* Reverted wrappers.py
* Fixed potential training-on-validation issue and removed unused imports
* Fixed PEP8 issue
* Remaining PEP8 issues fixed
* Added Convolution1D instead of Conv1D, which is deprecated
* updated rest of the example using Conv1D
* Python3 fails to decode utf-8 data, thus using encoding='latin-1'
* added condition for Encoding line 65-67
* Conv1D reverted back to the way it was
* One hot op
* tf too
* Update theano_backend.py
* Use built-in theano op
* Update theano_backend.py
* Add test
* Update test_backends.py
* Update test_backends.py
* Generalize for nD tensors
* Fix docstring on TF backend
* Update theano_backend.py
* Update theano_backend.py
* remove usage of tf.assign() in the tensorflow backend (#3316)
Usage of the tf.assign() function in the set_value() and batch_set_values() functions creates new nodes on the Tensorflow graph which can eventually overflow the memory.
Therefore, the function has been rewritten using placeholders and feed_dict to avoid allocating additional memory.
* Correction to the set_value() function
Change to the set_value() function that had a bug when the variable "value" was a float.
The *1. dummy multiplication was added to avoid having to deal with tf.float32_ref dtypes.
* update set_value() of the tensorflow backend
Removal of the *1. dummy multiplication, replacement with a split() to avoid creating a new operation in the graph.
* fix to have session.run() called once in batch_set_value()
Rewriting of the batch_set_value() to avoid multiple calls to session.run() to improve speed.
* Docker image for test and experiment Keras
- Docker image with CUDA support on ubuntu 14.04
- nvidia-docker script to forward the GPU to the container
- Makefile to simplify docker commands for build, run, test, etc.
- Add useful tools like jupyter notebook, ipdb, sklearn for experiments
* update nvidia-docker plugin
* use .theanorc in Dockerfile
* Add tensorflow to the docker image
* update Docker image to cuDNN v5
* test fixes
* move docker to sub directory
* README for docker
* Fix typos
* Add visualization to Dockerfile
* theano backend now supports transposed convolutions
* working deconv
* new example file with deconv vae
* merged with #3273, fixed based on comments, pep8 tested
* test fix
* passes theano test
* start fixing deconv test
* fix deconv layer tests
* fix the right test
sorry, I "fixed" the wrong test last time
* clean up
* replace with_None with fixed_batch_size
* with_None --> fixed_batch_size
* comment edit
* fixed comments online
A number of changes:
1. Switch from Lambda to merge, otherwise code will not run.
2. Rename z_log_std to z_log_var in order for the objective function to make sense
3. Adjust reparameterization trick to reflect use of z_log_var, not z_log_std
4. Remove epsilon_std, since (standard) VAE uses isotropic gaussian prior.
5. Re-balance the weighting of KL and reconstruction terms
6. Use adam instead of rmsprop
7. Increase hidden unit size to improve model
8. Increase batch size to speed up training
* make examples/pretrained_word_embeddings.py more memory efficient
* rename NB_WORDS to nb_words as it is not a global constant
The get_uid method in common.py first checks whether a prefix is in the _UID_PREFIXED dict
and, if it is not, adds an entry for it to the dict.
However, with a defaultdict this check is no longer necessary.
* Added 'max' operation to Merge layer. It allows implementing convolutional maxout with two (or more) convolution layers and one Merge.
* Added 'max' to merge test
* Add multiprocessing for fit generator
* Change maxproc to nb_worker and update documentation
* Simplify multiprocessing test, clarify doc replace maxproc by nb_worker
* Replace maxproc by nb_worker in test
* Update the doc: specify non picklable arguments should not be used with multiprocessing
* Add multiprocessing as an option with the pickle_safe argument
* New function signature for conv2d in backend
* Clean up stuff
* Touch-up TF deconv op
* More cleanup
* Support for TF 3D conv/pool
* Move pooling layers to their own file
* Update TF version in Travis config
* Fix conv3d tests
The documentation says that [1]:
> If [classes are] not provided, the list of classes will be automatically inferred (and the order of the classes, which will map to the label indices, will be alphanumeric).
However, the code was adding classes in the order `os.listdir` returned them. This commit alphanumerically sorts the sub-directories before mapping them to label indices.
[1] http://keras.io/preprocessing/image/
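A rough sketch of the sorted mapping described above. The function name `class_indices_from_directory` is hypothetical, and Python's `sorted` orders strings by code point, which is assumed here to be an acceptable reading of "alphanumeric".

```python
import os


def class_indices_from_directory(directory):
    """Map each sub-directory name to a label index in sorted order."""
    # os.listdir returns entries in arbitrary order, so sort them first.
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return dict(zip(classes, range(len(classes))))
```

Because the order no longer depends on the file system, two runs over the same directory tree should produce the same label indices.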
In on_epoch_end, new keys were added to the history dict by first checking
whether the key was missing and, if so, creating it with an empty list as
its value.
However, this looks the key up in the dict twice. The same behavior
can be achieved in a single step using the dict setdefault method.
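For illustration, the single-step pattern could look like the following; `record_epoch` is a hypothetical helper, not the callback method itself.

```python
def record_epoch(history, logs):
    """Append each value in `logs` to the matching list in `history`."""
    for key, value in logs.items():
        # setdefault returns the existing list, or inserts and returns a new one.
        history.setdefault(key, []).append(value)
    return history
```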
An EarlyStopping callback object has internal state variables to tell it
when it has reached its stopping point. These were initialized in __init__(),
so attempting to re-use the same object resulted in immediate stopping. This
prevents (for example) performing early stopping during cross-validation with
the scikit-learn wrapper.
This patch initializes the variables in on_train_begin(), so they are re-set
for each training fold. Tests included.
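A framework-free sketch of the pattern, assuming a lower-is-better monitored value. `PatienceStopper` and its attribute names are hypothetical and only mirror the idea of moving per-run state out of `__init__`.

```python
class PatienceStopper(object):
    """Hypothetical sketch of an early-stopping callback."""

    def __init__(self, patience=0):
        # Only configuration lives here; no per-run state.
        self.patience = patience

    def on_train_begin(self):
        # Per-run state is reset at the start of every training run.
        self.best = float('inf')
        self.wait = 0
        self.stop_training = False

    def on_epoch_end(self, loss):
        if loss < self.best:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait > self.patience:
                self.stop_training = True
```

Reusing one instance across folds is then safe, since each call to `on_train_begin` clears the stopping flag and the best value seen so far.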
* Resolve #2960
Introduce `K.var` so that the standard deviation computation can
be made numerically stable. Instead of
K.std(x)
the user is able to write
K.sqrt(K.var(x) + self.epsilon)
avoiding a division by zero in the gradient computation of `sqrt`.
* Fix typos
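To see why the epsilon in `K.sqrt(K.var(x) + self.epsilon)` matters, recall that the derivative of sqrt(v) is 1 / (2 * sqrt(v)), which is undefined at v = 0. The plain-Python helper below is a hypothetical illustration of that derivative, not backend code.

```python
import math


def sqrt_grad(v, epsilon=0.0):
    """Derivative of sqrt(v + epsilon) with respect to v."""
    return 1.0 / (2.0 * math.sqrt(v + epsilon))
```

With zero variance and no epsilon the call divides by zero, whereas a small positive epsilon keeps the gradient finite.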
This issue is due to an unexpected loss of dimensionality when
composing the backend tensor operations "reshape" and "squeeze"
when there are dimensions of length 1.
For example, using a Theano backend the following fails with a
complaint about dimension mismatch:
UpSampling1D(2)(MaxPooling1D(2)(Reshape((2,1))(Input(shape=(2,)))))
The issue arises due to the conflict of two behaviors specific
to the Theano backend:
- Reshape uses Theano's reshape function. Theano's reshape
automatically makes dimensions with length 1 "broadcastable"
- MaxPooling1D's implementation class _Pooling1D has a call method
which uses a dummy dimension which it has to remove. The manner
in which this dummy dimension is removed is to call "squeeze(x, axis)"
from the backend. The squeeze implementation tells Theano to make
the dummy dimension broadcastable, and then calls Theano's "squeeze",
which removes ALL the broadcastable dimensions; not just the dummy
dimension, but also the length 1 dimension flagged as broadcastable
by reshape. This causes the problem observed above. This behavior
is distinct from the behavior of the TensorFlow backend, which
removes only the requested dimension.
This PR addresses this issue in two ways:
First, it introduces a test which checks the composition of "reshape"
and "squeeze" to make sure we get the same result using both Theano
and TensorFlow backends.
Second, it changes the implementation of squeeze(x,axis) so that the
Theano backend should behave similarly to the TensorFlow backend. With
this change the introduced test passes and the above example works.
* Update regularizers.py
I included a new regularizer for deep learning practitioners, named Eigenvalue Decay, that aims at maximum-margin learning. This version approximates the dominant eigenvalue by a soft function given by the power method. For details, see:
Oswaldo Ludwig. "Deep learning with Eigenvalue Decay regularizer." ArXiv eprint arXiv:1604.06985 [cs.LG], (2016). https://www.researchgate.net/publication/301648136_Deep_Learning_with_Eigenvalue_Decay_Regularizer
The syntax for Eigenvalue Decay is similar to the other Keras weight regularizers, e.g.:
model.add(Dense(100, W_regularizer=EigenvalueRegularizer(0.0005)))
* Example with Eigenvalue Decay regularization.
An example from Keras including regularization with Eigenvalue Decay. After training, you have to save the trained weights, create/compile a similar model without Eigenvalue Decay and save this model. Then you can use your trained weights with this model; see lines 123-153 of CIFAR10_with_Eigenvalue_Decay.py (this is still an open issue).
This example yields a gain in the accuracy by the use of Eigenvalue Decay of 2.71% (averaged over 10 runs).
* Update CIFAR10_with_Eigenvalue_Decay.py
* Update CIFAR10_with_Eigenvalue_Decay.py
* Update CIFAR10_with_Eigenvalue_Decay.py
* Update regularizers.py
* Update regularizers.py
* Delete CIFAR10_with_Eigenvalue_Decay.py
* Update test_regularizers.py
* Update regularizers.py
* Update test_regularizers.py
* Update regularizers.py
* Update regularizers.py
I needed another reading in Keras backend...
* Issue to get shape of a tensor.
Issue to get shape of a tensor in the class EigenvalueRegularizer: the type returned for shape is different for Theano backend (Theano tensor type) and TF backend (TF TensorShape).
* Update regularizers.py
* Update regularizers.py
* Update regularizers.py
* Update regularizers.py
* Update regularizers.py
* Update regularizers.py
* Update regularizers.py
* limit progress bar update rate
Limit progress bar update rate in verbose=1 mode. This patch reduces
terminal I/O throughput while keeping a reasonably high visual
update rate (defaults to 100 refreshes per second). It helps greatly
when working with large but simple data sets with small batches, which
lead to millions of relatively useless screen updates per second. It
also helps to keep network traffic at reasonable rates, which is
exceptionally useful in laggy network conditions when using
keras over telnet/ssh, and improves web browser responsiveness when
using keras within Jupyter Notebook.
* add docstrings for 'interval' and 'force' arguments
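A minimal sketch of time-based throttling with `interval` and `force` arguments. The class `ThrottledBar`, its `clock` parameter, and the `redraws` counter are assumptions for illustration; the real progress bar writes to the terminal instead of counting.

```python
import time


class ThrottledBar(object):
    """Hypothetical progress reporter that limits how often it redraws."""

    def __init__(self, interval=0.01, clock=time.time):
        self.interval = interval  # minimum seconds between redraws
        self.clock = clock        # injectable so the timing can be tested
        self.last_update = None
        self.redraws = 0

    def update(self, current, force=False):
        now = self.clock()
        too_soon = (self.last_update is not None
                    and now - self.last_update < self.interval)
        if too_soon and not force:
            return False  # skip this redraw
        self.last_update = now
        self.redraws += 1  # a real bar would write to the terminal here
        return True
```

An interval of 0.01 seconds corresponds to the 100 refreshes per second mentioned above, and `force=True` lets the final update through regardless of timing.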
* bug fixed, numpy randint only outputs positive numbers ranging from 1 to 10e6
* Update theano_backend.py
changed style and numpy randint range
* Update theano_backend.py
removed extra spaces
From the documentation it is not entirely clear that if mask_zero is set
to True, the input_dim argument should be equal to the size of the
vocabulary + 2, as index 0 cannot be used anymore.
(This behaviour seems a bit strange, as it has as a consequence that the
first column of the weights of the embeddings will never be used or
updated. The resulting network thus has a redundant set of parameters).
* add a simple named entity recognition example
add a simple named entity recognition example
* add fast version of GRU
add fast version of GRU
* remove useless stuff
* Faster LSTM
* PEP8
* RNN dropout fix
* PEP
* PEP
* Less code duplication
* LSTM benchmark example
* PEP
* Test implementation modes
* Go through Keras backend
* Much better image data augmentor
* removed unnecessary functions
* shift origin to centre of the image for homographies
* init commit
* change to zoom_range
* Added scikit-image to extras_require in setup.py
* add zoom_range test, exception for invalid zoom_range
* add scikit-image to dependency
* fix fit and retain old functions for unit test
* use ndi instead of skimage in random_transform
* removed buggy code in random_rotations, shears etc and replaced it with todos.
* remove sci-image, implement ndimage based methods, refactor random_transform
* random_zoom, array_to_img consider dim_ordering
* add random_channel_shift, support fill_mode and cval
* image doc, update test_image, PEP8
* fix channel shift clip
* fix doc, refine code
* detailed explanation of zoom range
* check coding style
* adding a disable_b boolean to Dense
* changing 'disable_b' to 'bias'
Changing the name of the boolean & flipping its behavior so that the default is True and when set to False the bias is not used.
* integrating bias flag fully
changed the bias flag to affect the creation of the self.b variable as well as the output calculation
* fixing a blank line to appease pep8
* Max Over Time in imdb_cnn.py
Following this issue https://github.com/fchollet/keras/issues/2296 I propose this PR.
The major optimisations, apart from Max Over Time, are:
- Dropout in the Embedding layer.
- Longer input sequences (400 instead of 100), made possible by the speedup from Max Over Time.
- Adam optimizer.
Overall it takes 90 to 100 sec per epoch on my laptop CPU, and in two epochs it reaches 0.885 accuracy, which is a 5 point improvement over the previous implementation. Moreover, it requires less memory (300k parameters vs 3M+) since the number of parameters no longer depends on the length of the input sequence.
* Update imdb_cnn.py
* added learning phase to callbacks (#2297)
* cleaned imports
* replaced tabs by spaces
* added case where uses_learning_phase is False
* fixed pep8 blank line bug
Previously, strides were required to be smaller than the convolution
kernel. Usually, this is what a user wants, but there are edge
cases where one might want strides larger than the kernel (for instance,
projection shortcuts in Residual Networks).
LeakyReLU returns a tensor with float64 dtype.
It is stupid, but this line actually produces a float64 array:
```
0.5*np.array(0.2, dtype=np.float32)
```
The theano nnet.relu function does something similar with the
LeakyReLU alpha parameter, which leads to a float64 tensor.
The solution is to not cast the alpha to float32.
Furthermore I tighten the `test_utils.layer_test`. It is now
required that the layer's output dtype is equal to the input dtype.
* add in predict_generator and tests
* fix PEP8 details
* Pre-allocate predictions
* make predictions return list if necessary
* reset batch_size for other tests, make less wonky generator
* Fix merge_dot tests
* Make batch_dot unique
batch_dot is not tensordot! It only accepts one reduce dimension at a
time. Other reduce dimensions should be done afterwards with K.sum.
This means that K.batch_dot will have the same behavior in both
tensorflow and theano. This also means that we have fewer parentheses and
fewer nested lists.
New usage:
merge_mode = 'dot', dot_axes=[axis1, axis2]
Before:
merge_mode = 'dot', dot_axes=[[axis1], [axis2]]
* Backport sign by @the-moliver
* Fix docstrings
* Fix backend batch_dot tests
When saving the weights a TypeError is raised by h5py.
See this issue https://github.com/h5py/h5py/issues/289 for details.
As it is recommended in the issue, the strings are now encoded as utf8.
Move caches to properties so that containers can override the
implementation to ensure that the cache gets propagated correctly
to child layers when it is changed.
Reset instead of disabling layer and shape cache in __call__
Previously, __call__ did not get the speed benefits from caching
because it disabled it in order to feed the layer new input. This
meant that __call__ could be very slow on complicated structures.
Now, instead of disabling it, we temporarily empty it, then restore
the original when we're done.
This refactor allows the inherited method to work properly for
Sequential and Graph (with single input) containers in addition
to normal layers, so there's no need to override the method.
Previously, __call__ did not work correctly for Graph containers.
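The "temporarily empty, then restore" approach can be sketched with a context manager. The helper `emptied_cache` and the attribute name `cache` are hypothetical and do not reflect the actual layer internals.

```python
from contextlib import contextmanager


@contextmanager
def emptied_cache(obj, attr='cache'):
    """Temporarily replace `obj.<attr>` with an empty dict, then restore it."""
    original = getattr(obj, attr)
    setattr(obj, attr, {})
    try:
        yield
    finally:
        # Restore the original cache even if the body raises.
        setattr(obj, attr, original)
```

Inside the `with` block, new inputs populate a fresh cache, so caching still speeds up the call, and the original entries come back once the block exits.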
Implement Graph.__call__ for multiple inputs
Add option to (re-)initialize weights in set_previous
This allows us to use set_previous in places where we previously
manually adjusted the previous layer, which means that layers
that have non-standard set_previous implementations (like Graph)
work properly when they are, for example, the first layer in a
Sequential model.
This commit also adds a clear_previous method.
Add input_shape property to Graph container
---------------------------------------
Squashed from the following commits
add Convolution3D and MaxPooling3D layers
fix 5D tensor in theano, add examples
update conv3d, pool3d, add resize_volumes and spatial_3d_padding
update Convolution3D, MaxPooling3D and AveragePooling3D, add UpSampling3D and ZeroPadding3D
add test functions for Convolution3D, MaxPooling3D, AveragePooling3D, ZeroPadding3D and UpSampling3D
small fix by changing pad_z to pad_t
update comment
skip some tests for tensorflow, @pytest.mark.skipif(K._BACKEND != theano, reason="Requires Theano backend")
use autopep8 to fix the code to match pep8 coding style
small fix (caused by autopep8)
fixed the document string for all newly added layers
remove the example and the dataset for 3d
add error message for tensorflow backend
support stride in pool3d
Rename "params" to "trainable_weights"
change notations and docstrings for 3D layers
fix pep8 error
change variable name in test code
small fix for pep8
add error message and docstring for strides in conv3d
fix test error caused by wrong strides in conv3d
support strides in conv3d by slicing the output
add if statement for stride (1,1,1)
fix get_config according to mdering, and other small fix
fix model_from_json issue by passing a 3d border_mode
fix according to jruales' review
change docstring in Convolution3D
delete docstring about TensorFlow
change docstring in Convolution3D and theano_backend
---------------------------------------
Author: Wei OUYANG <oeway007@gmail.com>
Have you noticed how default GRUs usually work worse than LSTMs? It seems that "tanh" is a more sensible activation choice. Also for GRUs, tanh seems to be the default:
see http://arxiv.org/pdf/1412.3555v1.pdf Section 3.2
Squashed commit of the following:
commit 39a59192e96fe4098f1d663384b79b10e3bcc979
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 02:15:29 2016 +0000
Squashed commit of the following:
commit 88faa440d02df8ff356011258e3e89ce44a13e1d
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 02:13:24 2016 +0000
Clean up
commit f55245199a11a202857efb1413ffa3b97c1dcfaf
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 01:57:50 2016 +0000
Ported dropout for LSTM, GRU, SimpleRNN, and Embedding layer to latest Keras (turned off by default).
Squashed commit of the following:
commit 574c4549da69f8c0831f02dce1ad05331d8b38ed
Merge: 19ef51c bdb149d
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 01:23:54 2016 +0000
Merge branch 'BRNN_latest' of https://github.com/yaringal/keras into BRNN_latest
commit 19ef51c633544f847cddebeb7a3add0936051f19
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 01:12:23 2016 +0000
implemented dropout in GRU and SimpleRNN
commit bdb149d1bbff64cc6b4d694090b905153d28e33a
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 01:12:23 2016 +0000
implemented dropout in GRU and SimpleLSTM
commit 72ade3f493dd725fb414cbc65a847259360be138
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 00:52:01 2016 +0000
clean up
commit 9f3d213c91906b3be5c876d539819a8577bc438c
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 00:42:58 2016 +0000
Model test callback
commit d4ffffc26cf24c8b7927209caad4379aac3db9c5
Author: Yarin <yaringal@gmail.com>
Date: Fri Feb 19 23:47:40 2016 +0000
removed dependence on theano
commit 89a4e6576278564ffb882032d5a7ec5758fe00e4
Author: Yarin <yg279@cam.ac.uk>
Date: Fri Feb 19 23:25:13 2016 +0000
working BayesianLSTM and embedding dropout for theano backend
commit 1ab4e19dfe9d49defd5575a5c2b0b880b5c46eb5
Author: Yarin <yg279@cam.ac.uk>
Date: Fri Feb 19 16:41:48 2016 +0000
working BayesianLSTM with dependence on theano
commit 672c27401ee345a69592771cfc9ab017642b6af3
Merge: 9360ea6 b8a9f84
Author: Yarin <yaringal@gmail.com>
Date: Fri Feb 19 00:30:44 2016 +0000
Merge https://github.com/fchollet/keras into BRNN_latest
commit 9360ea6c25eab90e83aebb32eb187c65ed63c01d
Author: Yarin <yaringal@gmail.com>
Date: Thu Feb 18 23:28:35 2016 +0000
work in progress on BayesianLSTM
commit b8a9f84fad
Merge: a154495 0f3f563
Author: François Chollet <francois.chollet@gmail.com>
Date: Thu Feb 18 11:24:42 2016 -0800
Merge pull request #1756 from gw0/fix-for-refactor-callbacks
Fix missing callback refactoring.
commit 0f3f56327b
Author: gw0 [http://gw.tnode.com/] <gw.2016@tnode.com>
Date: Thu Feb 18 17:01:45 2016 +0100
Fix missing callback refactoring.
`Sequential.set_weights` would fail for nested `Sequential`
containers. Borrow the implementation of `Graph.set_weights` to
get it working. Also add tests for triply-nested `Sequential`
models.
If sample_weights is to be used as a mask as well as for re-weighting
then it's important that, at least when used as a mask, the output be
rescaled. Otherwise the order of magnitude of your objective changes
purely based on the number of masked entries in your training data.
- Categorical_crossentropy was taking an extra mean, the function
already removes the final dimension of your input, so you don't need
to take a mean as you would with, say, L2 loss.
- The RNN backend call can now take a mask with or without the same
number of dimensions as the input data
- Fix Masking layer for Tensorflow
- Add some tests to confirm objective function shapes
Currently, polynomial interpolation of 3rd order is done when shifting. However, that is not needed because the images are shifted by integer values (crop_left_pixels, crop_top_pixels), and there is nothing to interpolate.
Setting ```order=0``` will speed up random shifts significantly.
The change to the dimshuffle/transpose call to support >3d inputs was
correct for the inputs array but did not apply to the mask array. This
fixes that.
Currently, only 3D input is supported by the rnn function.
Update theano_backend.py
Fix tf too
Avoid slicing
assert ndim>=3
Update theano_backend.py
typo
Update theano_backend.py
This commit fixes the DisconnectedInputError described in issue
the `get_output` method. Before this commit the `updates` member
could use a different input than the `get_output` method if the
input was changed.
The `params`, `regularizers`, `constraints` and `updates` member of the
AutoEncoder were set in the `__init__` method.
When set_previous was called, the mentioned members were not updated.
This behavior resulted in a DisconnectedInputError.
Now the mentioned members are set in the `build` method and the
`set_previous` method calls the `build` method every time the
input changes. This commit fixes issue #1275.
test image preprocessing
* add PIL to enable testing of preprocessing code
* try a different way to install PIL on travis
* include PIL only in python 2.7
* test image preprocessing
* fall back to Pillow for python 3 image processing
In order to propagate state through _predictions_, I created a new
property of the model, `state_updates` that returns any model step
updates that are needed when doing a stateful prediction. These updates
are identified as *any updates defined by a stateful layer*.
thresholded activations
parametric softplus
some bugfixes
fix error caused by calling layer.build on PReLU
seed the rng in every test individually to make them deterministic
Summary of changes:
- py.test is configured to display test profiling information that shows 10 slowest tests. This would allow additional speed ups if anyone has ideas on some particular test. The slowest test is usually cifar dataset test and tensorflow convolutions. It seems that there are some other IT tests that could be sped up.
- py.test is configured to run with pytest-xdist with 2 processes in parallel because travis does provide multicore support (1.5 cores) and because the slowest cifar test spends time on download which can run in parallel with other tests.
- travis is configured to split backend tests into test matrix to make parallel theano vs tensorflow testing as opposed to rerun all the tests twice for python 2.7.
- pickle filenames in tests are renamed to avoid clashes during multiprocessing
As the graph container was not using each individual layer's get/set
weights, but rather the super class layer.get_weights, which works on
self.params(), it was missing some weights in the process, i.e., the
BatchNormalizationLayer has a custom get_weights which allows saving the
running mean/std. However, these running computations are not added to
BatchNormalizationLayer.params(), resulting in losing these weights
after serializing a graph model utilizing a BatchNormalizationLayer.
Fixed to use each node's get/set weights.
When mode='ave', and the dtype of the input is float32, dividing the sum
by shape[1], which is of dtype int64, results in an output of dtype
float64, which is wrong.
fixed to use theano.tensor.mean instead.
When loading regularizers/constraints from config, and the object isn't
found, don't consume the 'name' key.
This enables expansions to keras to be saved/loaded with dictionaries as
some of their parameters.
Signed-off-by: Amit Beka <amit.beka@gmail.com>
This should fix the problem `Exception: Invalid layer: LRN2D` while loading a model that includes LRN2D.
```py
model = Sequential()
model.add(Convolution2D(30, 3, 3, input_shape=(1, 28, 28)))
model.add(LRN2D())
model_def = model.to_yaml()
# this line raises Exception: Invalid layer: LRN2D
model_from_yaml(model_def)
```
The code above could reproduce the problem.
“TypeError: Cannot cast ufunc subtract output from dtype('float64') to
dtype('uint8') with casting rule 'same_kind'” in
keras/preprocessing/image.py, line 239, when using data augmentation.
A bit surprised that keras was using globals() to access layers (doesn't work
across modules.) Hacky solution was to pass a dict mapping name -> class.
I called this dict `custom_layers`.
Is there a better way of doing this that I'm not seeing?
This allows you to do nice things like save JSON models so that they're human
readable & editable. For example:
>>> with open('output.json', 'w') as f:
...     f.write(model.to_json(indent=4, sort_keys=True))
...
This makes merge_mode='join' compliant with the keras API. Also, the OrderedDict
allows the user to simply call .values() and use the result as a list if they know
in which order the inputs were merged.
By allowing sum_values[k] to be other things than lists, it makes it easier for children classes to print "any value" (in my case, a timedelta object).
enable string formatted filenames (e.g. weights.{epoch:02d}.hdf5), so
every epoch will be saved to a different file without overwriting.
Signed-off-by: Amit Beka <amit.beka@gmail.com>
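As a rough illustration of how the string-formatted checkpoint filenames described above expand, the snippet below formats the example pattern with Python's built-in `str.format`. The helper name `checkpoint_path` is hypothetical and is not part of Keras.

```python
def checkpoint_path(template, epoch):
    """Expand a filename template for a given epoch number."""
    # `template` is expected to contain an `{epoch}` placeholder.
    return template.format(epoch=epoch)


# One distinct file name per epoch, so earlier files are not overwritten.
paths = [checkpoint_path("weights.{epoch:02d}.hdf5", e) for e in range(3)]
print(paths)
```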
We used nonzero() on the weights in order to ensure that if there
happened to be a NaN or an Inf in the output that was going to be masked
out by the weights anyway, it wouldn't propagate (because 0*inf = NaN).
However, this was causing interaction issues if you also used a mask,
because the mask wasn't using nonzero() properly.
This fixes that, and also fixes what I believe was an issue where I was
calling mean() instead of dividing by the sum of the sample weights.
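The propagation problem described above can be reproduced in plain Python. The sketch below is not the Keras implementation; the function names are hypothetical and plain lists stand in for tensors. It contrasts masking by multiplication with selecting only the entries whose weight is nonzero, then divides by the sum of the sample weights.

```python
import math


def weighted_sum_by_multiply(values, weights):
    """Mask by multiplying: a zero weight does NOT remove an inf or NaN."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_sum_by_selection(values, weights):
    """Skip zero-weight entries entirely, so their values never enter the sum."""
    return sum(v * w for v, w in zip(values, weights) if w != 0)


values = [1.0, float("inf"), 2.0]   # the second entry is meant to be masked out
weights = [1.0, 0.0, 0.5]

naive = weighted_sum_by_multiply(values, weights)    # inf * 0.0 yields NaN
safe = weighted_sum_by_selection(values, weights)    # 1.0 * 1.0 + 2.0 * 0.5

# Weighted mean: divide by the sum of the weights, not by the number of entries.
weighted_mean = safe / sum(weights)
print(math.isnan(naive), safe, weighted_mean)
```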
With lr and momentum being scalars we can change their values without
needing to recompile the model. This PR also includes a Callback called
LrSetter that gets a dict of epoch -> lr pairs and sets the value of
the latter at the beginning of the associated epoch.
`refs` is a class attribute, not an instance attribute. If you make `refs` an instance attribute, this will cause `HDF5Matrix` to open the same HDF5 file more than once (which should never happen).
Calling sequences_to_matrix results in an IndexError when nb_words = None. This is caused by a 1-indexed word_index, since sequences_to_matrix expects 0-indexing. Converts word_index to 0-based indexing.
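The off-by-one can be seen in a simplified stand-in. The function below is a hypothetical pure-Python sketch, not the Keras `sequences_to_matrix` code: it builds a binary matrix with one column per word, so a 1-indexed vocabulary produces an index equal to the number of columns.

```python
def sequences_to_binary_matrix(sequences, nb_words):
    """Return one row per sequence with a 1 in the column of every word index."""
    matrix = [[0] * nb_words for _ in sequences]
    for row, seq in zip(matrix, sequences):
        for index in seq:
            row[index] = 1  # IndexError if index >= nb_words
    return matrix


word_index_1_based = {"the": 1, "cat": 2, "sat": 3}
word_index_0_based = {w: i - 1 for w, i in word_index_1_based.items()}
nb_words = len(word_index_1_based)  # what nb_words=None falls back to in this sketch

text = ["the", "cat", "sat"]

try:
    sequences_to_binary_matrix([[word_index_1_based[w] for w in text]], nb_words)
    raised = False
except IndexError:
    raised = True  # index 3 does not fit in a row of length 3

matrix = sequences_to_binary_matrix([[word_index_0_based[w] for w in text]], nb_words)
print(raised, matrix)
```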
As far as I can tell there is no reason not to support class_weight with
time distributed data, rewriting the standardize_weights function with
that in mind.
urlretrieve will blindly swallow any 4xx and 5xx responses
and then save the html error response in the local file. This
is probably exactly what we don't want, because not only will
the program crash if there is a network hiccup when the error
file cannot be opened, but it will continue to do so when rerun
until the corrupt cached file is found and manually removed.
Luckily, urlretrieve is just a thin wrapper around
FancyURLopener, so we can make our own thin wrapper
that throws an exception instead of caching the
wrong file.
Tested to be working as before when running cached and
uncached datasets, and also verified to fail loudly
when asked to fetch http://httpstat.us/500
Updated adam solver to v8 of the paper. The kappa (lambda) parameter has no
practical use and has been removed.
The calculations for beta_1_t and beta_2_t were also wrong and have been fixed.
Modify to use proper multinomial sampling, with temperature to control diversity. This seems to generate qualitatively better results and is technically more correct.
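A minimal sketch of temperature-controlled sampling, assuming the model outputs a list of strictly positive probabilities. The function names are hypothetical and this is not the code from the example script; it only shows the usual technique of dividing log-probabilities by the temperature, renormalizing, and drawing one index.

```python
import math
import random


def apply_temperature(probs, temperature=1.0):
    """Rescale a probability distribution; low temperature sharpens it."""
    logits = [math.log(p) / temperature for p in probs]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, temperature=1.0, rng=random):
    """Draw one index from the temperature-rescaled distribution."""
    rescaled = apply_temperature(probs, temperature)
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(rescaled):
        cumulative += p
        if r < cumulative:
            return index
    return len(rescaled) - 1  # guard against floating-point round-off


probs = [0.1, 0.2, 0.7]
sharp = apply_temperature(probs, 0.5)  # more conservative, less diverse
flat = apply_temperature(probs, 2.0)   # more diverse
print(sharp, flat, sample(probs, 0.5, random.Random(0)))
```

A temperature of 1.0 leaves the distribution unchanged; values below 1.0 favor the most likely item and values above 1.0 spread probability toward the less likely ones.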
When applying a Convolution2D with border_mode='full', images grow in
size; this layer allows shrinking them back to their original size (or any
other size).
Standard deviation values were being passed as scale values for uniform distributions.
But the relationship is: scale = standard deviation * sqrt(3).
So, the s values in glorot_uniform, lecun_uniform, and he_uniform should have been multiplied by sqrt(3) before being passed into uniform() function. Now it is fixed.
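The relationship follows from the variance of a uniform distribution on `[-a, a]`, which is `a**2 / 3`, so `a = std * sqrt(3)`. The snippet below is a small numerical check using only the standard library; `uniform_scale_for_std` is a hypothetical helper, not a Keras function.

```python
import math
import random


def uniform_scale_for_std(std):
    """Half-width `a` of U(-a, a) whose standard deviation equals `std`."""
    return std * math.sqrt(3.0)


std = 0.05
scale = uniform_scale_for_std(std)

# Empirical check: draw samples from U(-scale, scale) and measure their std.
rng = random.Random(0)
samples = [rng.uniform(-scale, scale) for _ in range(100000)]
mean = sum(samples) / len(samples)
empirical_std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(scale, empirical_std)
```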
The scan in get_output TimeDistributedDense leaked memory like crazy. Changing it to match get_output in Dense seems to have fixed the problem and behaves identically.
This changes objective functions to no longer return scalars, but
rather tensors of dimension one less than y, representing the loss for
each datapoint in y, on which it is expected you will calculate a weighted mean.
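As a sketch of what this looks like, assuming a mean squared error objective and plain nested lists in place of tensors (the function names are hypothetical): the objective returns one loss per datapoint, and the caller reduces it with a weighted mean.

```python
def squared_error_per_sample(y_true, y_pred):
    """Return one loss value per datapoint instead of a single scalar."""
    return [
        sum((t - p) ** 2 for t, p in zip(ts, ps)) / len(ts)
        for ts, ps in zip(y_true, y_pred)
    ]


def weighted_mean(losses, weights):
    """Reduce per-sample losses to a scalar using sample weights."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


y_true = [[1.0, 0.0], [0.0, 1.0]]
y_pred = [[1.0, 0.0], [1.0, 0.0]]

losses = squared_error_per_sample(y_true, y_pred)  # shape (2,), one less dim than y
total = weighted_mean(losses, [1.0, 3.0])
print(losses, total)
```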
There is no reason to have two different functions for this! The softmax
function can just be configured to always perform the softmax across the
trailing dimension (i.e. nb_dimensions)
Both the training features and labels can be represented as numpy
booleans instead of float32 / float64. This enables standard low RAM
machines to scale up to large datasets. Especially important if you
either have many characters (ASCII), long sequences, or a large dataset.
I realized that it makes more sense to have _step *apply* a mask, but
then to set the masked entries to mask_value outside of step. This
should be more efficient, but more importantly should make
implementations easier to understand.
Another nice effect: an alternative masking scheme can be introduced
without changing _step at all.
This led me to realize that I also was not properly passing masks out of
recurrent layers, nor were my tests properly checking for this. I've
resolved this here.
Found a bug? Have a new feature to suggest? Want to contribute changes to the codebase? Make sure to read this first.
## Bug reporting
Your code doesn't work, and you have determined that the issue lies with Keras? Follow these steps to report a bug.
1. Your bug may already be fixed. Make sure to update to the current Keras master branch, as well as the latest Theano/TensorFlow master branch.
To easily update Theano: `pip install git+git://github.com/Theano/Theano.git --upgrade`
2. Search for similar issues. Make sure to delete `is:open` on the issue search to find solved tickets as well. It's possible somebody has encountered this bug already. Also remember to check out Keras' [FAQ](http://keras.io/faq/). Still having a problem? Open an issue on Github to let us know.
3. Make sure you provide us with useful information about your configuration: what OS are you using? What Keras backend are you using? Are you running on GPU? If so, what is your version of Cuda, of cuDNN? What is your GPU?
4. Provide us with a script to reproduce the issue. This script should be runnable as-is and should not require external data download (use randomly generated data if you need to run a model on some test data). We recommend that you use Github Gists to post your code. Any issue that cannot be reproduced is likely to be closed.
5. If possible, take a stab at fixing the bug yourself --if you can!
The more information you provide, the easier it is for us to validate that there is a bug and the faster we'll be able to take action. If you want your issue to be resolved quickly, following the steps above is crucial.
---
## Requesting a Feature
You can also use Github issues to request features you would like to see in Keras, or changes in the Keras API.
1. Provide a clear and detailed explanation of the feature you want and why it's important to add. Keep in mind that we want features that will be useful to the majority of our users and not just a small subset. If you're just targeting a minority of users, consider writing an add-on library for Keras. It is crucial for Keras to avoid bloating the API and codebase.
2. Provide code snippets demonstrating the API you have in mind and illustrating the use cases of your feature. Of course, you don't need to write any real code at this point!
3. After discussing the feature you may choose to attempt a Pull Request. If you're at all able, start writing some code. We always have more work to do than time to do it. If you can write some code then that will speed the process along.
---
## Requests for Contributions
[This is the board](https://github.com/fchollet/keras/projects/1) where we list current outstanding issues and features to be added. If you want to start contributing to Keras, this is the place to start.
---
## Pull Requests
**Where should I submit my pull request?**
1. **Keras improvements and bugfixes** go to the [Keras `master` branch](https://github.com/fchollet/keras/tree/master).
2. **Experimental new features** such as layers and datasets go to [keras-contrib](https://github.com/farizrahman4u/keras-contrib), unless it is a new feature listed in [Requests for Contributions](https://github.com/fchollet/keras/projects/1), in which case it belongs in core Keras. If you think your feature belongs in core Keras, you can submit a design doc to explain your feature and argue for it (see explanations below).
Here's a quick guide to submitting your improvements:
1. If your PR introduces a change in functionality, make sure you start by writing a design doc and sending it to the Keras mailing list to discuss whether the change should be made, and how to handle it. This will save you from having your PR closed down the road! Of course, if your PR is a simple bug fix, you don't need to do that. The process for writing and submitting design docs is as follows:
- Start from [this Google Doc template](https://docs.google.com/document/d/1ZXNfce77LDW9tFAj6U5ctaJmI5mT7CQXOFMEAZo-mAA/edit#), and copy it to a new Google Doc.
- Fill in the content. Note that you will need to insert code examples. To insert code, use a Google Doc extension such as [CodePretty](https://chrome.google.com/webstore/detail/code-pretty/igjbncgfgnfpbnifnnlcmjfbnidkndnh?hl=en) (there are several such extensions available).
- Set sharing settings to "everyone with the link is allowed to comment"
- Send the document to `keras-users@googlegroups.com` with a subject that starts with `[API DESIGN REVIEW]` (all caps) so that we notice it.
- Wait for comments, and answer them as they come. Edit the proposal as necessary.
- The proposal will finally be approved or rejected. Once approved, you can send out Pull Requests or ask others to write Pull Requests.
2. Write the code (or get others to write it). This is the hard part!
3. Make sure any new function or class you introduce has proper docstrings. Make sure any code you touch still has up-to-date docstrings and documentation. **Docstring style should be respected.** In particular, they should be formatted in Markdown, and there should be sections for `Arguments`, `Returns`, `Raises` (if applicable). Look at other docstrings in the codebase for examples.
4. Write tests. Your code should have full unit test coverage. If you want to see your PR merged promptly, this is crucial.
5. Run our test suite locally. It's easy: from the Keras folder, simply run: `py.test tests/`.
- You will need to install the test requirements as well: `pip install -e .[tests]`.
6. Make sure all tests are passing:
- with the Theano backend, on Python 2.7 and Python 3.5. Make sure you have the development version of Theano.
- with the TensorFlow backend, on Python 2.7 and Python 3.5. Make sure you have the development version of TensorFlow.
7. We use PEP8 syntax conventions, but we aren't dogmatic when it comes to line length. Make sure your lines stay reasonably sized, though. To make your life easier, we recommend running a PEP8 linter:
- Run a standalone PEP8 check: `py.test --pep8 -m pep8`
- You can automatically fix some PEP8 errors by running: `autopep8 -i --select <errors> <FILENAME>`, for example: `autopep8 -i --select E128 tests/keras/backend/test_backends.py`
8. When committing, use appropriate, descriptive commit messages.
9. Update the documentation. If introducing new functionality, make sure you include code snippets demonstrating the usage of your new feature.
10. Submit your PR. If your changes have been approved in a previous discussion, and if you have complete (and passing) unit tests as well as proper docstrings/documentation, your PR is likely to be merged promptly. Otherwise, well...
---
## Adding new examples
Even if you don't contribute to the Keras source code, if you have an application of Keras that is concise and powerful, please consider adding it to our collection of examples. [Existing examples](https://github.com/fchollet/keras/tree/master/examples) show idiomatic Keras code: make sure to keep your own script in the same spirit.
Please make sure that the boxes below are checked before you submit your issue. If your issue is an implementation question, please ask your question on [StackOverflow](http://stackoverflow.com/questions/tagged/keras) or [join the Keras Slack channel](https://keras-slack-autojoin.herokuapp.com/) and ask there instead of filing a GitHub issue.
Thank you!
- [ ] Check that you are up-to-date with the master branch of Keras. You can update with:
- [ ] If running on TensorFlow, check that you are up-to-date with the latest version. The installation instructions can be found [here](https://www.tensorflow.org/get_started/os_setup).
- [ ] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python / Theano so as not to have to deal with the dearth of ecosystem in Lua. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Keras is a high-level neural networks API, written in Python and capable of running on top of [TensorFlow](https://github.com/tensorflow/tensorflow), [CNTK](https://github.com/Microsoft/cntk), or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. *Being able to go from idea to result with the least possible delay is key to doing good research.*
Use Keras if you need a deep learning library that:
- allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
- supports both convolutional networks (for vision) and recurrent networks (for sequence data). As well as combinations of the two.
- runs seamlessly on the CPU and the GPU.
- Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility).
- Supports both convolutional networks and recurrent networks, as well as combinations of the two.
- Runs seamlessly on CPU and GPU.
Read the documentation at [Keras.io](http://keras.io).
Keras is compatible with __Python 2.7-3.4__.
Keras is compatible with: __Python 2.7-3.5__.
------------------
## Guiding principles
- __Modularity.__ A model is understood as a sequence of standalone, fully-configurable modules that can be plugged together with as few restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions and dropout are all standalone modules that you can combine to create new models.
- __User friendliness.__ Keras is an API designed for human beings, not machines. It puts user experience front and center. Keras follows best practices for reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear and actionable feedback upon user error.
- __Minimalism.__ Each module should be kept short and simple (<100 lines of code). Every piece of code should be transparent upon first reading. No black magic: it hurts iteration speed and ability to innovate.
- __Modularity.__ A model is understood as a sequence or a graph of standalone, fully-configurable modules that can be plugged together with as few restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions, and regularization schemes are all standalone modules that you can combine to create new models.
- __Easy extensibility.__ New features (a new module, per the above definition, or a new way to combine modules together) are dead simple to add (as new classes/functions), and existing modules provide ample examples.
- __Easy extensibility.__ New modules are simple to add (as new classes and functions), and existing modules provide ample examples. Being able to easily create new modules allows for total expressiveness, making Keras suitable for advanced research.
- __Work with Python__. No separate models configuration files in a declarative format (like in Caffe or PyLearn2). Models are described in Python code, which is compact, easier to debug, benefits from syntax highlighting, and most of all, allows for ease of extensibility. See for yourself with the examples below.
- __Work with Python__. No separate models configuration files in a declarative format. Models are described in Python code, which is compact, easier to debug, and allows for ease of extensibility.
## Examples
### Multilayer Perceptron (MLP):
------------------
## Getting started: 30 seconds to Keras
The core data structure of Keras is a __model__, a way to organize layers. The simplest type of model is the [`Sequential`](http://keras.io/getting-started/sequential-model-guide) model, a linear stack of layers. For more complex architectures, you should use the [Keras functional API](http://keras.io/getting-started/functional-api-guide), which allows you to build arbitrary graphs of layers.
If you need to, you can further configure your optimizer. A core principle of Keras is to make things reasonably simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code).
Building a question answering system, an image classification model, a Neural Turing Machine, or any other model is just as fast. The ideas behind deep learning are simple, so why should their implementation be painful?
For a more in-depth tutorial about Keras, you can check out:
- [Getting started with the Sequential model](http://keras.io/getting-started/sequential-model-guide)
- [Getting started with the functional API](http://keras.io/getting-started/functional-api-guide)
In the [examples folder](https://github.com/fchollet/keras/tree/master/examples) of the repository, you will find more advanced models: question-answering with memory networks, text generation with stacked LSTMs, etc.
## Current capabilities
------------------
For complete coverage of the API, check out [the Keras documentation](http://keras.io).
A few highlights: convnets, LSTM, GRU, word2vec-style embeddings, PReLU, batch normalization...
## Installation
Keras uses the following dependencies:
- numpy, scipy
- Theano
- See installation instructions: http://deeplearning.net/software/theano/install.html#install
- yaml
- HDF5 and h5py (optional, required if you use model saving/loading functions)
- Optional but recommended if you use CNNs: cuDNN.
Once you have the dependencies installed, cd to the Keras folder and run the install command:
To install Keras, `cd` to the Keras folder and run the install command:
```sh
sudo python setup.py install
```
You can also install Keras from PyPI:
```sh
sudo pip install keras
```
------------------
## Switching from TensorFlow to CNTK or Theano
By default, Keras will use TensorFlow as its tensor manipulation library. [Follow these instructions](http://keras.io/backend/) to configure the Keras backend.
------------------
## Support
You can ask questions and join the development discussion:
- On the [Keras Google group](https://groups.google.com/forum/#!forum/keras-users).
- On the [Keras Slack channel](https://kerasteam.slack.com). Use [this link](https://keras-slack-autojoin.herokuapp.com/) to request an invitation to the channel.
You can also post **bug reports and feature requests** (only) in [Github issues](https://github.com/fchollet/keras/issues). Make sure to read [our guidelines](https://github.com/fchollet/keras/blob/master/CONTRIBUTING.md) first.
------------------
## Why this name, Keras?
Keras (κέρας) means _horn_ in Greek. It is a reference to a literary image from ancient Greek and Latin literature, first found in the _Odyssey_, where dream spirits (_Oneiroi_, singular _Oneiros_) are divided between those who deceive men with false visions, who arrive to Earth through a gate of ivory, and those who announce a future that will come to pass, who arrive through a gate of horn. It's a play on the words κέρας (horn) / κραίνω (fulfill), and ἐλέφας (ivory) / ἐλεφαίρομαι (deceive).
Keras was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System).
Keras was initially developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System).
_"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them."_ Homer, Odyssey 19. 562 ff (Shewring translation).
>_"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them."_ Homer, Odyssey 19. 562 ff (Shewring translation).
- __softmax__: Should only be applied to 2D layers (expected shape: `(nb_samples, nb_dims)`).
- __time_distributed_softmax__: Softmax applied to every sample at every timestep of a layer of shape `(nb_samples, nb_timesteps, nb_dims)`.
- __softplus__
- __relu__
- __tanh__
- __sigmoid__
- __hard_sigmoid__
- __linear__
## On Advanced Activations
Activations that are more complex than a simple Theano function (eg. learnable activations, configurable activations, etc.) are available as [Advanced Activation layers](layers/advanced_activations.md), and can be found in the module `keras.layers.advanced_activations`. These include PReLU and LeakyReLU.
A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training. You can pass a list of callbacks (as the keyword argument `callbacks`) to the `.fit()` method of the `Sequential` model. The relevant methods of the callbacks will then be called at each stage of the training.
---
## Base class
```python
keras.callbacks.Callback()
```
- __Properties__:
- __params__: dict. Training parameters (eg. verbosity, batch size, number of epochs...).
- __model__: `keras.models.Model`. Reference of the model being trained.
- __Methods__:
- __on_train_begin__(logs={}): Method called at the beginning of training.
- __on_train_end__(logs={}): Method called at the end of training.
- __on_epoch_begin__(epoch, logs={}): Method called at the beginning of epoch `epoch`.
- __on_epoch_end__(epoch, logs={}): Method called at the end of epoch `epoch`.
- __on_batch_begin__(batch, logs={}): Method called at the beginning of batch `batch`.
- __on_batch_end__(batch, logs={}): Method called at the end of batch `batch`.
The `logs` dictionary will contain keys for quantities relevant to the current batch or epoch. Currently, the `.fit()` method of the `Sequential` model class will include the following quantities in the `logs` that it passes to its callbacks:
- __on_epoch_end__: logs optionally include `val_loss` (if validation is enabled in `fit`), and `val_accuracy` (if validation and accuracy monitoring are enabled).
- __on_batch_begin__: logs include `size`, the number of samples in the current batch.
- __on_batch_end__: logs include `loss`, and optionally `accuracy` (if accuracy monitoring is enabled).
---
## Create a callback
You can create a custom callback by extending the base class `keras.callbacks.Callback`. A callback has access to its associated model through the class property `self.model`.
Here's a simple example saving a list of losses over each batch during training:
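A minimal sketch of such a callback follows. To keep it runnable without Keras installed, `Callback` below is a stand-in that only mimics the hooks documented above (an assumption of this sketch); in real use you would subclass `keras.callbacks.Callback` and pass an instance through the `callbacks` argument of `.fit()`. The loop at the end simulates what the training procedure would do.

```python
class Callback(object):
    """Stand-in for keras.callbacks.Callback, used only in this sketch."""

    def on_train_begin(self, logs=None):
        pass

    def on_batch_end(self, batch, logs=None):
        pass


class LossHistory(Callback):
    """Record the loss reported at the end of every batch."""

    def on_train_begin(self, logs=None):
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        # `loss` is the key documented for on_batch_end logs.
        self.losses.append(logs.get('loss'))


# Simulated training loop calling the hooks in the documented order.
history = LossHistory()
history.on_train_begin()
for batch, loss in enumerate([0.9, 0.7, 0.5]):
    history.on_batch_end(batch, logs={'loss': loss})
print(history.losses)
```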
- __X_train, X_test__: uint8 array of RGB image data with shape (nb_samples, 3, 32, 32).
- __y_train, y_test__: uint8 array of category labels with shape (nb_samples,).
- __Arguments:__
- __label_mode__: "fine" or "coarse".
---
## IMDB Movie reviews sentiment classification
`keras.datasets.imdb`
Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative). Reviews have been preprocessed, and each review is encoded as a [sequence](preprocessing/sequence.md) of word indexes (integers). For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data. This allows for quick filtering operations such as: "only consider the top 10,000 most common words, but eliminate the top 20 most common words".
As a convention, "0" does not stand for a specific word, but instead is used to encode any unknown word.
- __X_train, X_test__: list of sequences, which are lists of indexes (integers). If the nb_words argument was specified, the maximum possible index value is nb_words-1. If the maxlen argument was specified, the largest possible sequence length is maxlen.
- __y_train, y_test__: list of integer labels (1 or 0).
- __Arguments:__
- __path__: if you do not have the data locally (at `'~/.keras/datasets/' + path`), it will be downloaded to this location (in cPickle format).
- __nb_words__: integer or None. Top most frequent words to consider. Any less frequent word will appear as 0 in the sequence data.
- __skip_top__: integer. Top most frequent words to ignore (they will appear as 0s in the sequence data).
- __maxlen__: int. Maximum sequence length. Any longer sequence will be truncated.
- __test_split__: float. Fraction of the dataset to be used as test data.
- __seed__: int. Seed for reproducible data shuffling.
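The frequency-based filtering described above can be sketched as follows. This is an illustration of the convention, not the loader's actual code: `filter_sequence` is a hypothetical helper, and the exact boundary handling (`index >= nb_words`, `index < skip_top`) is an assumption chosen to match the statement that the maximum index is `nb_words-1`.

```python
def filter_sequence(sequence, nb_words=None, skip_top=0, unknown=0):
    """Replace too-rare and too-common word indexes with the unknown index."""
    filtered = []
    for index in sequence:
        too_rare = nb_words is not None and index >= nb_words
        too_common = index < skip_top
        filtered.append(unknown if (too_rare or too_common) else index)
    return filtered


# "Only consider the top 10,000 most common words, but eliminate the top 20."
review = [1, 4, 25, 300, 12000]
print(filter_sequence(review, nb_words=10000, skip_top=20))
```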
---
## Reuters newswire topics classification
`keras.datasets.reuters`
Dataset of 11,228 newswires from Reuters, labeled over 46 topics. As with the IMDB dataset, each wire is encoded as a sequence of word indexes (same conventions).
Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses [Theano](http://deeplearning.net/software/theano/) under the hood for fast tensor manipulation on GPU and CPU. It was developed with a focus on enabling fast experimentation.
Use Keras if you need a deep learning library that:
- allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
- supports both __convolutional networks__ and __recurrent networks__ (LSTM, GRU, etc). As well as combinations of the two.
- runs seamlessly on the CPU and the GPU.
## Guiding principles
- __Modularity.__ A model is understood as a sequence of standalone, fully-configurable modules that can be plugged together with as few restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions and dropout are all standalone modules that you can combine to create new models.
- __Minimalism.__ Each module should be kept short and simple (<100 lines of code). Every piece of code should be transparent upon first reading. No black magic: it hurts iteration speed and ability to innovate.
- __Easy extensibility.__ New features (a new module, per the above definition, or a new way to combine modules together) are dead simple to add (as new classes/functions), and existing modules provide ample examples.
- __Work with Python__. No separate models configuration files in a declarative format (like in Caffe or PyLearn2). Models are described in Python code, which is compact, easier to debug, benefits from syntax highlighting, and most of all, allows for ease of extensibility.
## Code
Find the code on Github: [fchollet/keras](https://github.com/fchollet/keras).
## License
Keras is licensed under the [MIT license](http://opensource.org/licenses/MIT).
## Getting started: 30 seconds to Keras
The core data structure of Keras is a __model__, a way to organize layers. Here's a sequential model (a linear pile of layers).
If you need to, you can further configure your optimizer. A core principle of Keras is to make things reasonably simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code).
Building a network of LSTMs, a deep CNN, a word2vec embedder or any other model is just as fast. The ideas behind deep learning are simple, so why should their implementation be painful?
Have a look at the [examples](examples.md).
## Installation
Keras uses the following dependencies:
- numpy, scipy
- Theano
- See [installation instructions](http://deeplearning.net/software/theano/install.html#install).
- HDF5 and h5py (optional, required if you use model saving/loading functions)
- Optional but recommended if you use CNNs: cuDNN.
Once you have the dependencies installed, clone the repo:
```bash
git clone https://github.com/fchollet/keras.git
```
Go to the Keras folder and run the install command:
```bash
cd keras
sudo python setup.py install
```
## Support
You can ask questions and join the development discussion on the [Keras Google group](https://groups.google.com/forum/#!forum/keras-users).
## Contribution Guidelines
Keras welcomes all contributions from the community.
- Keep a pragmatic mindset and avoid bloat. Only add to the source if that is the only path forward.
- New features should be documented. Make sure you update the documentation along with your Pull Request.
- The documentation for every new feature should include a usage example in the form of a code snippet.
- All changes should be tested. A formal test process will be introduced very soon.
- Even if you don't contribute to the Keras source code, if you have an application of Keras that is concise and powerful, please consider adding it to our collection of [examples](https://github.com/fchollet/keras/tree/master/examples).
## Why this name, Keras?
Keras (κέρας) means _horn_ in Greek. It is a reference to a literary image from ancient Greek and Latin literature, first found in the _Odyssey_, where dream spirits (_Oneiroi_, singular _Oneiros_) are divided between those who deceive men with false visions, who arrive to Earth through a gate of ivory, and those who announce a future that will come to pass, who arrive through a gate of horn. It's a play on the words κέρας (horn) / κραίνω (fulfill), and ἐλέφας (ivory) / ἐλεφαίρομαι (deceive).
Keras was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System).
> _"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them."_
Parametrized linear unit. Similar to a LeakyReLU, except that each input unit has its own alpha coefficient, and these coefficients are learned during training.
- __Input shape__: Same as `input_shape`. This layer cannot be used as first layer in a model.
- __Output shape__: Same as input.
- __Arguments__:
- __input_shape__: tuple.
- __References__:
- [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://arxiv.org/pdf/1502.01852v1.pdf)
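For reference, the activation from the paper cited above is `f(x) = x` for `x >= 0` and `f(x) = alpha * x` otherwise. The sketch below applies it element-wise with one coefficient per input unit; it is a plain-Python illustration with fixed, made-up `alpha` values, whereas the layer learns them during training.

```python
def prelu(x, alpha):
    """Identity for non-negative inputs, slope `alpha` for negative ones."""
    return x if x >= 0 else alpha * x


inputs = [-2.0, -0.5, 0.0, 3.0]
alphas = [0.25, 0.1, 0.25, 0.25]  # one coefficient per input unit (illustrative)

outputs = [prelu(x, a) for x, a in zip(inputs, alphas)]
print(outputs)
```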
Containers are ensembles of layers that can be interacted with through the same API as `Layer` objects.
## Sequential
```python
keras.layers.containers.Sequential(layers=[])
```
The Sequential container is a linear stack of layers. Apart from the `add` methods and the `layers` constructor argument, the API is identical to that of the `Layer` class.
This class is also the basis for the `keras.models.Sequential` architecture.
The `layers` constructor argument is a list of Layer instances.
Connect the input of the current layer to the output of the argument layer.
- __Return__: None.
- __Arguments__:
- __previous_layer__: Layer object.
```python
output(train)
```
Get the output of the layer.
- __Return__: Theano tensor.
- __Arguments__:
- __train__: Boolean. Specifies whether output is computed in training mode or in testing mode, which can change the logic, for instance if there are any `Dropout` layers in the network.
```python
get_input(train)
```
Get the input of the layer.
- __Return__: Theano tensor.
- __Arguments__:
- __train__: Boolean. Specifies whether output is computed in training mode or in testing mode, which can change the logic, for instance if there are any `Dropout` layers in the network.
```python
get_weights()
```
Get the weights of the parameters of the layer.
- __Return__: List of numpy arrays (one per layer parameter).
```python
set_weights(weights)
```
Set the weights of the parameters of the layer.
- __Arguments__:
- __weights__: List of numpy arrays (one per layer parameter). Should be in the same order as what `get_weights(self)` returns.
- __Input shape__: 2D tensor with shape: `(nb_samples, input_dim)`.
- __Output shape__: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: int >= 0.
- __output_dim__: int >= 0.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of the [regularizers](../regularizers.md) module (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of the [regularizers](../regularizers.md) module, applied to the bias.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
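The arguments above describe a fully-connected layer, which computes `activation(dot(input, W) + b)`. A minimal sketch of that computation for a single sample, in plain Python rather than Theano, might look like this. The helper name `dense_forward` and the sample weights are assumptions made for illustration.

```python
import math


def dense_forward(x, W, b, activation=None):
    """Compute activation(x . W + b) for one sample.

    x: list of input_dim floats
    W: input_dim rows, each a list of output_dim floats
    b: list of output_dim floats
    """
    out = [
        sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        for j in range(len(b))
    ]
    # With no activation given, the layer is linear: a(x) = x.
    return out if activation is None else [activation(v) for v in out]


W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # shape (input_dim=3, output_dim=2)
b = [0.5, -0.5]
linear = dense_forward([1.0, 2.0, 3.0], W, b)
squashed = dense_forward([1.0, 2.0, 3.0], W, b, activation=math.tanh)
```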
Fully-connected layer distributed over the time dimension. Useful after a recurrent network set to `return_sequences=True`.
- __Input shape__: 3D tensor with shape: `(nb_samples, nb_timesteps, input_dim)`.
- __Arguments__:
- __input_dim__: int >= 0.
- __output_dim__: int >= 0.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of the [regularizers](../regularizers.md) module (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of the [regularizers](../regularizers.md) module, applied to the bias.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
A customizable autoencoder model. If `output_reconstruction = True`, then dim(input) = dim(output); otherwise dim(output) = dim(hidden).
- __Input shape__: The layer shape is defined by the encoder definitions
- __Output shape__: The layer shape is defined by the decoder definitions
- __Arguments__:
- __encoder__: A [layer](./) or [layer container](./containers.md).
- __decoder__: A [layer](./) or [layer container](./containers.md).
- __output_reconstruction__: If this is False, then when .predict() is called the output is the deepest hidden layer's activation. Otherwise the output of the final decoder layer is presented. Be sure your validation data conforms to this logic if you decide to use any.
- __tie_weights__: If True then the encoder bias is tied to the decoder bias. **Note**: This requires the encoder layer corresponding to this decoder layer to be of the same type, eg: Dense:Dense
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
A denoising autoencoder model that inherits the base features from autoencoder.
Since this layer uses similar logic to Dropout it cannot be the first layer in a pipeline.
- __Input shape__: The layer shape is defined by the encoder definitions
- __Output shape__: The layer shape is defined by the decoder definitions
- __Arguments__:
- __encoder__: A [layer](./) or [layer container](./containers.md).
- __decoder__: A [layer](./) or [layer container](./containers.md).
- __output_reconstruction__: If this is False, then when .predict() is called the output is the deepest hidden layer's activation. Otherwise the output of the final decoder layer is presented. Be sure your validation data conforms to this logic if you decide to use any.
- __tie_weights__: If True then the encoder bias is tied to the decoder bias. **Note**: This requires the encoder layer corresponding to this decoder layer to be of the same type, eg: Dense:Dense
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __corruption_level__: the amount of binomial noise added to the input layer of the model.
- __Input shape__: This layer does not assume a specific input shape. As a result, it cannot be used as the first layer in a model.
- __Output shape__: Same as input.
- __Arguments__:
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function.
---
## Dropout
```python
keras.layers.core.Dropout(p)
```
Apply dropout to the input. Dropout consists in randomly setting a fraction `p` of input units to 0 at each update during training time, which helps prevent overfitting. Reference: [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf)
- __Input shape__: This layer does not assume a specific input shape.
- __Output shape__: Same as input.
- __Arguments__:
- __p__: float (0 <= p < 1). Fraction of the input that gets dropped out at training time.
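The behaviour described above can be sketched in plain Python as follows. The function name `dropout` is hypothetical, and rescaling the surviving units by `1 / (1 - p)` is an assumption of this sketch rather than something the text above states.

```python
import random


def dropout(inputs, p, train=True, rng=random):
    """Zero each input with probability p during training; identity otherwise."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    if not train or p == 0:
        return list(inputs)
    keep = 1.0 - p
    # Assumption: survivors are rescaled by 1 / (1 - p) ("inverted dropout").
    return [x / keep if rng.random() >= p else 0.0 for x in inputs]


rng = random.Random(0)
train_out = dropout([1.0] * 10, p=0.5, train=True, rng=rng)
test_out = dropout([1.0] * 10, p=0.5, train=False)
```

At test time the input is returned unchanged, which is why the `train` flag matters for networks containing this layer.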
---
## Reshape
```python
keras.layers.core.Reshape(*dims)
```
Reshape the input to a new shape containing the same number of units.
- __Input shape__: This layer does not assume a specific input shape.
A dense maxout layer. A `MaxoutDense` layer takes the element-wise maximum of `nb_feature` `Dense(input_dim, output_dim)` linear layers. This allows the layer to learn a convex, piecewise linear activation function over the inputs. See [this paper](http://arxiv.org/pdf/1302.4389.pdf) for more details. Note that this is a *linear* layer -- if you wish to apply an activation function (you shouldn't need to -- they are universal function approximators), an `Activation` layer must be added after.
- __Input shape__: 2D tensor with shape: `(nb_samples, input_dim)`.
- __Output shape__: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: int >= 0.
- __output_dim__: int >= 0.
- __nb_feature__: int >= 0. the number of features to create for the maxout. This is equivalent to the number of piecewise elements to be allowed for the activation function.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of the [regularizers](../regularizers.md) module (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of the [regularizers](../regularizers.md) module, applied to the bias.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
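To make the element-wise maximum concrete, here is a small plain-Python sketch for a single sample. The helper name `maxout_dense` and the toy weights are assumptions for illustration only.

```python
def maxout_dense(x, weights, biases):
    """Element-wise maximum over nb_feature linear maps of one sample.

    weights: nb_feature matrices, each of shape (input_dim, output_dim)
    biases:  nb_feature vectors, each of length output_dim
    """
    candidates = [
        [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
         for j in range(len(b))]
        for W, b in zip(weights, biases)
    ]
    # zip(*candidates) groups the nb_feature values for each output unit.
    return [max(values) for values in zip(*candidates)]


# Two linear pieces, y = x and y = -x, so the maximum is |x|.
weights = [[[1.0]], [[-1.0]]]
biases = [[0.0], [0.0]]
outputs = [maxout_dense([v], weights, biases)[0] for v in (-2.0, 0.5, 3.0)]
```

With `nb_feature=2` the layer here reproduces the absolute value, a convex piecewise linear function with two pieces.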
Merge the output of a list of models into a single tensor, following one of two modes: `sum` or `concat`.
- __Arguments__:
- __models__: List of `Sequential` models.
- __mode__: String, one of `{'sum', 'concat'}`. `sum` will simply sum the outputs of the models (therefore all models should have an output with the same shape). `concat` will concatenate the outputs along the last dimension (therefore the outputs of the models should differ only along the last dimension).
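The two modes can be sketched for a single sample in plain Python. The function name `merge` and the example vectors are hypothetical and stand in for the outputs of the merged models.

```python
def merge(outputs, mode="sum"):
    """Combine per-model outputs for one sample.

    outputs: list of vectors (lists of floats), one per model
    """
    if mode == "sum":
        if len({len(o) for o in outputs}) != 1:
            raise ValueError("'sum' requires outputs of the same shape")
        return [sum(values) for values in zip(*outputs)]
    if mode == "concat":
        return [v for o in outputs for v in o]
    raise ValueError("mode must be 'sum' or 'concat'")


summed = merge([[1.0, 2.0], [10.0, 20.0]], mode="sum")
joined = merge([[1.0, 2.0], [10.0, 20.0, 30.0]], mode="concat")
```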
Turn positive integers (indexes) into dense vectors of fixed size,
eg. `[[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]`
- __Input shape__: 2D tensor with shape: `(nb_samples, maxlen)`.
- __Output shape__: 3D tensor with shape: `(nb_samples, maxlen, output_dim)`.
- __Arguments__:
- __input_dim__: int >= 0. Size of the vocabulary, ie. 1+maximum integer index occurring in the input data.
- __output_dim__: int >= 0. Dimension of the dense embedding.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of the [regularizers](../regularizers.md) module (eg. L1 or L2 regularization), applied to the embedding matrix.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the embedding matrix.
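Conceptually, the layer described above is a table lookup: each integer index selects one row of an `(input_dim, output_dim)` weight matrix. A minimal sketch, assuming a hypothetical helper `embed` and hand-picked rows in place of learned weights:

```python
def embed(indices, table):
    """Replace each integer index with the matching row of the table.

    indices: nb_samples rows of maxlen non-negative ints
    table:   input_dim rows of output_dim floats
    """
    return [[list(table[i]) for i in row] for row in indices]


# Hypothetical table with input_dim=21 (1 + maximum index 20) and output_dim=2.
table = [[0.0, 0.0] for _ in range(21)]
table[4] = [0.25, 0.1]
table[20] = [0.6, -0.2]
vectors = embed([[4], [20]], table)
```

The result has shape `(nb_samples, maxlen, output_dim)`, here `(2, 1, 2)`.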
This layer turns a pair of words (a pivot word + a context word, ie. a word from the same context as a pivot, or a random, out-of-context word), identified by their indices in a vocabulary, into two dense representations (word representation and context representation).
Then it returns `activation(dot(pivot_embedding, context_embedding))`, which can be trained to encode the probability of finding the context word in the context of the pivot word (or reciprocally depending on your training procedure).
For more context, see Mikolov et al.: [Efficient Estimation of Word Representations in Vector Space](http://arxiv.org/pdf/1301.3781v3.pdf)
- __Input shape__: 2D tensor with shape: `(nb_samples, 2)`.
- __Output shape__: 2D tensor with shape: `(nb_samples, 1)`.
- __Arguments__:
- __input_dim__: int >= 0. Size of the vocabulary, ie. 1+maximum integer index occurring in the input data.
- __proj_dim__: int >= 0. Dimension of the dense embedding used internally.
- __init__: name of initialization function for the embeddings (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function.
- __weights__: list of numpy arrays to set as initial weights. The list should have 2 elements, both of shape `(input_dim, proj_dim)`. The first element is the word embedding weights, the second one is the context embedding weights.
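The expression `activation(dot(pivot_embedding, context_embedding))` can be sketched for one (pivot, context) pair as below. The helper names, the choice of a sigmoid activation, and the toy embedding tables are all assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def word_context_product(pair, word_emb, context_emb, activation=sigmoid):
    """Return activation(dot(pivot_embedding, context_embedding)) for one pair."""
    pivot, context = pair
    dot = sum(w * c for w, c in zip(word_emb[pivot], context_emb[context]))
    return activation(dot)


# Hypothetical weights: input_dim=3, proj_dim=2.
word_emb = [[0.0, 0.0], [1.0, 2.0], [-1.0, 0.5]]
context_emb = [[0.0, 0.0], [0.5, 0.5], [2.0, -2.0]]
score = word_context_product((1, 2), word_emb, context_emb)
```

With a sigmoid activation the score lies in (0, 1), which is what allows it to be read as a probability.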
Fully connected RNN where the output is fed back to the input. Not a particularly useful model, included for demonstration purposes.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: dimension of the input.
- __output_dim__: dimension of the internal projections and the final output.
- __init__: weight initialization function. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __activation__: activation function. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __weights__: list of numpy arrays to set as initial weights. The list should have 3 elements, of shapes: `[(input_dim, output_dim), (output_dim, output_dim), (output_dim,)]`.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
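The weight shapes listed above are consistent with the recurrence `h_t = activation(x_t . W + h_{t-1} . U + b)`. The sketch below runs that recurrence over one sample in plain Python. The function name `simple_rnn`, the zero initial state, and the toy weights are assumptions of this example, not a description of the layer's internals.

```python
import math


def simple_rnn(sequence, W, U, b, activation=math.tanh, return_sequences=False):
    """Run h_t = activation(x_t . W + h_{t-1} . U + b) over one sample."""
    output_dim = len(b)
    h = [0.0] * output_dim  # assumed zero initial state
    states = []
    for x in sequence:
        h = [
            activation(
                sum(x[i] * W[i][j] for i in range(len(x)))
                + sum(h[k] * U[k][j] for k in range(output_dim))
                + b[j]
            )
            for j in range(output_dim)
        ]
        states.append(h)
    # Either every timestep's output or only the last one.
    return states if return_sequences else states[-1]


# Identity activation keeps the arithmetic easy to follow by hand.
W, U, b = [[1.0]], [[0.5]], [0.0]
steps = [[1.0], [1.0], [1.0]]
full = simple_rnn(steps, W, U, b, activation=lambda v: v, return_sequences=True)
last = simple_rnn(steps, W, U, b, activation=lambda v: v)
```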
Not a particularly useful model, included for demonstration purposes.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: dimension of the input.
- __output_dim__: dimension of the internal projections and the final output.
- __depth__: int >= 1. Lookback depth (eg. depth=1 is equivalent to SimpleRNN).
- __init__: weight initialization function for the output cell. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __inner_init__: weight initialization function for the inner cells.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have depth+2 elements.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: dimension of the input.
- __output_dim__: dimension of the internal projections and the final output.
- __init__: weight initialization function for the output cell. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __inner_init__: weight initialization function for the inner cells.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have 9 elements.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __References__:
- [On the Properties of Neural Machine Translation: Encoder–Decoder Approaches](http://www.aclweb.org/anthology/W14-4012)
- [Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling](http://arxiv.org/pdf/1412.3555v1.pdf)
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: dimension of the input.
- __output_dim__: dimension of the internal projections and the final output.
- __init__: weight initialization function for the output cell. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __inner_init__: weight initialization function for the inner cells.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have 12 elements.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __optimizer__: str (name of optimizer) or optimizer object. See [optimizers](optimizers.md).
- __loss__: str (name of objective function) or objective function. See [objectives](objectives.md).
- __class_mode__: one of "categorical", "binary". This is only used for computing classification accuracy or using the predict_classes method.
- __theano_mode__: A `theano.compile.mode.Mode` ([reference](http://deeplearning.net/software/theano/library/compile/mode.html)) instance specifying compilation options.
- __fit__(X, y, batch_size=128, nb_epoch=100, verbose=1, validation_split=0., validation_data=None, shuffle=True, show_accuracy=False, callbacks=[]): Train a model for a fixed number of epochs.
- __Return__: a history dictionary with a record of training loss values at successive epochs, as well as validation loss values (if applicable), accuracy (if applicable), etc.
- __Arguments__:
- __X__: data.
- __y__: labels.
- __batch_size__: int. Number of samples per gradient update.
- __nb_epoch__: int.
- __verbose__: 0 for no logging to stdout, 1 for progress bar logging, 2 for one log line per epoch.
- __validation_split__: float (0. < x < 1). Fraction of the data to use as held-out validation data.
- __validation_data__: tuple (X, y) to be used as held-out validation data. Will override validation_split.
- __shuffle__: boolean. Whether to shuffle the samples at each epoch.
- __show_accuracy__: boolean. Whether to display class accuracy in the logs to stdout at each epoch.
- __callbacks__: `keras.callbacks.Callback` list. List of callbacks to apply during training. See [callbacks](callbacks.md).
- __evaluate__(X, y, batch_size=128, show_accuracy=False, verbose=1): Show performance of the model over some validation data.
- __Return__: The loss score over the data.
- __Arguments__: Same meaning as fit method above. verbose is used as a binary flag (progress bar or nothing).
- __predict__(X, batch_size=128, verbose=1):
- __Return__: An array of predictions for some test data.
- __Arguments__: Same meaning as fit method above.
- __predict_classes__(X, batch_size=128, verbose=1): Return an array of class predictions for some test data.
- __Return__: An array of labels for some test data.
- __Arguments__: Same meaning as fit method above. verbose is used as a binary flag (progress bar or nothing).
- __train__(X, y, accuracy=False): Single gradient update on one batch. If accuracy==False, return loss_on_batch. Else, return tuple (loss_on_batch, accuracy_on_batch).
- __Return__: loss over the data, or tuple `(loss, accuracy)` if `accuracy=True`.
- __test__(X, y, accuracy=False): Single performance evaluation on one batch. If accuracy==False, return loss_on_batch. Else, return tuple (loss_on_batch, accuracy_on_batch).
- __Return__: loss over the data, or tuple `(loss, accuracy)` if `accuracy=True`.
- __save_weights__(fname, overwrite=False): Store the weights of all layers to a HDF5 file. If overwrite==False and the file already exists, an exception will be thrown.
- __load_weights__(fname): Sets the weights of a model, based on weights stored by __save_weights__. You can only __load_weights__ on a savefile from a model with an identical architecture. __load_weights__ can be called either before or after the __compile__ step.
You can either pass the name of an existing objective, or pass a Theano symbolic function that returns a scalar and takes the following two arguments:
- __y_true__: True labels. Theano tensor.
- __y_pred__: Predictions. Theano tensor of the same shape as y_true.
For a few examples of such functions, check out the [objectives source](https://github.com/fchollet/keras/blob/master/keras/objectives.py).
## Available objectives
- __mean_squared_error__ / __mse__
- __mean_absolute_error__ / __mae__
- __squared_hinge__
- __hinge__
- __binary_crossentropy__: Also known as logloss.
- __categorical_crossentropy__: Also known as multiclass logloss. __Note__: using this objective requires that your labels are binary arrays of shape `(nb_samples, nb_classes)`.
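The note on `categorical_crossentropy` means integer class labels must first be expanded into binary rows of shape `(nb_samples, nb_classes)`. The following plain-Python sketch shows that conversion and the loss it feeds into. The helper names and the `eps` clipping value are assumptions, and an objective actually passed to a model would be a Theano symbolic function rather than numeric code like this.

```python
import math


def to_one_hot(labels, nb_classes):
    """Turn integer class labels into binary rows of length nb_classes."""
    return [[1.0 if c == label else 0.0 for c in range(nb_classes)]
            for label in labels]


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        # Clip predictions away from zero so the logarithm stays finite.
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)


y_true = to_one_hot([0, 2], nb_classes=3)
y_pred = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]
loss = categorical_crossentropy(y_true, y_pred)
```

Like any objective, the function takes `y_true` and `y_pred` of the same shape and returns a scalar.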
You can either instantiate an optimizer before passing it to `model.compile()` , as in the above example, or you can call it by its name. In the latter case, the default parameters for the optimizer will be used.
```python
# pass optimizer by name: default parameters will be used
```
Adam optimizer, proposed by Kingma and Lei Ba in [Adam: A Method For Stochastic Optimization](http://arxiv.org/pdf/1412.6980v4.pdf). Default parameters are those suggested in the paper. The parameter "lambda" from the paper has been renamed kappa, for syntactic reasons.
__Arguments__:
- __lr__: float >= 0. Learning rate.
- __beta_1__, __beta_2__: floats, 0 < beta < 1. Generally close to 1.
- __epsilon__: float >= 0. Fuzz factor.
- __kappa__: float 0 < kappa < 1. Lambda parameter in the original paper.
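For orientation, one parameter update in the commonly published form of Adam can be sketched as below. This is a plain-Python illustration under stated assumptions: the function name `adam_step` is hypothetical, the `kappa` decay mentioned above is omitted, and the code does not claim to mirror the Keras implementation line for line.

```python
import math


def adam_step(param, grad, m, v, t,
              lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """Return the updated (param, m, v) after step t (t starts at 1)."""
    m = beta_1 * m + (1.0 - beta_1) * grad          # first-moment estimate
    v = beta_2 * v + (1.0 - beta_2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta_1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta_2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# One step on f(x) = x**2 from x = 1.0; the gradient there is 2 * x = 2.0.
x1, m1, v1 = adam_step(1.0, 2.0, m=0.0, v=0.0, t=1, lr=0.05)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr`.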
- __fit(X)__: Required if featurewise_center or featurewise_std_normalization or zca_whitening. Compute necessary quantities on some sample data.
- __Arguments__:
- __X__: sample data.
- __augment__: Boolean (default: False). Whether to fit on randomly augmented samples.
- __rounds__: int (default: 1). If augment, how many augmentation passes over the data to use.
- __flow(X, y)__:
- __Arguments__:
- __X__: data.
- __y__: labels.
- __batch_size__: int (default: 32).
- __shuffle__: boolean (default: False).
- __save_to_dir__: None or str. This allows you to optionally specify a directory to which to save the augmented pictures being generated (useful for visualizing what you are doing).
- __save_prefix__: str. Prefix to use for filenames of saved pictures.
- __filters__: list (or concatenation) of characters to filter out, such as punctuation. Default: base_filter(), includes basic punctuation, tabs, and newlines.
- __lower__: boolean. Whether to set the text to lowercase.
- __split__: str. Separator for word splitting.
## one_hot
```python
keras.preprocessing.text.one_hot(text, n,
    filters=base_filter(), lower=True, split=" ")
```
One-hot encode a text into a list of word indexes in a vocabulary of size n.
- __Return__: List of integers in [1, n]. Each integer encodes a word (uniqueness is not guaranteed).
- __Arguments__: Same as `text_to_word_sequence` above.
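The reason uniqueness is not guaranteed is that words are mapped to integers by hashing rather than by a lookup table, so two different words can collide. The sketch below illustrates the idea in plain Python. The name `one_hot_sketch`, the exact hashing formula, and the omission of the `filters` argument are assumptions of this example.

```python
def one_hot_sketch(text, n, lower=True, split=" "):
    """Map each word to an integer in [1, n - 1] via hashing (collisions possible)."""
    if lower:
        text = text.lower()
    words = [w for w in text.split(split) if w]
    # Assumed scheme: fold the built-in hash into the range 1 .. n - 1.
    return [abs(hash(w)) % (n - 1) + 1 for w in words]


codes = one_hot_sketch("The cat the dog", n=50)
```

Within one Python process the same word always receives the same code, but the codes themselves vary between runs because string hashing is randomized by default.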
Class for vectorizing texts, and/or turning texts into sequences (= lists of word indexes, where the word of rank i in the dataset (starting at 1) has index i).
- __Arguments__: Same as `text_to_word_sequence` above.
- __nb_words__: None or int. Maximum number of words to work with (if set, tokenization will be restricted to the top nb_words most common words in the dataset).
- __Methods__:
- __fit_on_texts(texts)__:
- __Arguments__:
- __texts__: list of texts to train on.
- __texts_to_sequences(texts)__
- __Arguments__:
- __texts__: list of texts to turn to sequences.
- __Return__: list of sequences (one per text input).
- __texts_to_sequences_generator(texts)__: generator version of the above.
- __Return__: yield one sequence per input text.
- __texts_to_matrix(texts)__:
- __Return__: numpy array of shape `(len(texts), nb_words)`.
- __Arguments__:
- __texts__: list of texts to vectorize.
- __mode__: one of "binary", "count", "tfidf", "freq" (default: "binary").
- __fit_on_sequences(sequences)__:
- __Arguments__:
- __sequences__: list of sequences to train on.
- __sequences_to_matrix(sequences)__:
- __Return__: numpy array of shape `(len(sequences), nb_words)`.
- __Arguments__:
- __sequences__: list of sequences to vectorize.
- __mode__: one of "binary", "count", "tfidf", "freq" (default: "binary").
- __Attributes__:
- __word_counts__: dictionary mapping words (str) to the number of times they appeared during fit. Only set after fit_on_texts was called.
- __word_docs__: dictionary mapping words (str) to the number of documents/texts they appeared in during fit. Only set after fit_on_texts was called.
- __word_index__: dictionary mapping words (str) to their rank/index (int). Only set after fit_on_texts was called.
- __document_count__: int. Number of documents (texts/sequences) the tokenizer was trained on. Only set after fit_on_texts or fit_on_sequences was called.
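The relationship between `word_counts`, `word_index`, and the sequences produced from them can be sketched in plain Python as follows. These stand-alone functions only borrow the method names for readability; the filtering step is left out, and skipping words that were never seen during fitting is an assumption of this sketch.

```python
from collections import Counter


def fit_on_texts(texts, lower=True, split=" "):
    """Return (word_counts, word_index) for a list of texts."""
    word_counts = Counter()
    for text in texts:
        if lower:
            text = text.lower()
        word_counts.update(w for w in text.split(split) if w)
    # Rank 1 goes to the most frequent word.
    ranked = [w for w, _ in word_counts.most_common()]
    word_index = {w: rank for rank, w in enumerate(ranked, start=1)}
    return dict(word_counts), word_index


def texts_to_sequences(texts, word_index, lower=True, split=" "):
    """Turn each text into a list of word indexes, skipping unknown words."""
    sequences = []
    for text in texts:
        if lower:
            text = text.lower()
        sequences.append([word_index[w] for w in text.split(split)
                          if w in word_index])
    return sequences


word_counts, word_index = fit_on_texts(["the cat sat", "the cat", "the"])
sequences = texts_to_sequences(["The sat cat"], word_index)
```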
Activations can either be used through an `Activation` layer, or through the `activation` argument supported by all forward layers:
```python
from keras.layers import Activation, Dense

model.add(Dense(64))
model.add(Activation('tanh'))
```
This is equivalent to:
```python
model.add(Dense(64, activation='tanh'))
```
You can also pass an element-wise TensorFlow/Theano/CNTK function as an activation:
```python
from keras import backend as K

model.add(Dense(64, activation=K.tanh))
model.add(Activation(K.tanh))
```
## Available activations
{{autogenerated}}
## On "Advanced Activations"
Activations that are more complex than a simple TensorFlow/Theano/CNTK function (eg. learnable activations, which maintain a state) are available as [Advanced Activation layers](layers/advanced-activations.md), and can be found in the module `keras.layers.advanced_activations`. These include `PReLU` and `LeakyReLU`.
Keras Applications are deep learning models that are made available alongside pre-trained weights.
These models can be used for prediction, feature extraction, and fine-tuning.
Weights are downloaded automatically when instantiating a model. They are stored at `~/.keras/models/`.
## Available models
### Models for image classification with weights trained on ImageNet:
- [Xception](#xception)
- [VGG16](#vgg16)
- [VGG19](#vgg19)
- [ResNet50](#resnet50)
- [InceptionV3](#inceptionv3)
All of these architectures (except Xception) are compatible with both TensorFlow and Theano, and upon instantiation the models will be built according to the image data format set in your Keras configuration file at `~/.keras/keras.json`. For instance, if you have set `image_data_format=channels_last`, then any model loaded from this repository will get built according to the TensorFlow data format convention, "Height-Width-Depth".
The Xception model is only available for TensorFlow, due to its reliance on `SeparableConvolution` layers.
VGG16 model, with weights pre-trained on ImageNet.
This model is available for both the Theano and TensorFlow backend, and can be built both
with "channels_first" data format (channels, height, width) or "channels_last" data format (height, width, channels).
The default input size for this model is 224x224.
### Arguments
- include_top: whether to include the 3 fully-connected layers at the top of the network.
- weights: one of `None` (random initialization) or "imagenet" (pre-training on ImageNet).
- input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) to use as image input for the model.
- input_shape: optional shape tuple, only to be specified if `include_top` is False (otherwise the input shape has to be `(224, 224, 3)` with `channels_last` data format, or `(3, 224, 224)` with `channels_first` data format). It should have exactly 3 input channels, and width and height should be no smaller than 48. E.g. `(200, 200, 3)` would be one valid value.
- pooling: optional pooling mode for feature extraction when `include_top` is `False`.
    - `None` means that the output of the model will be the 4D tensor output of the last convolutional layer.
    - `avg` means that global average pooling will be applied to the output of the last convolutional layer, and thus the output of the model will be a 2D tensor.
    - `max` means that global max pooling will be applied.
- classes: optional number of classes to classify images into, only to be specified if `include_top` is True, and if no `weights` argument is specified.
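The `input_shape` rules listed above (exactly 3 channels, a minimum height and width, and a layout that depends on the data format) can be expressed as a small check. This is a hypothetical helper written for illustration, not a function provided by Keras.

```python
def check_input_shape(input_shape, data_format="channels_last", min_size=48):
    """Validate a 3-tuple against the rules listed for `input_shape`."""
    if len(input_shape) != 3:
        raise ValueError("input_shape must have exactly 3 entries")
    if data_format == "channels_last":
        height, width, channels = input_shape
    elif data_format == "channels_first":
        channels, height, width = input_shape
    else:
        raise ValueError("unknown data_format: %r" % (data_format,))
    if channels != 3:
        raise ValueError("expected exactly 3 input channels")
    if height < min_size or width < min_size:
        raise ValueError("height and width must be at least %d" % min_size)
    return input_shape


ok_last = check_input_shape((200, 200, 3))
ok_first = check_input_shape((3, 224, 224), data_format="channels_first")
```

A shape such as `(32, 32, 3)` would be rejected here because it falls below the minimum size of 48.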
### Returns
A Keras model instance.
### References
- [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556): please cite this paper if you use the VGG models in your work.
### License
These weights are ported from the ones [released by VGG at Oxford](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) under the [Creative Commons Attribution License](https://creativecommons.org/licenses/by/4.0/).
VGG19 model, with weights pre-trained on ImageNet.
This model is available for both the Theano and TensorFlow backend, and can be built both
with "channels_first" data format (channels, height, width) or "channels_last" data format (height, width, channels).
The default input size for this model is 224x224.
### Arguments
- include_top: whether to include the 3 fully-connected layers at the top of the network.
- weights: one of `None` (random initialization) or "imagenet" (pre-training on ImageNet).
- input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) to use as image input for the model.
- input_shape: optional shape tuple, only to be specified if `include_top` is False (otherwise the input shape has to be `(224, 224, 3)` with `channels_last` data format, or `(3, 224, 224)` with `channels_first` data format). It should have exactly 3 input channels, and width and height should be no smaller than 48. E.g. `(200, 200, 3)` would be one valid value.
- pooling: optional pooling mode for feature extraction when `include_top` is `False`.
    - `None` means that the output of the model will be the 4D tensor output of the last convolutional layer.
    - `avg` means that global average pooling will be applied to the output of the last convolutional layer, and thus the output of the model will be a 2D tensor.
    - `max` means that global max pooling will be applied.
- classes: optional number of classes to classify images into, only to be specified if `include_top` is True, and if no `weights` argument is specified.
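The `avg` and `max` pooling options described above collapse the spatial dimensions of the last convolutional output, leaving one value per channel. A plain-Python sketch for a single sample in `channels_last` layout follows; the helper name `global_pool` and the toy feature map are assumptions for illustration.

```python
def global_pool(feature_map, mode="avg"):
    """Reduce one sample of shape (height, width, channels) to a vector of length channels."""
    nb_channels = len(feature_map[0][0])
    # Gather every spatial position's value, channel by channel.
    per_channel = [
        [pixel[c] for row in feature_map for pixel in row]
        for c in range(nb_channels)
    ]
    if mode == "avg":
        return [sum(values) / len(values) for values in per_channel]
    if mode == "max":
        return [max(values) for values in per_channel]
    raise ValueError("mode must be 'avg' or 'max'")


# A 2x2 feature map with 2 channels.
fmap = [[[1.0, 10.0], [2.0, 20.0]],
        [[3.0, 30.0], [4.0, 40.0]]]
avg_vector = global_pool(fmap, "avg")
max_vector = global_pool(fmap, "max")
```

Applied to a whole batch, this turns the 4D tensor `(nb_samples, height, width, channels)` into the 2D tensor `(nb_samples, channels)` mentioned above.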
### Returns
A Keras model instance.
### References
- [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)
### License
These weights are ported from the ones [released by VGG at Oxford](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) under the [Creative Commons Attribution License](https://creativecommons.org/licenses/by/4.0/).
ResNet50 model, with weights pre-trained on ImageNet.
This model is available for both the Theano and TensorFlow backend, and can be built both
with "channels_first" data format (channels, height, width) or "channels_last" data format (height, width, channels).
The default input size for this model is 224x224.
### Arguments
- include_top: whether to include the fully-connected layer at the top of the network.
- weights: one of `None` (random initialization) or "imagenet" (pre-training on ImageNet).
- input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) to use as image input for the model.
- input_shape: optional shape tuple, only to be specified
if `include_top` is False (otherwise the input shape
has to be `(224, 224, 3)` (with `channels_last` data format)
or `(3, 224, 224)` (with `channels_first` data format)).
It should have exactly 3 input channels,
and width and height should be no smaller than 197.
E.g. `(200, 200, 3)` would be one valid value.
- pooling: Optional pooling mode for feature extraction
when `include_top` is `False`.
    - `None` means that the output of the model will be
    the 4D tensor output of the
    last convolutional layer.
    - `avg` means that global average pooling
    will be applied to the output of the
    last convolutional layer, and thus
    the output of the model will be a 2D tensor.
    - `max` means that global max pooling will
    be applied.
- classes: optional number of classes to classify images
into, only to be specified if `include_top` is True, and
if no `weights` argument is specified.
### Returns
A Keras model instance.
### References
- [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
### License
These weights are ported from the ones [released by Kaiming He](https://github.com/KaimingHe/deep-residual-networks) under the [MIT license](https://github.com/KaimingHe/deep-residual-networks/blob/master/LICENSE).
Keras is a model-level library, providing high-level building blocks for developing deep learning models. It does not itself handle low-level operations such as tensor products, convolutions and so on. Instead, it relies on a specialized, well-optimized tensor manipulation library to do so, serving as the "backend engine" of Keras. Rather than picking one single tensor library and tying the implementation of Keras to that library, Keras handles the problem in a modular way, and several different backend engines can be plugged seamlessly into Keras.
At this time, Keras has three backend implementations available: the **TensorFlow** backend, the **Theano** backend, and the **CNTK** backend.
- [TensorFlow](http://www.tensorflow.org/) is an open-source symbolic tensor manipulation framework developed by Google, Inc.
- [Theano](http://deeplearning.net/software/theano/) is an open-source symbolic tensor manipulation framework developed by LISA/MILA Lab at Université de Montréal.
- [CNTK](https://www.microsoft.com/en-us/cognitive-toolkit/) is an open-source, commercial-grade toolkit for deep learning developed by Microsoft.
In the future, we are likely to add more backend options.
----
## Switching from one backend to another
If you have run Keras at least once, you will find the Keras configuration file at:
`$HOME/.keras/keras.json`
If it isn't there, you can create it.
**NOTE for Windows Users:** Please replace `$HOME` with `%USERPROFILE%`.
The default configuration file looks like this:
```
{
"image_data_format": "channels_last",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
```
Simply change the field `backend` to `"theano"`, `"tensorflow"`, or `"cntk"`, and Keras will use the new configuration next time you run any Keras code.
You can also define the environment variable `KERAS_BACKEND`, which will
override what is defined in your config file:
```bash
KERAS_BACKEND=tensorflow python -c "from keras import backend"
Using TensorFlow backend.
```
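The precedence described above (environment variable first, then the config file) can be sketched in plain Python. This is only an illustration of the lookup order, not the code Keras itself runs; the function name `resolve_backend` and its parameters are assumptions made for this example.

```python
import json
import os


def resolve_backend(config_path, environ=None, default="tensorflow"):
    """Return the backend name, letting KERAS_BACKEND override the config file.

    Hypothetical helper for illustration; not part of the Keras API.
    """
    if environ is None:
        environ = os.environ
    backend = default
    # Read the `backend` field from the JSON config file, if the file exists.
    if os.path.exists(config_path):
        with open(config_path) as f:
            backend = json.load(f).get("backend", default)
    # The environment variable, when set, takes precedence over the file.
    return environ.get("KERAS_BACKEND", backend)
```

Calling `resolve_backend(os.path.expanduser("~/.keras/keras.json"))` would then return the name from the environment if `KERAS_BACKEND` is set, and the name from the file otherwise.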
----
## keras.json details
```
{
"image_data_format": "channels_last",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
```
You can change these settings by editing `$HOME/.keras/keras.json`.
* `image_data_format`: string, either `"channels_last"` or `"channels_first"`. It specifies which data format convention Keras will follow. (`keras.backend.image_data_format()` returns it.)
- For 2D data (e.g. image), `"channels_last"` assumes `(rows, cols, channels)` while `"channels_first"` assumes `(channels, rows, cols)`.
- For 3D data, `"channels_last"` assumes `(conv_dim1, conv_dim2, conv_dim3, channels)` while `"channels_first"` assumes `(channels, conv_dim1, conv_dim2, conv_dim3)`.
* `epsilon`: float, a numeric fuzzing constant used to avoid dividing by zero in some operations.
* `floatx`: string, `"float16"`, `"float32"`, or `"float64"`. Default float precision.
* `backend`: string, `"tensorflow"`, `"theano"`, or `"cntk"`.
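As a small illustration of the `image_data_format` convention for 2D data, the sketch below builds the shape tuple for either setting. The helper name `image_shape` is hypothetical and not part of Keras.

```python
def image_shape(rows, cols, channels, data_format="channels_last"):
    """Return the per-sample shape of an image under the given convention.

    Hypothetical helper for illustration; not part of the Keras API.
    """
    if data_format == "channels_last":
        return (rows, cols, channels)
    if data_format == "channels_first":
        return (channels, rows, cols)
    raise ValueError("Unknown data_format: " + str(data_format))


# A 224x224 RGB image under both conventions.
last = image_shape(224, 224, 3, "channels_last")
first = image_shape(224, 224, 3, "channels_first")
```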
----
## Using the abstract Keras backend to write new code
If you want the Keras modules you write to be compatible with both Theano (`th`) and TensorFlow (`tf`), you have to write them via the abstract Keras backend API. Here's an intro.
You can import the backend module via:
```python
from keras import backend as K
```
The code below instantiates an input placeholder. It's equivalent to `tf.placeholder()` or `th.tensor.matrix()`, `th.tensor.tensor3()`, etc.
```python
input = K.placeholder(shape=(2, 4, 5))
# also works:
input = K.placeholder(shape=(None, 4, 5))
# also works:
input = K.placeholder(ndim=3)
```
The code below instantiates a shared variable. It's equivalent to `tf.Variable()` or `th.shared()`.
```python
import numpy as np
val = np.random.random((3, 4, 5))
var = K.variable(value=val)
# all-zeros variable:
var = K.zeros(shape=(3, 4, 5))
# all-ones:
var = K.ones(shape=(3, 4, 5))
```
Most tensor operations you will need can be done as you would in TensorFlow or Theano:
```python
# Initializing Tensors with Random Numbers
b = K.random_uniform_variable(shape=(3, 4), low=0, high=1)  # Uniform distribution
c = K.random_normal_variable(shape=(3, 4), mean=0, scale=1)  # Gaussian distribution
```
A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training. You can pass a list of callbacks (as the keyword argument `callbacks`) to the `.fit()` method of the `Sequential` or `Model` classes. The relevant methods of the callbacks will then be called at each stage of the training.
---
{{autogenerated}}
---
# Create a callback
You can create a custom callback by extending the base class `keras.callbacks.Callback`. A callback has access to its associated model through the class property `self.model`.
Here's a simple example saving a list of losses over each batch during training:
Functions from the `constraints` module allow setting constraints (e.g. non-negativity) on network parameters during optimization.
The penalties are applied on a per-layer basis. The exact API will depend on the layer, but the layers `Dense`, `Conv1D`, `Conv2D` and `Conv3D` have a unified API.
- __x_train, x_test__: uint8 array of RGB image data with shape (num_samples, 3, 32, 32).
- __y_train, y_test__: uint8 array of category labels with shape (num_samples,).
- __Arguments:__
- __label_mode__: "fine" or "coarse".
---
## IMDB Movie reviews sentiment classification
Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative). Reviews have been preprocessed, and each review is encoded as a [sequence](preprocessing/sequence.md) of word indexes (integers). For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data. This allows for quick filtering operations such as: "only consider the top 10,000 most common words, but eliminate the top 20 most common words".
As a convention, "0" does not stand for a specific word, but instead is used to encode any unknown word.
- __x_train, x_test__: list of sequences, which are lists of indexes (integers). If the `num_words` argument was specified, the maximum possible index value is `num_words-1`. If the `maxlen` argument was specified, the largest possible sequence length is `maxlen`.
- __y_train, y_test__: list of integer labels (1 or 0).
- __Arguments:__
- __path__: if you do not have the data locally (at `'~/.keras/datasets/' + path`), it will be downloaded to this location.
- __num_words__: integer or None. Top most frequent words to consider. Any less frequent word will appear as `oov_char` value in the sequence data.
- __skip_top__: integer. Top most frequent words to ignore (they will appear as `oov_char` value in the sequence data).
- __maxlen__: int. Maximum sequence length. Any longer sequence will be truncated.
- __seed__: int. Seed for reproducible data shuffling.
- __start_char__: int. The start of a sequence will be marked with this character.
Set to 1 because 0 is usually the padding character.
- __oov_char__: int. words that were cut out because of the `num_words`
or `skip_top` limit will be replaced with this character.
- __index_from__: int. Index actual words with this index and higher.
---
## Reuters newswire topics classification
Dataset of 11,228 newswires from Reuters, labeled over 46 topics. As with the IMDB dataset, each wire is encoded as a sequence of word indexes (same conventions).
- [How should I cite Keras?](#how-should-i-cite-keras)
- [How can I run Keras on GPU?](#how-can-i-run-keras-on-gpu)
- [What does "sample", "batch", "epoch" mean?](#what-does-sample-batch-epoch-mean)
- [How can I save a Keras model?](#how-can-i-save-a-keras-model)
- [Why is the training loss much higher than the testing loss?](#why-is-the-training-loss-much-higher-than-the-testing-loss)
- [How can I obtain the output of an intermediate layer?](#how-can-i-obtain-the-output-of-an-intermediate-layer)
- [How can I use Keras with datasets that don't fit in memory?](#how-can-i-use-keras-with-datasets-that-dont-fit-in-memory)
- [How can I interrupt training when the validation loss isn't decreasing anymore?](#how-can-i-interrupt-training-when-the-validation-loss-isnt-decreasing-anymore)
- [How is the validation split computed?](#how-is-the-validation-split-computed)
- [Is the data shuffled during training?](#is-the-data-shuffled-during-training)
- [How can I record the training / validation loss / accuracy at each epoch?](#how-can-i-record-the-training-validation-loss-accuracy-at-each-epoch)
- [How can I "freeze" layers?](#how-can-i-freeze-keras-layers)
- [How can I use stateful RNNs?](#how-can-i-use-stateful-rnns)
- [How can I remove a layer from a Sequential model?](#how-can-i-remove-a-layer-from-a-sequential-model)
- [How can I use pre-trained models in Keras?](#how-can-i-use-pre-trained-models-in-keras)
- [How can I use HDF5 inputs with Keras?](#how-can-i-use-hdf5-inputs-with-keras)
- [Where is the Keras configuration file stored?](#where-is-the-keras-configuration-file-stored)
---
### How should I cite Keras?
Please cite Keras in your publications if it helps your research. Here is an example BibTeX entry:
The name 'gpu' might have to be changed depending on your device's identifier (e.g. `gpu0`, `gpu1`, etc).
Method 2: set up your `.theanorc`: [Instructions](http://deeplearning.net/software/theano/library/config.html)
Method 3: manually set `theano.config.device`, `theano.config.floatX` at the beginning of your code:
```python
import theano
theano.config.device = 'gpu'
theano.config.floatX = 'float32'
```
---
### What does "sample", "batch", "epoch" mean?
Below are some common definitions that are necessary to know and understand to correctly utilize Keras:
- **Sample**: one element of a dataset.
- *Example:* one image is a **sample** in a convolutional network
- *Example:* one audio file is a **sample** for a speech recognition model
- **Batch**: a set of *N* samples. The samples in a **batch** are processed independently, in parallel. If training, a batch results in only one update to the model.
- A **batch** generally approximates the distribution of the input data better than a single input. The larger the batch, the better the approximation; however, it is also true that the batch will take longer to process and will still result in only one update. For inference (evaluate/predict), it is recommended to pick a batch size that is as large as you can afford without going out of memory (since larger batches will usually result in faster evaluation/prediction).
- **Epoch**: an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to separate training into distinct phases, which is useful for logging and periodic evaluation.
- When using `validation_data` or `validation_split` with the `fit` method of Keras models, evaluation will be run at the end of every **epoch**.
- Within Keras, there is the ability to add [callbacks](https://keras.io/callbacks/) specifically designed to be run at the end of an **epoch**. Examples of these are learning rate changes and model checkpointing (saving).
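
The relationship between samples, batches and epochs comes down to simple arithmetic, sketched below. The helper names are hypothetical, and the sketch assumes that a final, smaller batch still counts as one update.

```python
import math


def updates_per_epoch(num_samples, batch_size):
    """Number of batches (and thus weight updates) in one pass over the data.

    Assumes the last, possibly smaller, batch is still processed.
    """
    return math.ceil(num_samples / batch_size)


def total_updates(num_samples, batch_size, epochs):
    """Number of weight updates over a whole training run."""
    return epochs * updates_per_epoch(num_samples, batch_size)


# 1000 samples in batches of 32: 31 full batches plus one batch of 8.
per_epoch = updates_per_epoch(1000, 32)
overall = total_updates(1000, 32, epochs=10)
```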
---
### How can I save a Keras model?
*It is not recommended to use pickle or cPickle to save a Keras model.*
You can use `model.save(filepath)` to save a Keras model into a single HDF5 file which will contain:
- the architecture of the model, allowing you to re-create the model
- the weights of the model
- the training configuration (loss, optimizer)
- the state of the optimizer, allowing you to resume training exactly where you left off.
You can then use `keras.models.load_model(filepath)` to reinstantiate your model.
`load_model` will also take care of compiling the model using the saved training configuration
(unless the model was never compiled in the first place).
Example:
```python
from keras.models import load_model

model.save('my_model.h5')  # creates a HDF5 file 'my_model.h5'
del model  # deletes the existing model

# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
```
If you only need to save the **architecture of a model**, and not its weights or its training configuration, you can do:
```python
# save as JSON
json_string = model.to_json()

# save as YAML
yaml_string = model.to_yaml()
```
The generated JSON / YAML files are human-readable and can be manually edited if needed.
You can then build a fresh model from this data:
```python
# model reconstruction from JSON:
from keras.models import model_from_json
model = model_from_json(json_string)

# model reconstruction from YAML
from keras.models import model_from_yaml
model = model_from_yaml(yaml_string)
```
If you need to save the **weights of a model**, you can do so in HDF5 with the code below.
Note that you will first need to install HDF5 and the Python library h5py, which do not come bundled with Keras.
```python
model.save_weights('my_model_weights.h5')
```
Assuming you have code for instantiating your model, you can then load the weights you saved into a model with the *same* architecture:
```python
model.load_weights('my_model_weights.h5')
```
If you need to load weights into a *different* architecture (with some layers in common), for instance for fine-tuning or transfer-learning, you can load weights by *layer name*:
```python
model.add(Dense(2, input_dim=3, name='dense_1'))  # will be loaded
model.add(Dense(10, name='new_dense'))  # will not be loaded

# load weights from first model; will only affect the first layer, dense_1.
model.load_weights(fname, by_name=True)
```
---
### Why is the training loss much higher than the testing loss?
A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off at testing time.
Besides, the training loss is the average of the losses over each batch of training data. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
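
The second effect can be illustrated with a few made-up numbers. The per-batch losses below are hypothetical, and the loss on the last batch is used as a rough stand-in for the loss of the model at the end of the epoch.

```python
# Hypothetical per-batch losses within one epoch; the model improves as it trains.
batch_losses = [1.0, 0.8, 0.6, 0.4]

# What is reported as the training loss: the mean over all batches of the epoch.
reported_training_loss = sum(batch_losses) / len(batch_losses)

# Loss of the model as it stands at the end of the epoch (approximated here by
# the last batch), which is closer to what a test-time evaluation would see.
end_of_epoch_loss = batch_losses[-1]

# The reported average is pulled up by the early, higher-loss batches.
gap = reported_training_loss - end_of_epoch_loss
```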
---
### How can I obtain the output of an intermediate layer?
One simple way is to create a new `Model` that will output the layers that you are interested in:
### How can I use Keras with datasets that don't fit in memory?
You can do batch training using `model.train_on_batch(x, y)` and `model.test_on_batch(x, y)`. See the [models documentation](/models/sequential).
Alternatively, you can write a generator that yields batches of training data and use the method `model.fit_generator(data_generator, steps_per_epoch, epochs)`.
You can see batch training in action in our [CIFAR10 example](https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py).
---
### How can I interrupt training when the validation loss isn't decreasing anymore?
You can use an `EarlyStopping` callback, passed to `fit` through the `callbacks` argument. Find out more in the [callbacks documentation](/callbacks).
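
The idea behind stopping on a stalled validation loss is a patience-based rule, the kind of rule the `EarlyStopping` callback applies. The plain-Python sketch below only illustrates that rule under simple assumptions; the function name `stopping_epoch` is hypothetical and this is not the Keras implementation.

```python
def stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the index of the epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember the new best value and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Validation loss improves twice, then stalls for two epochs.
stopped_at = stopping_epoch([1.0, 0.9, 0.95, 0.97, 0.8], patience=2)
```

With these hypothetical losses, the later improvement to 0.8 is never reached, which is the usual trade-off when choosing a small `patience`.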
---
### How is the validation split computed?
If you set the `validation_split` argument in `model.fit` to e.g. 0.1, then the validation data used will be the *last 10%* of the data. If you set it to 0.25, it will be the last 25% of the data, etc. Note that the data isn't shuffled before extracting the validation split, so the validation is literally just the *last* x% of samples in the input you passed.
The same validation set is used for all epochs (within a same call to `fit`).
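
The "last x%" behaviour can be sketched as a simple slice. The helper `split_validation` is hypothetical, and the rounding shown (truncating the split index) is an assumption of this sketch rather than a documented guarantee.

```python
def split_validation(samples, validation_split):
    """Split `samples` into (training, validation), taking validation from the end.

    No shuffling happens before the split, mirroring the behaviour described above.
    """
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


samples = list(range(20))
train, val = split_validation(samples, 0.25)
```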
---
### Is the data shuffled during training?
Yes, if the `shuffle` argument in `model.fit` is set to `True` (which is the default), the training data will be randomly shuffled at each epoch.
Validation data is never shuffled.
---
### How can I record the training / validation loss / accuracy at each epoch?
The `model.fit` method returns a `History` callback, which has a `history` attribute containing the lists of successive losses and other metrics.
```python
hist = model.fit(x, y, validation_split=0.2)
print(hist.history)
```
---
### How can I "freeze" Keras layers?
To "freeze" a layer means to exclude it from training, i.e. its weights will never be updated. This is useful in the context of fine-tuning a model, or using fixed embeddings for a text input.
You can pass a `trainable` argument (boolean) to a layer constructor to set a layer to be non-trainable:
```python
frozen_layer = Dense(32, trainable=False)
```
Additionally, you can set the `trainable` property of a layer to `True` or `False` after instantiation. For this to take effect, you will need to call `compile()` on your model after modifying the `trainable` property. Here's an example:
```python
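x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)

frozen_model = Model(x, y)
# in the model below, the weights of `layer` will not be updated during training
frozen_model.compile(optimizer='rmsprop', loss='mse')

layer.trainable = True
trainable_model = Model(x, y)
# with this model the weights of `layer` will be updated during training
trainable_model.compile(optimizer='rmsprop', loss='mse')

frozen_model.fit(data, labels)  # this does NOT update the weights of `layer`
trainable_model.fit(data, labels)  # this updates the weights of `layer`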
```
---
### How can I use stateful RNNs?
Making an RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.
When using stateful RNNs, it is therefore assumed that:
- all batches have the same number of samples
- If `x1` and `x2` are successive batches of samples, then `x2[i]` is the follow-up sequence to `x1[i]`, for every `i`.
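
The second assumption can be made concrete with a small sketch that cuts long sequences into successive batches, so that row `i` of each batch continues row `i` of the previous one. The helper `successive_batches` is hypothetical and uses plain Python lists in place of arrays.

```python
def successive_batches(sequences, window):
    """Cut each sequence into consecutive windows of length `window`.

    Batch k holds window k of every sequence, so batches[k + 1][i] is the
    follow-up of batches[k][i]. Trailing timesteps that do not fill a whole
    window are dropped.
    """
    length = len(sequences[0])
    return [
        [seq[start:start + window] for seq in sequences]
        for start in range(0, length - window + 1, window)
    ]


# Two sequences of 20 timesteps each, fed in windows of 10.
sequences = [list(range(0, 20)), list(range(100, 120))]
batches = successive_batches(sequences, window=10)
x1, x2 = batches
```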
To use statefulness in RNNs, you need to:
- explicitly specify the batch size you are using, by passing a `batch_size` argument to the first layer in your model. E.g. `batch_size=32` for a 32-samples batch of sequences of 10 timesteps with 16 features per timestep.
- set `stateful=True` in your RNN layer(s).
- specify `shuffle=False` when calling fit().
To reset the states accumulated:
- use `model.reset_states()` to reset the states of all layers in the model
- use `layer.reset_states()` to reset the states of a specific stateful RNN layer
Example:
```python
x  # this is our input data, of shape (32, 21, 16)
# we will feed it to our model in sequences of length 10
```
Note that the methods `predict`, `fit`, `train_on_batch`, `predict_classes`, etc. will *all* update the states of the stateful layers in a model. This allows you to do not only stateful training, but also stateful prediction.
---
### How can I remove a layer from a Sequential model?
You can remove the last added layer in a Sequential model by calling `.pop()`:
For a few simple usage examples, see [the documentation for the Applications module](/applications).
For a detailed example of how to use such a pre-trained model for feature extraction or for fine-tuning, see [this blog post](http://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html).
The VGG16 model is also the basis for several Keras example scripts:
The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.
This guide assumes that you are already familiar with the `Sequential` model.
Let's start with something simple.
-----
## First example: a densely-connected network
The `Sequential` model is probably a better choice to implement such a network, but it helps to start with something really simple.
- A layer instance is callable (on a tensor), and it returns a tensor
- Input tensor(s) and output tensor(s) can then be used to define a `Model`
- Such a model can be trained just like Keras `Sequential` models.
```python
from keras.layers import Input, Dense
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels)  # starts training
```
-----
## All models are callable, just like layers
With the functional API, it is easy to re-use trained models: you can treat any model as if it were a layer, by calling it on a tensor. Note that by calling a model you aren't just re-using the *architecture* of the model, you are also re-using its weights.
```python
x = Input(shape=(784,))
# This works, and returns the 10-way softmax we defined above.
y = model(x)
```
This makes it possible, for instance, to quickly create models that can process *sequences* of inputs. You could turn an image classification model into a video classification model, in just one line.
```python
from keras.layers import TimeDistributed

# Input tensor for sequences of 20 timesteps,
# each containing a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# This applies our previous model to every timestep in the input sequences.
# the output of the previous model was a 10-way softmax,
# so the output of the layer below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)
```
Here's a good use case for the functional API: models with multiple inputs and outputs. The functional API makes it easy to manipulate a large number of intertwined datastreams.
Let's consider the following model. We seek to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc.
The model will also be supervised via two loss functions. Using the main loss function earlier in a model is a good regularization mechanism for deep models.
```python
# An LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
```
Here we insert the auxiliary loss, allowing the LSTM and Embedding layer to be trained smoothly even though the main loss will be much higher in the model.
Another good use for the functional API is models that use shared layers. Let's take a look at shared layers.
Let's consider a dataset of tweets. We want to build a model that can tell whether two tweets are from the same person or not (this can allow us to compare users by the similarity of their tweets, for instance).
One way to achieve this is to build a model that encodes two tweets into two vectors, concatenates the vectors and adds a logistic regression on top, outputting a probability that the two tweets share the same author. The model would then be trained on positive tweet pairs and negative tweet pairs.
Because the problem is symmetric, the mechanism that encodes the first tweet should be reused (weights and all) to encode the second tweet. Here we use a shared LSTM layer to encode the tweets.
Let's build this with the functional API. We will take as input for a tweet a binary matrix of shape `(140, 256)`, i.e. a sequence of 140 vectors of size 256, where each dimension in the 256-dimensional vector encodes the presence/absence of a character (out of an alphabet of 256 frequent characters).
```python
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

tweet_a = Input(shape=(140, 256))
tweet_b = Input(shape=(140, 256))
```
To share a layer across different inputs, simply instantiate the layer once, then call it on as many inputs as you want:
Let's pause to take a look at how to read the shared layer's output or output shape.
-----
## The concept of layer "node"
Whenever you are calling a layer on some input, you are creating a new tensor (the output of the layer), and you are adding a "node" to the layer, linking the input tensor to the output tensor. When you are calling the same layer multiple times, that layer owns multiple nodes indexed as 0, 1, 2...
In previous versions of Keras, you could obtain the output tensor of a layer instance via `layer.get_output()`, or its output shape via `layer.output_shape`. You still can (except `get_output()` has been replaced by the property `output`). But what if a layer is connected to multiple inputs?
As long as a layer is only connected to one input, there is no confusion, and `.output` will return the one output of the layer:
```python
a = Input(shape=(140, 256))

lstm = LSTM(32)
encoded_a = lstm(a)

assert lstm.output == encoded_a
```
Not so if the layer has multiple inputs:
```python
a = Input(shape=(140, 256))
b = Input(shape=(140, 256))

lstm = LSTM(32)
encoded_a = lstm(a)
encoded_b = lstm(b)

lstm.output
```
```
>> AssertionError: Layer lstm_1 has multiple inbound nodes,
hence the notion of "layer output" is ill-defined.
Use `get_output_at(node_index)` instead.
```
Okay then. The following works:
```python
assert lstm.get_output_at(0) == encoded_a
assert lstm.get_output_at(1) == encoded_b
```
Simple enough, right?
The same is true for the properties `input_shape` and `output_shape`: as long as the layer has only one node, or as long as all nodes have the same input/output shape, then the notion of "layer output/input shape" is well defined, and that one shape will be returned by `layer.output_shape`/`layer.input_shape`. But if, for instance, you apply the same `Conv2D` layer to an input of shape `(3, 32, 32)`, and then to an input of shape `(3, 64, 64)`, the layer will have multiple input/output shapes, and you will have to fetch them by specifying the index of the node they belong to:
```python
a = Input(shape=(3, 32, 32))
b = Input(shape=(3, 64, 64))

conv = Conv2D(16, (3, 3), padding='same')
conved_a = conv(a)

# Only one input so far, the following will work:
assert conv.input_shape == (None, 3, 32, 32)

conved_b = conv(b)
# now the `.input_shape` property wouldn't work, but this does:
assert conv.get_input_shape_at(0) == (None, 3, 32, 32)
assert conv.get_input_shape_at(1) == (None, 3, 64, 64)
```
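To make the bookkeeping behind nodes easier to picture, here is a toy model in plain Python. It is not Keras code: `ToyLayer` is a hypothetical class that only records the input shape of each call, and raising `AttributeError` for the ambiguous case is a choice made for this sketch.

```python
class ToyLayer:
    """Toy stand-in for a shared layer; it only records input shapes."""

    def __init__(self):
        self._input_shapes = []  # one entry per node, in call order

    def __call__(self, input_shape):
        # Each call adds a node; a real layer would also build an output tensor.
        self._input_shapes.append(tuple(input_shape))
        return len(self._input_shapes) - 1  # index of the node just created

    @property
    def input_shape(self):
        # Well defined only while every node agrees on the shape.
        if len(set(self._input_shapes)) != 1:
            raise AttributeError("multiple nodes with different input shapes; "
                                 "use get_input_shape_at(node_index)")
        return self._input_shapes[0]

    def get_input_shape_at(self, node_index):
        return self._input_shapes[node_index]


conv = ToyLayer()
conv((None, 3, 32, 32))
single = conv.input_shape  # fine: only one node so far
conv((None, 3, 64, 64))
first = conv.get_input_shape_at(0)
second = conv.get_input_shape_at(1)
```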
-----
## More examples
Code examples are still the best way to get started, so here are a few more.
### Inception module
For more information about the Inception architecture, see [Going Deeper with Convolutions](http://arxiv.org/abs/1409.4842).
This model can select the correct one-word answer when asked a natural-language question about a picture.
It works by encoding the question into a vector, encoding the image into a vector, concatenating the two, and training a logistic regression on top over some vocabulary of potential answers.
```python
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.layers import Input, LSTM, Embedding, Dense
from keras.models import Model, Sequential

# First, let's define a vision model using a Sequential model.
# The next stage would be training this model on actual data.
```
### Video question answering model
Now that we have trained our image QA model, we can quickly turn it into a video QA model. With appropriate training, you will be able to show it a short video (e.g. 100-frame human action) and ask a natural language question about the video (e.g. "what sport is the boy playing?" -> "football").
```python
from keras.layers import TimeDistributed

video_input = Input(shape=(100, 3, 224, 224))
# This is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

# This is a model-level representation of the question encoder, reusing the same weights as before:
```
The `Sequential` model is a linear stack of layers.
You can create a `Sequential` model by passing a list of layer instances to the constructor:
```python
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])
```
You can also simply add layers via the `.add()` method:
```python
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
```
----
## Specifying the input shape
The model needs to know what input shape it should expect. For this reason, the first layer in a `Sequential` model (and only the first, because following layers can do automatic shape inference) needs to receive information about its input shape. There are several possible ways to do this:
- Pass an `input_shape` argument to the first layer. This is a shape tuple (a tuple of integers or `None` entries, where `None` indicates that any positive integer may be expected). In `input_shape`, the batch dimension is not included.
- Some 2D layers, such as `Dense`, support the specification of their input shape via the argument `input_dim`, and some 3D temporal layers support the arguments `input_dim` and `input_length`.
- If you ever need to specify a fixed batch size for your inputs (this is useful for stateful recurrent networks), you can pass a `batch_size` argument to a layer. If you pass both `batch_size=32` and `input_shape=(6, 8)` to a layer, it will then expect every batch of inputs to have the batch shape `(32, 6, 8)`.
As such, the following snippets are strictly equivalent:
```python
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
```
```python
model = Sequential()
model.add(Dense(32, input_dim=784))
```
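How these arguments combine into the shape a layer expects can be sketched with two small helpers. Both function names are hypothetical and exist only for this illustration; they mirror the rules listed above rather than any Keras internals.

```python
def shape_from_input_dim(input_dim):
    """`input_dim=n` describes the same per-sample shape as `input_shape=(n,)`."""
    return (input_dim,)


def batch_input_shape(input_shape, batch_size=None):
    """Prepend the batch dimension; `None` means any batch size is accepted."""
    return (batch_size,) + tuple(input_shape)


# The two snippets above describe the same per-sample shape.
same = shape_from_input_dim(784) == (784,)

# batch_size=32 together with input_shape=(6, 8) gives the batch shape (32, 6, 8).
fixed = batch_input_shape((6, 8), batch_size=32)
flexible = batch_input_shape((784,))
```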
----
## Compilation
Before training a model, you need to configure the learning process, which is done via the `compile` method. It receives three arguments:
- An optimizer. This could be the string identifier of an existing optimizer (such as `rmsprop` or `adagrad`), or an instance of the `Optimizer` class. See: [optimizers](/optimizers).
- A loss function. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as `categorical_crossentropy` or `mse`), or it can be an objective function. See: [losses](/losses).
- A list of metrics. For any classification problem you will want to set this to `metrics=['accuracy']`. A metric could be the string identifier of an existing metric or a custom metric function.
```python
# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# For custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])
```
----
## Training
Keras models are trained on Numpy arrays of input data and labels. For training a model, you will typically use the `fit` function. [Read its documentation here](/models/sequential).
```python
# For a single-input model with 2 classes (binary classification):
```
Initializations define the way to set the initial random weights of Keras layers.
The keyword arguments used for passing initializers to layers will depend on the layer. Usually it is simply `kernel_initializer` and `bias_initializer`:
```python
model.add(Dense(64,
                kernel_initializer='random_uniform',
                bias_initializer='zeros'))
```
## Available initializers
The following built-in initializers are available as part of the `keras.initializers` module:
{{autogenerated}}
An initializer may be passed as a string (must match one of the available initializers above), or as a callable:
If a layer has a single node (i.e. if it isn't a shared layer), you can get its input tensor, output tensor, input shape and output shape via:
- `layer.input`
- `layer.output`
- `layer.input_shape`
- `layer.output_shape`
If the layer has multiple nodes (see: [the concept of layer node and shared layers](/getting-started/functional-api-guide/#the-concept-of-layer-node)), you can use the following methods:
For simple, stateless custom operations, you are probably better off using `layers.core.Lambda` layers. But for any custom operation that has trainable weights, you should implement your own layer.
Here is the skeleton of a Keras layer, **as of Keras 2.0** (if you have an older version, please upgrade). There are only three methods you need to implement:
- `build(input_shape)`: this is where you will define your weights. This method must set `self.built = True`, which can be done by calling `super([Layer], self).build()`.
- `call(x)`: this is where the layer's logic lives. Unless you want your layer to support masking, you only have to care about the first argument passed to `call`: the input tensor.
- `compute_output_shape(input_shape)`: in case your layer modifies the shape of its input, you should specify here the shape transformation logic. This allows Keras to do automatic shape inference.
```python
from keras import backend as K
from keras.engine.topology import Layer
import numpy as np


class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
```
The existing Keras layers provide examples of how to implement almost anything. Never hesitate to read the source code!
You can either pass the name of an existing loss function, or pass a TensorFlow/Theano symbolic function that returns a scalar for each data-point and takes the following two arguments:
- __y_true__: True labels. TensorFlow/Theano tensor.
- __y_pred__: Predictions. TensorFlow/Theano tensor of the same shape as y_true.
The actual optimized objective is the mean of the output array across all datapoints.
For a few examples of such functions, check out the [losses source](https://github.com/fchollet/keras/blob/master/keras/losses.py).
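To make the contract described above concrete, here is a plain-Python sketch rather than Keras code. The names `per_sample_squared_error` and `objective` are hypothetical: the first returns one scalar per data point, the second takes the mean across data points, mirroring what the text says is actually optimized. Real loss functions operate on backend tensors, not lists.

```python
def per_sample_squared_error(y_true, y_pred):
    """Hypothetical loss: one scalar per data point (squared error averaged over features)."""
    return [
        sum((t - p) ** 2 for t, p in zip(true_row, pred_row)) / len(true_row)
        for true_row, pred_row in zip(y_true, y_pred)
    ]


def objective(y_true, y_pred, loss_fn):
    """Mean of the per-sample losses across all data points."""
    per_sample = loss_fn(y_true, y_pred)
    return sum(per_sample) / len(per_sample)


y_true = [[1.0, 0.0], [0.0, 1.0]]
y_pred = [[1.0, 0.0], [0.5, 0.5]]
print(per_sample_squared_error(y_true, y_pred))  # one value per sample
print(objective(y_true, y_pred, per_sample_squared_error))  # a single scalar
```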
## Available loss functions
{{autogenerated}}
----
**Note**: when using the `categorical_crossentropy` loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample). In order to convert *integer targets* into *categorical targets*, you can use the Keras utility `to_categorical`.
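As a rough illustration of the categorical format only, the following plain-Python sketch converts integer targets into one-hot rows. The helper name `one_hot` is hypothetical, and this is not the `to_categorical` implementation.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into all-zeros rows with a 1 at the class index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


print(one_hot([0, 2, 1], num_classes=3))
```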
A metric is a function that is used to judge the performance of your model. Metric functions are to be supplied in the `metrics` parameter when a model is compiled.
There are two types of models available in Keras: [the Sequential model](/models/sequential) and [the Model class used with the functional API](/models/model).
These models have a number of methods in common:
- `model.summary()`: prints a summary representation of your model.
- `model.get_config()`: returns a dictionary containing the configuration of the model. The model can be reinstantiated from its config via:
```python
config = model.get_config()
model = Model.from_config(config)
# or, for Sequential:
model = Sequential.from_config(config)
```
- `model.get_weights()`: returns a list of all weight tensors in the model, as Numpy arrays.
- `model.set_weights(weights)`: sets the values of the weights of the model, from a list of Numpy arrays. The arrays in the list should have the same shape as those returned by `get_weights()`.
- `model.to_json()`: returns a representation of the model as a JSON string. Note that the representation does not include the weights, only the architecture. You can reinstantiate the same model (with reinitialized weights) from the JSON string via:
```python
from keras.models import model_from_json
json_string = model.to_json()
model = model_from_json(json_string)
```
- `model.to_yaml()`: returns a representation of the model as a YAML string. Note that the representation does not include the weights, only the architecture. You can reinstantiate the same model (with reinitialized weights) from the YAML string via:
```python
from keras.models import model_from_yaml
yaml_string = model.to_yaml()
model = model_from_yaml(yaml_string)
```
- `model.save_weights(filepath)`: saves the weights of the model as an HDF5 file.
- `model.load_weights(filepath, by_name=False)`: loads the weights of the model from an HDF5 file (created by `save_weights`). By default, the architecture is expected to be unchanged. To load weights into a different architecture (with some layers in common), use `by_name=True` to load only those layers with the same name.
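The idea behind `by_name=True` can be pictured with a small dictionary-based sketch. This is an illustration under assumed data structures (layer name mapped to a list of weights), not how Keras stores or loads HDF5 files; `load_by_name` is a hypothetical helper.

```python
def load_by_name(model_weights, saved_weights):
    """Copy saved weights only into layers whose names match; return the loaded names."""
    loaded = []
    for name in model_weights:
        if name in saved_weights:
            model_weights[name] = saved_weights[name]
            loaded.append(name)
    return loaded


# Hypothetical layer names and weight values, for illustration only.
saved = {'dense_1': [0.1, 0.2], 'dense_2': [0.3]}
new_model = {'dense_1': [0.0, 0.0], 'dense_3': [0.0]}
print(load_by_name(new_model, saved))
print(new_model)
```

Layers that exist only in the new model (`dense_3` here) keep their current weights.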
You can either instantiate an optimizer before passing it to `model.compile()`, as in the above example, or you can call it by its name. In the latter case, the default parameters for the optimizer will be used.
```python
# pass optimizer by name: default parameters will be used
```
- __rotation_range__: Int. Degree range for random rotations.
- __width_shift_range__: Float (fraction of total width). Range for random horizontal shifts.
- __height_shift_range__: Float (fraction of total height). Range for random vertical shifts.
- __shear_range__: Float. Shear intensity (shear angle in counter-clockwise direction, in radians).
- __zoom_range__: Float or [lower, upper]. Range for random zoom. If a float, `[lower, upper] = [1-zoom_range, 1+zoom_range]`.
- __channel_shift_range__: Float. Range for random channel shifts.
- __fill_mode__: One of {"constant", "nearest", "reflect" or "wrap"}. Points outside the boundaries of the input are filled according to the given mode.
- __cval__: Float or Int. Value used for points outside the boundaries when `fill_mode = "constant"`.
- __rescale__: rescaling factor. Defaults to None. If None or 0, no rescaling is applied,
otherwise we multiply the data by the value provided (before applying
any other transformation).
- __preprocessing_function__: function that will be applied to each input.
The function will run before any other modification on it.
The function should take one argument:
one image (Numpy tensor with rank 3),
and should output a Numpy tensor with the same shape.
- __data_format__: One of {"channels_first", "channels_last"}.
"channels_last" mode means that the images should have shape `(samples, height, width, channels)`,
"channels_first" mode means that the images should have shape `(samples, channels, height, width)`.
It defaults to the `image_data_format` value found in your
Keras config file at `~/.keras/keras.json`.
If you never set it, then it will be "channels_last".
- __Methods__:
- __fit(x)__: Compute the internal data stats related to the data-dependent transformations, based on an array of sample data.
Only required if `featurewise_center`, `featurewise_std_normalization` or `zca_whitening` is set.
- __Arguments__:
- __x__: sample data. Should have rank 4.
In case of grayscale data,
the channels axis should have value 1, and in case
of RGB data, it should have value 3.
- __augment__: Boolean (default: False). Whether to fit on randomly augmented samples.
- __rounds__: int (default: 1). If augment, how many augmentation passes over the data to use.
- __seed__: int (default: None). Random seed.
- __flow(x, y)__: Takes numpy data & label arrays, and generates batches of augmented/normalized data. Yields batches indefinitely, in an infinite loop.
- __Arguments__:
- __x__: data. Should have rank 4.
In case of grayscale data,
the channels axis should have value 1, and in case
of RGB data, it should have value 3.
- __y__: labels.
- __batch_size__: int (default: 32).
- __shuffle__: boolean (default: True).
- __seed__: int (default: None).
- __save_to_dir__: None or str (default: None). This allows you to optionally specify a directory to which to save the augmented pictures being generated (useful for visualizing what you are doing).
- __save_prefix__: str (default: `''`). Prefix to use for filenames of saved pictures (only relevant if `save_to_dir` is set).
- __save_format__: one of "png", "jpeg" (only relevant if `save_to_dir` is set). Default: "png".
- __yields__: Tuples of `(x, y)` where `x` is a numpy array of image data and `y` is a numpy array of corresponding labels.
The generator loops indefinitely.
- __flow_from_directory(directory)__: Takes the path to a directory, and generates batches of augmented/normalized data. Yields batches indefinitely, in an infinite loop.
- __Arguments__:
- __directory__: path to the target directory. It should contain one subdirectory per class.
Any PNG, JPG or BMP images inside each subdirectory's directory tree will be included in the generator.
See [this script](https://gist.github.com/fchollet/0830affa1f7f19fd47b06d4cf89ed44d) for more details.
- __target_size__: tuple of integers, default: `(256, 256)`. The dimensions to which all images found will be resized.
- __color_mode__: one of "grayscale", "rgb". Default: "rgb". Whether the images will be converted to have 1 or 3 color channels.
- __classes__: optional list of class subdirectories (e.g. `['dogs', 'cats']`). Default: None. If not provided, the list of classes will be automatically inferred from the subdirectory names/structure under `directory`, where each subdirectory will be treated as a different class (and the order of the classes, which will map to the label indices, will be alphanumeric). The dictionary containing the mapping from class names to class indices can be obtained via the attribute `class_indices`.
- __class_mode__: one of "categorical", "binary", "sparse" or None. Default: "categorical". Determines the type of label arrays that are returned: "categorical" will be 2D one-hot encoded labels, "binary" will be 1D binary labels, "sparse" will be 1D integer labels. If None, no labels are returned (the generator will only yield batches of image data, which is useful with `model.predict_generator()`, `model.evaluate_generator()`, etc.). Please note that with `class_mode=None`, the data still needs to reside in a subdirectory of `directory` for it to work correctly.
- __batch_size__: size of the batches of data (default: 32).
- __shuffle__: whether to shuffle the data (default: True)
- __seed__: optional random seed for shuffling and transformations.
- __save_to_dir__: None or str (default: None). This allows you to optionally specify a directory to which to save the augmented pictures being generated (useful for visualizing what you are doing).
- __save_prefix__: str. Prefix to use for filenames of saved pictures (only relevant if `save_to_dir` is set).
- __save_format__: one of "png", "jpeg" (only relevant if `save_to_dir` is set). Default: "png".
- __follow_links__: whether to follow symlinks inside class subdirectories (default: False).
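The mapping from class subdirectories to label indices described under `classes` can be sketched in plain Python. This is an assumption-laden illustration of "alphanumeric order", not the generator's code; `infer_class_indices` is a hypothetical helper and no directory is actually read.

```python
def infer_class_indices(subdirectories):
    """Map class names to label indices in sorted (alphanumeric) order."""
    return {name: index for index, name in enumerate(sorted(subdirectories))}


print(infer_class_indices(['dogs', 'cats']))
```

With the subdirectories `dogs` and `cats`, `cats` sorts first and therefore receives index 0.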
Transform a list of `num_samples` sequences (lists of scalars) into a 2D Numpy array of shape `(num_samples, num_timesteps)`. `num_timesteps` is either the `maxlen` argument if provided, or the length of the longest sequence otherwise. Sequences shorter than `num_timesteps` are padded with `value`. Sequences longer than `num_timesteps` are truncated to fit the desired length. The position where padding or truncation happens is determined by `padding` or `truncating`, respectively.
- __Return__: 2D Numpy array of shape `(num_samples, num_timesteps)`.
- __Arguments__:
- __sequences__: List of lists of int or float.
- __maxlen__: None or int. Maximum sequence length; longer sequences are truncated and shorter sequences are padded with `value`.
- __dtype__: datatype of the Numpy array returned.
- __padding__: 'pre' or 'post', pad either before or after each sequence.
- __truncating__: 'pre' or 'post', remove values from sequences longer than `maxlen` either at the beginning or at the end of the sequence.
- __value__: float, the value used to pad the sequences.
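The interaction of `maxlen`, `padding`, `truncating` and `value` can be illustrated with a plain-Python sketch. It returns nested lists rather than a Numpy array, ignores `dtype`, and the function name `pad_sequences_sketch` and its default values are assumptions of this example, not the library implementation.

```python
def pad_sequences_sketch(sequences, maxlen=None, padding='pre',
                         truncating='pre', value=0):
    """Pad or truncate every sequence to a common length."""
    if maxlen is None:
        # Fall back to the length of the longest sequence.
        maxlen = max(len(seq) for seq in sequences)
    result = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' drops values from the beginning, 'post' from the end.
            if truncating == 'pre':
                seq = seq[len(seq) - maxlen:]
            else:
                seq = seq[:maxlen]
        pad = [value] * (maxlen - len(seq))
        # 'pre' pads before the sequence, 'post' pads after it.
        seq = pad + seq if padding == 'pre' else seq + pad
        result.append(seq)
    return result


print(pad_sequences_sketch([[1, 2, 3], [4]], maxlen=2, padding='post'))
```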
- __Return__: tuple `(couples, labels)`.
- `couples` is a list of 2-elements lists of int: `[word_index, other_word_index]`.
- `labels` is a list of 0 and 1, where 1 indicates that `other_word_index` was found in the same window as `word_index`, and 0 indicates that `other_word_index` was random.
- if `categorical` is set to True, the labels are categorical, i.e. 1 becomes [0, 1], and 0 becomes [1, 0].
- __negative_samples__: float >= 0. 0 for no negative (=random) samples. 1 for same number as positive samples. etc.
- __shuffle__: boolean. Whether to shuffle the samples.
- __categorical__: boolean. Whether to make the returned labels categorical.
- __sampling_table__: Numpy array of shape `(vocabulary_size,)` where `sampling_table[i]` is the probability of sampling the word with index i (assumed to be i-th most common word in the dataset).
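The shape of the returned `couples` and the categorical label format can be pictured with a plain-Python sketch. It produces only positive couples (words found in the same window), omits negative sampling, shuffling and the sampling table, and uses hypothetical names (`positive_couples`, `window_size`, `to_categorical_labels`) that are assumptions of this example.

```python
def positive_couples(sequence, window_size):
    """List [word_index, other_word_index] pairs that share a window."""
    couples = []
    for i, word in enumerate(sequence):
        start = max(0, i - window_size)
        end = min(len(sequence), i + window_size + 1)
        for j in range(start, end):
            if j != i:
                couples.append([word, sequence[j]])
    return couples


def to_categorical_labels(labels):
    """Label 1 becomes [0, 1] and label 0 becomes [1, 0]."""
    return [[0, 1] if label == 1 else [1, 0] for label in labels]


print(positive_couples([1, 2, 3], window_size=1))
print(to_categorical_labels([1, 0]))
```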
Used for generating the `sampling_table` argument for `skipgrams`. `sampling_table[i]` is the probability of sampling the i-th most common word in a dataset (more common words should be sampled less frequently, for balance).
Class for vectorizing texts and/or turning texts into sequences (i.e. lists of word indexes, where the word of rank i in the dataset, starting at 1, has index i).
- __Arguments__: Same as `text_to_word_sequence` above.
- __num_words__: None or int. Maximum number of words to work with (if set, tokenization will be restricted to the top num_words most common words in the dataset).
- __char_level__: if True, every character will be treated as a token.
- __Methods__:
- __fit_on_texts(texts)__:
- __Arguments__:
- __texts__: list of texts to train on.
- __texts_to_sequences(texts)__
- __Arguments__:
- __texts__: list of texts to turn to sequences.
- __Return__: list of sequences (one per text input).
- __texts_to_sequences_generator(texts)__: generator version of the above.
- __Return__: yield one sequence per input text.
- __texts_to_matrix(texts)__:
- __Return__: numpy array of shape `(len(texts), num_words)`.
- __Arguments__:
- __texts__: list of texts to vectorize.
- __mode__: one of "binary", "count", "tfidf", "freq" (default: "binary").
- __fit_on_sequences(sequences)__:
- __Arguments__:
- __sequences__: list of sequences to train on.
- __sequences_to_matrix(sequences)__:
- __Return__: numpy array of shape `(len(sequences), num_words)`.
- __Arguments__:
- __sequences__: list of sequences to vectorize.
- __mode__: one of "binary", "count", "tfidf", "freq" (default: "binary").
- __Attributes__:
- __word_counts__: dictionary mapping words (str) to the number of times they appeared during fit. Only set after fit_on_texts was called.
- __word_docs__: dictionary mapping words (str) to the number of documents/texts they appeared in during fit. Only set after fit_on_texts was called.
- __word_index__: dictionary mapping words (str) to their rank/index (int). Only set after fit_on_texts was called.
- __document_count__: int. Number of documents (texts/sequences) the tokenizer was trained on. Only set after fit_on_texts or fit_on_sequences was called.
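The relationship between word frequency, `word_index` and the generated sequences can be sketched with the standard library alone. This is a simplified illustration (lowercasing and whitespace splitting only, no `num_words` limit, no filtering of punctuation); `fit_word_index` and `to_sequences` are hypothetical helpers, not `Tokenizer` methods.

```python
from collections import Counter


def fit_word_index(texts):
    """Rank words by frequency; the most common word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = [word for word, _ in counts.most_common()]
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def to_sequences(texts, word_index):
    """Replace each known word by its index; unknown words are skipped."""
    return [
        [word_index[word] for word in text.lower().split() if word in word_index]
        for text in texts
    ]


texts = ['the cat sat', 'the cat', 'the']
word_index = fit_word_index(texts)
print(word_index)
print(to_sequences(texts + ['the dog'], word_index))
```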
Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are incorporated in the loss function that the network optimizes.
The penalties are applied on a per-layer basis. The exact API will depend on the layer, but the layers `Dense`, `Conv1D`, `Conv2D` and `Conv3D` have a unified API.
These layers expose 3 keyword arguments:
- `kernel_regularizer`: instance of `keras.regularizers.Regularizer`
- `bias_regularizer`: instance of `keras.regularizers.Regularizer`
- `activity_regularizer`: instance of `keras.regularizers.Regularizer`
## Example
```python
from keras import regularizers
model.add(Dense(64, input_dim=64,
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))
```
## Available penalties
```python
keras.regularizers.l1(0.)
keras.regularizers.l2(0.)
keras.regularizers.l1_l2(0.)
```
## Developing new regularizers
Any function that takes in a weight matrix and returns a loss contribution tensor can be used as a regularizer, e.g.:
```python
from keras import backend as K

def l1_reg(weight_matrix):
    return 0.01 * K.sum(K.abs(weight_matrix))

model.add(Dense(64, input_dim=64,
                kernel_regularizer=l1_reg))
```
Alternatively, you can write your regularizers in an object-oriented way;
see the [keras/regularizers.py](https://github.com/fchollet/keras/blob/master/keras/regularizers.py) module for examples.
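To see what such a penalty contributes numerically, here is a plain-Python sketch on nested lists instead of backend tensors. The helper names `l1_penalty` and `l2_penalty` and the sample weights are assumptions of this example; the formulas are the usual sum of absolute values and sum of squares, each scaled by a factor.

```python
def l1_penalty(weight_matrix, factor=0.01):
    """factor * sum of absolute weight values."""
    return factor * sum(abs(w) for row in weight_matrix for w in row)


def l2_penalty(weight_matrix, factor=0.01):
    """factor * sum of squared weight values."""
    return factor * sum(w ** 2 for row in weight_matrix for w in row)


weights = [[1.0, -2.0], [0.5, 0.0]]
print(l1_penalty(weights))  # 0.01 * (1 + 2 + 0.5 + 0)
print(l2_penalty(weights))  # 0.01 * (1 + 4 + 0.25 + 0)
```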
You can use `Sequential` Keras models (single-input only) as part of your Scikit-Learn workflow via the wrappers found at `keras.wrappers.scikit_learn.py`.
There are two wrappers available:
`keras.wrappers.scikit_learn.KerasClassifier(build_fn=None, **sk_params)`, which implements the Scikit-Learn classifier interface,
`keras.wrappers.scikit_learn.KerasRegressor(build_fn=None, **sk_params)`, which implements the Scikit-Learn regressor interface.
### Arguments
- __build_fn__: callable function or class instance
- __sk_params__: model parameters & fitting parameters
`build_fn` should construct, compile and return a Keras model, which
will then be used to fit/predict. One of the following
three values could be passed to build_fn:
1. A function
2. An instance of a class that implements the `__call__` method
3. None. This means you implement a class that inherits from either
`KerasClassifier` or `KerasRegressor`. The `__call__` method of the
present class will then be treated as the default `build_fn`.
`sk_params` takes both model parameters and fitting parameters. Legal model
parameters are the arguments of `build_fn`. Note that like all other
estimators in scikit-learn, `build_fn` should provide default values for
its arguments, so that you could create the estimator without passing any
values to `sk_params`.
`sk_params` could also accept parameters for calling `fit`, `predict`,
`predict_proba`, and `score` methods (e.g., `epochs`, `batch_size`).
Fitting (predicting) parameters are selected in the following order:
1. Values passed to the dictionary arguments of
`fit`, `predict`, `predict_proba`, and `score` methods
2. Values passed to `sk_params`
3. The default values of the `keras.models.Sequential`
`fit`, `predict`, `predict_proba` and `score` methods
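The selection order above amounts to a layered override, which can be sketched with dictionaries. This is an illustration only: `resolve_params` is a hypothetical helper, the default values shown are made up for the example, and the wrappers' real logic may differ in detail.

```python
def resolve_params(method_defaults, sk_params, call_kwargs):
    """Start from the method defaults, then apply sk_params, then call-time values."""
    resolved = dict(method_defaults)
    # Only sk_params entries that the method actually accepts are relevant here.
    resolved.update({k: v for k, v in sk_params.items() if k in method_defaults})
    # Values passed directly to fit/predict/score win over everything else.
    resolved.update(call_kwargs)
    return resolved


defaults = {'batch_size': 32, 'epochs': 1}     # illustrative defaults
sk_params = {'epochs': 10, 'hidden_units': 64}  # hidden_units would go to build_fn
print(resolve_params(defaults, sk_params, {'batch_size': 128}))
```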
When using scikit-learn's `grid_search` API, legal tunable parameters are
those you could pass to `sk_params`, including fitting parameters.
In other words, you could use `grid_search` to search for the best
`batch_size` or `epochs` as well as the model parameters.
Trains a Hierarchical RNN (HRNN) to classify MNIST digits.
[mnist_irnn.py](mnist_irnn.py)
Reproduction of the IRNN experiment with pixel-by-pixel sequential MNIST in "A Simple Way to Initialize Recurrent Networks of Rectified Linear Units" by Le et al.
[mnist_mlp.py](mnist_mlp.py)
Trains a simple deep multi-layer perceptron on the MNIST dataset.
[mnist_net2net.py](mnist_net2net.py)
Reproduction of the Net2Net experiment with MNIST in "Net2Net: Accelerating Learning via Knowledge Transfer".
[mnist_siamese_graph.py](mnist_siamese_graph.py)
Trains a Siamese multi-layer perceptron on pairs of digits from the MNIST dataset.
Loads pre-trained word embeddings (GloVe embeddings) into a frozen Keras Embedding layer, and uses it to train a text classification model on the 20 Newsgroup dataset.
[reuters_mlp.py](reuters_mlp.py)
Trains and evaluates a simple MLP on the Reuters newswire topic classification task.
[stateful_lstm.py](stateful_lstm.py)
Demonstrates how to use stateful RNNs to model long sequences efficiently.
# Importable from root because it's technically not a layer
from .layers import Input
__version__ = '2.0.5'