* Update topology.py
* Update topology.py
* Update topology.py
* white space fix
* indentation fix
* add tests
* fix all tests
* add arguments arg to merge
* space after period
* add test with arguments
* add test with arguments for lambda layer too
* pep8 fixes
* fix tf test
* try fixing tf test; again
* bug fix
* finally
* Added SpatialDropout1D
This is a straightforward modification of SpatialDropout2D but for 1D data.
* Added SpatialDropout1D to docs
* SpatialDropout1D test
* Fixed indent issue
* Combined TF and TH dimension conditions
Use the same 1D dimensions for TensorFlow and Theano in SpatialDropout1D.
* trailing whitespace
* Removed dim_ordering variable
* Removing dim_ordering values
removing dim_ordering values as requested
* fix batch_norm when axis!=1
* fix dimshuffle for all backends
* moving cudnn bn fix to theano backend
* fix pep8
* don't use cudnn when bn axis is non-broadcastable, i.e. dim=1
* Display wrapped layers in graph visualization
* Check parent class instead of class's module
* Check instance instead for brevity
* More consistent naming
* Fixed weights.sort for Python 3
In Python 3, weights.sort() could raise a TypeError if the
names are all None.
* Fixed _flattened_layers under Python 3
If self.layers is empty, indexing into it raises an IndexError, so
it is necessary to check that it is non-empty first.
* Fixed weight sorting for Theano backend
* Added missing import statement
* Improved backend handling for weight calculation
* Simplified weight sorting and backend check
* Changed behavior of weights sorting
* Removed unnecessary import
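Sorting fails in Python 3 because `None < None` raises `TypeError`. A minimal sketch of a None-tolerant sort, assuming weights are objects with a possibly-`None` `name` attribute; the `Weight` class and `sort_weights_by_name` helper are hypothetical and not the code that was committed:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Weight:
    """Hypothetical stand-in for a named backend variable."""
    name: Optional[str]
    value: float


def sort_weights_by_name(weights: List[Weight]) -> List[Weight]:
    # Map None to "" so every key is a str; Python 3 cannot order None.
    # sorted() is stable, so equal keys keep their original relative order.
    return sorted(weights, key=lambda w: w.name or "")


unnamed = [Weight(None, 1.0), Weight(None, 2.0)]
try:
    sorted(unnamed, key=lambda w: w.name)  # compares None < None
except TypeError:
    print("plain sort fails on None names")
print([w.value for w in sort_weights_by_name(unnamed)])  # [1.0, 2.0]
```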
* manually terminate the thread/process workers returned by `generator_queue()`
Recently I customized a video sequence DataGenerator (based on ImageDataGenerator) for an experiment. When I used model.fit_generator as follows:
    history = model.fit_generator(train_data_generator, samples_per_epoch=train_data_generator.nb_sample,
                                  nb_epoch=nb_epoch, verbose=1, callbacks=[early_stopping, model_checkpoint],
                                  validation_data=test_data_generator, nb_val_samples=test_data_generator.nb_sample,
                                  max_q_size=10, nb_worker=8, pickle_safe=True)
I found that validation takes much longer than training, even though it processes less data.
I read the code and changed `self.evaluate_generator()` (line 1482) in `fit_generator` to use multiprocessing, as the training process does. However, memory usage then increased quickly and training lasted only a few epochs.
After some analysis, I think this is because the worker processes were not freed after `evaluate_generator` finished. I therefore suggest returning `generator_threads` from `generator_queue()` and manually terminating these threads in `fit_generator`, `evaluate_generator` and `predict_generator`.
* satisfy the PEP8 style
* correct the PEP8's E128 error
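The fix described above (hand the worker handles back to the caller so it can stop them) can be sketched with plain `threading`. This is a hedged illustration rather than Keras's `generator_queue()`: the `start_generator_workers`/`stop_workers` names, the `stop_event` flag and the queue timeout are assumptions, and it only covers threads, not the `pickle_safe` multiprocessing path.

```python
import itertools
import queue
import threading
from typing import Iterator, List, Tuple


def start_generator_workers(
    generator: Iterator, max_q_size: int = 10, nb_worker: int = 2
) -> Tuple[queue.Queue, threading.Event, List[threading.Thread]]:
    """Fill a queue from `generator` in worker threads; return the handles too."""
    q = queue.Queue(maxsize=max_q_size)
    stop_event = threading.Event()
    lock = threading.Lock()  # a plain Python generator is not thread-safe

    def worker() -> None:
        while not stop_event.is_set():
            with lock:
                try:
                    item = next(generator)
                except StopIteration:
                    return
            # Put with a timeout so a worker blocked on a full queue can still
            # notice stop_event and exit.
            while not stop_event.is_set():
                try:
                    q.put(item, timeout=0.05)
                    break
                except queue.Full:
                    pass

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(nb_worker)]
    for t in threads:
        t.start()
    return q, stop_event, threads


def stop_workers(stop_event: threading.Event, threads: List[threading.Thread]) -> None:
    """Signal every worker to exit, then wait for it instead of leaking it."""
    stop_event.set()
    for t in threads:
        t.join()


q, stop_event, threads = start_generator_workers(itertools.count(), nb_worker=3)
batches = [q.get() for _ in range(5)]
stop_workers(stop_event, threads)
```

Returning the handles keeps ownership with the caller, which is the only place that knows when the consumer (for example an evaluation pass) is finished.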
* Switch use of TF cond function to use public function.
In older versions of TF, cond was not exposed publicly and thus was being
imported via private module namespaces.
Newer TFs expose tf.cond as the public interface. There
are plans to remove private module namespace access so
this fixes keras to first try accessing through the public
namespace, and then going through the private one for older
versions of TF.
* PEP8 fix
* Make ZeroPadding2D and ZeroPadding1D optionally asymmetric
* Make padding argument polymorphic.
Add test case for asymmetric padding.
Remove excessive imports.
* Fix layer config saving.
* Duck typing (since the test passes the tuple as a list)
* Doc update
* Set padding value for the missing keys to 0.
Raise exception if unexpected keys are found in the padding dict.
* Add test for ZeroPadding1D
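A minimal sketch of the polymorphic `padding` handling described above for the 1D case: an int means symmetric padding, any length-2 sequence is accepted by duck typing, and a dict fills missing keys with 0 and rejects unknown ones. The helper name and the `left_pad`/`right_pad` keys are assumptions for illustration:

```python
from typing import Dict, Sequence, Tuple, Union

PaddingArg = Union[int, Sequence[int], Dict[str, int]]
_KEYS_1D = ("left_pad", "right_pad")  # assumed key names


def normalize_padding_1d(padding: PaddingArg) -> Tuple[int, int]:
    """Return (left, right) padding from an int, a 2-sequence or a dict."""
    if isinstance(padding, int):
        # Symmetric padding.
        return padding, padding
    if isinstance(padding, dict):
        unexpected = set(padding) - set(_KEYS_1D)
        if unexpected:
            raise ValueError("Unexpected key(s) in padding dict: %s" % sorted(unexpected))
        # Missing keys default to 0.
        return padding.get("left_pad", 0), padding.get("right_pad", 0)
    # Duck typing: a tuple or a list (e.g. after a JSON config round-trip).
    left, right = padding
    return int(left), int(right)
```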
* add categorical accuracy metric which tracks over top-k predictions
* remove top_k_categorical_accuracy from being tested together with the other metrics in all_metrics
* fix in_top_k to work with batches. correct metrics.py and test_metrics.py appropriately
* style fixes for documentation on in_top_k function
* default to k=5 for top_k_categorical_accuracy metric
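The semantics of a top-k categorical accuracy can be sketched in plain Python: a sample counts as correct when the index of its one-hot label is among the `k` highest-scored predictions. This illustrates the metric only; the Keras version is built on the backend `in_top_k`, whose tie-breaking may differ from this `sorted`-based sketch.

```python
from typing import Sequence


def top_k_categorical_accuracy(
    y_true: Sequence[Sequence[float]],
    y_pred: Sequence[Sequence[float]],
    k: int = 5,
) -> float:
    """Fraction of samples whose true class is among the k top-scored classes."""
    hits = 0
    for true_row, pred_row in zip(y_true, y_pred):
        true_class = max(range(len(true_row)), key=true_row.__getitem__)  # argmax of one-hot
        top_k = sorted(range(len(pred_row)), key=pred_row.__getitem__, reverse=True)[:k]
        hits += true_class in top_k
    return hits / len(y_true)
```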
* Added optional path argument
* Added optional field name argument
* Added LambdaCallback callback
* Fixed on_epoch_begin assignment
* Match default signatures
* Whitespace
* Test LambdaCallback examples
* Only test process termination
* Imports
* Fixed test
* Wait on process to terminate
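A LambdaCallback lets users pass plain functions instead of subclassing. A hypothetical sketch of the idea, covering only epoch and train hooks; the real class also has batch-level hooks and subclasses Keras's `Callback`, neither of which is reproduced here:

```python
from typing import Any, Callable, Optional

Hook = Callable[..., None]


def _noop(*args: Any, **kwargs: Any) -> None:
    """Default hook that ignores its arguments."""


class LambdaCallback:
    """Callback whose hooks are supplied as plain functions (sketch)."""

    def __init__(
        self,
        on_epoch_begin: Optional[Hook] = None,
        on_epoch_end: Optional[Hook] = None,
        on_train_begin: Optional[Hook] = None,
        on_train_end: Optional[Hook] = None,
    ) -> None:
        # Fall back to a no-op so callers only supply the hooks they need.
        self.on_epoch_begin = on_epoch_begin or _noop
        self.on_epoch_end = on_epoch_end or _noop
        self.on_train_begin = on_train_begin or _noop
        self.on_train_end = on_train_end or _noop


log = []
cb = LambdaCallback(on_epoch_end=lambda epoch, logs=None: log.append((epoch, logs)))
cb.on_epoch_begin(0)  # default no-op
cb.on_epoch_end(0, {"loss": 0.5})
```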
* Add zero threshold and set F measure to zero if no true samples exist
* Reduce zero threshold
* Flip thresholded non-zero count
* Add F measure test
* Updated test
* Remove lambda, simplify
* Whitespace
* Update docstring
* Update test
* Whitespace
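A plain-Python sketch of the F-measure behaviour described above, returning 0 when there are no positive samples instead of dividing by zero. The clip-and-round step and the `EPSILON` value are assumptions, and the actual metric operates on backend tensors rather than lists:

```python
from typing import Sequence

EPSILON = 1e-7  # assumed small constant that guards the divisions


def fbeta_score(y_true: Sequence[float], y_pred: Sequence[float], beta: float = 1.0) -> float:
    """F-beta over binary labels; values are clipped to [0, 1] and rounded."""
    if beta < 0:
        raise ValueError("beta must be non-negative")
    truth = [round(min(max(t, 0.0), 1.0)) for t in y_true]
    preds = [round(min(max(p, 0.0), 1.0)) for p in y_pred]
    if sum(truth) == 0:
        # No positive samples: define F as 0 instead of dividing by zero.
        return 0.0
    tp = sum(t * p for t, p in zip(truth, preds))
    precision = tp / (sum(preds) + EPSILON)
    recall = tp / (sum(truth) + EPSILON)
    bb = beta ** 2
    return (1 + bb) * precision * recall / (bb * precision + recall + EPSILON)
```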
* bypass shape inference in deconv2d
* more doc in deconv layer
* more deconv layers in var autoencoder example
* typo doc
* replicate deconv example with paper's params
* replicate example with paper's params
* typo doc
* + relus in the deconv
* typo in var autoencoder example
* + mult by ndim
* style fixes
* pep8
Change `os.listdir` to `sorted(os.listdir)` so that files are read in alphabetical rather than arbitrary order. Following PR #3751, this allows masks and images with the same name to be read together.
* add audio models: audio_convnet and audio_conv_rnn
* add audio models: audio_convnet and audio_conv_rnn
* remove white spaces at the end of lines
* add audio_conv_utils.py, update applications.md
* remove useless line in example in application.md
* remove useless line in example in application.md
* rename models (MusicTaggerCNN,CRNN), BN mode=0 weights
* pep8
* remove MusicTaggerCNN, add include_top argument
* update to follow pep8
* ReduceLROnPlateau Callback and CSVLogger Callback
* Added documentation and cleanup.
* Added examples.
* Added test for ReduceLROnPlateau()
* Minor changes to naming.
* Added epsilon for lr comparison.
* Fix sensitivity issue
* PEP8
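The plateau logic behind ReduceLROnPlateau can be sketched as follows, assuming a monitored loss, an `epsilon` threshold for what counts as an improvement, and a small tolerance for the learning-rate comparison against `min_lr`. Cooldown and `mode` handling are omitted; this is not the Keras implementation:

```python
class ReduceLROnPlateau:
    """Shrink the learning rate when a monitored loss stops improving (sketch)."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 2,
                 min_lr: float = 0.0, epsilon: float = 1e-4,
                 lr_epsilon: float = 1e-8) -> None:
        if not 0.0 < factor < 1.0:
            raise ValueError("factor must be in (0, 1)")
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.epsilon = epsilon        # minimum decrease that counts as progress
        self.lr_epsilon = lr_epsilon  # tolerance when comparing lr with min_lr
        self.best = float("inf")
        self.wait = 0

    def on_epoch_end(self, loss: float) -> float:
        """Update state with this epoch's loss and return the (possibly new) lr."""
        if loss < self.best - self.epsilon:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience and self.lr > self.min_lr + self.lr_epsilon:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr
```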
Some of the variables in this guide were misleadingly named. The outputs were named `*_loss`, implying that they held loss values, whereas they actually held the outputs. This confused me; I believe my proposed naming is clearer.
* make ImageDataGenerator behaviour fully seedable/repeatable
This makes ImageDataGenerator fully seedable.
- the seed argument in fit is now used
- the seed argument in flow and flow_from_directory now affects
transforms
- added example to docs of transforming images and masks together
- added test of using two seeded streams at once
* implemented requested changes
- PEP8
- explicit names
- classes=None
- remove test
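Why a shared seed lets images and masks be transformed together: two streams driven by generators seeded identically draw the same random numbers, so they make the same augmentation decisions. A toy sketch that uses `random.Random` and a horizontal flip as stand-ins (both assumptions; ImageDataGenerator uses NumPy and richer transforms):

```python
import random
from typing import Iterator, List

Image = List[List[int]]


def random_flip(image: Image, rng: random.Random) -> Image:
    """Flip left-right with probability 0.5, drawing from the supplied generator."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


def augmented_stream(images: List[Image], seed: int) -> Iterator[Image]:
    """Yield augmented copies; the same seed replays the same transforms."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    for image in images:
        yield random_flip(image, rng)


images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
masks = [[[0, 1], [1, 0]], [[1, 0], [0, 0]]]
seed = 1
# Both streams consume identical random draws, so each mask is flipped
# exactly when its image is.
for image, mask in zip(augmented_stream(images, seed), augmented_stream(masks, seed)):
    print(image, mask)
```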
As I read it, regularizers cannot be reused, yet reusing one doesn't actually fail; it seems to result in only the last layer being regularized. Raising an exception to prevent this would probably improve the ergonomics.
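One way to make reuse fail loudly is to refuse a second `set_param` call. A hypothetical toy regularizer illustrating the idea; the class and its methods are assumptions and not the Keras `WeightRegularizer` code:

```python
class Regularizer:
    """Toy L2 weight regularizer that may be attached to exactly one parameter."""

    def __init__(self, l2: float = 0.01) -> None:
        self.l2 = l2
        self.param = None

    def set_param(self, param) -> None:
        # Silently rebinding would drop the first layer's penalty, so fail instead.
        if self.param is not None:
            raise ValueError("Regularizers cannot be reused; create one per layer.")
        self.param = param

    def penalty(self) -> float:
        return self.l2 * sum(w * w for w in self.param)


reg = Regularizer(0.5)
reg.set_param([1.0, 2.0])
print(reg.penalty())  # 0.5 * (1 + 4) = 2.5
```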
* Minimal SparseTensor support for TensorFlow
* Basic Theano support for Sparse dot product
* Sparse Input for Both + Sparse Concat for TF
* Fixed issue with _keras_shape for sparse Inputs
* pep8
* Cleanup + Theano concat (untested)
* Bug fix & pep8
* Fix Theano concat
* Bugfix & simplification
* Next step: Unit tests
* Basic unit test for sparse dot; TF works, TH fails
* Fix KTH is_sparse
* pep8
* more tests, sparse KTH.eval, pep8
* sparse model test
* address code review comments
* make sparse boolean in K.placeholder
* skip sparse tests when TH.sparse import fails
* pep8
* pep8
* fixed flaky test, auto-dense in KTH.eval
* fixed some more len/shape issues for fit_generator
* fixed some more len/shape issues for prediction
* Added better exceptions when theano.sparse fails to import
* betterer
* pep8
* Added stacked what where autoencoder.
SWWAE uses residual blocks. Trains fast. Creates very good reconstructions.
* Added newline at end for PEP8
* Went through the PEP8 errors and corrected all of them (except for the imports that follow the numpy seed, which should be ok). Also, for a pool_size of 2, we halved the number of feature maps and the number of epochs, and it still trains a net that reconstructs the input very well.
* Added spaces around - and + when they are used as binary operators (more PEP8).
* In the decoder, the indices for the features, pool sizes and wheres all equal nlayers-1-i, so set an ind variable to this value and pass it to them.
* With ind variable in decoder, don't need two lines for the upsampling layer.
* Added title to plot, got rid of ticks on plot.
* PEP8 for * binary operator. Corrected some grammar issues in the docstring.
* Add Matthews correlation coefficient to metrics
I needed this for a Kaggle competition and it seemed useful in general so I thought I'd contribute it back.
* Enabled test for matthews metric
* Remove unnecessary cast garbage
* Addresses code review comments
* Renamed to matthews_corrcoef to be consistent with sklearn
* Update test_metrics.py
* pep8
* rename to mathews_correlation
* Update metrics.py
* Fixed typo
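For reference, the Matthews correlation coefficient is (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A plain-Python sketch on hard 0/1 labels, following scikit-learn's convention of returning 0 when the denominator is zero; the Keras metric works on rounded backend tensors instead:

```python
import math
from typing import Sequence


def matthews_correlation(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: 0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0
```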
* CTC import compatibility with tensorflow 0.10
Use a try/except clause to import ctc_loss from its new path in TensorFlow 0.10.
* Fixed ctc_decode and added tests for tensorflow.
ctc_decode when using beam search decoder has been fixed to conform with
tensorflow API. Function documentation has been updated to reflect the
changes. Two tests, for greedy and beam search decoding, have also been
added to test_backends.py.
* Fix pep8 styling.
* Fixed styling on long lines on ctc_decode tests.
* Fix Batch Norm compatibility with 3D inputs
The Theano backend now uses dnn_batch_normalization, which only supports
up to 4-dimensional input. This breaks any layer with 5D inputs, such as 3D
convolutions.
* using intermediate variable
By default, TensorFlow allocates all gradient matrices on gpu:0, which makes it practically impossible to parallelize a large model.
colocate_gradients_with_ops puts these matrices next to the corresponding operations, allowing you to split your model across multiple GPUs. I ran into this issue myself and this fixed it for me.
I think it is also meant to run gradient computations on the device where the operations are placed, but my belief about that comes from https://github.com/tensorflow/tensorflow/issues/2441
I'm not sure why this isn't the default in TF, so perhaps it should be behind a flag, but having to patch Keras myself to do multi-GPU training seems like the wrong answer.
* Add support for dynamic RNNs in TensorFlow.
* Fix return states
* Add support for go_backwards in dynamic TF RNNs
* Currently broken: TF RNN dropout, go_backwards
* Finalize dynamic RNNs in TF
* Remove unnecessary comment
* Comment out added test
* Comment out functional guide test
* add cropping1d/2d/3d layers
* fix PEP8 issue, fix incorrect doc strings
* add example code on Cropping2D
* fix init/get_config of crop1d/3d, add test codes for cropping1d/2d/3d
* fix test code - PEP8
It doesn't pass the tests (only in cropping2d and basic_test), but my laptop setup is not correct (some other existing layers fail as well), so I'm committing in order to test it properly.
* change to follow PEP8 again
* update test_convolutional.py for PEP8, make test code use K.image_dim_ordering()
* PEP8 for test_convolutional.py - indentation
* fix typo. add assert to check cropping lengths
* Upload examples/imdb_fasttext.py, which implements the fasttext model
* Remove Dropout and unnecessary imports
* Remove Dropout and unnecessary imports
* Remove Dropout and unnecessary imports
* Fix an issue when only one dot_axes is specified in the Merge layer
* Fix an issue when only one dot_axes is specified in the Merge layer
* Updated dataset documentation to reflect removal of the test_split argument
from the imdb dataset. Added docstring to reuters dataset load_data.
* Updated imdb and reuters examples in dataset docs to reflect all
available arguments with their current default values.
* Added CTC to Theano and Tensorflow backend along with image OCR example
* Fixed python style issues, made data files remote, and made code more idiomatic to Keras
* Fixed a couple more style issues brought up in the original PR
* Reverted wrappers.py
* Fixed potential training-on-validation issue and removed unused imports
* Fixed PEP8 issue
* Remaining PEP8 issues fixed
* Added Convolution1D instead of Conv1D, which is deprecated
* updated rest of the example using Conv1D
* Python3 fails to decode utf-8 data, thus using encoding='latin-1'
* added encoding condition on lines 65-67
* Conv1D reverted back to the way it was
* One hot op
* tf too
* Update theano_backend.py
* Use built-in theano op
* Update theano_backend.py
* Add test
* Update test_backends.py
* Update test_backends.py
* Generalize for nD tensors
* Fix docstring on TF backend
* Update theano_backend.py
* Update theano_backend.py
* remove usage of tf.assign() in the tensorflow backend (#3316)
Using tf.assign() in the set_value() and batch_set_value() functions creates new nodes in the TensorFlow graph each time they are called, which can eventually exhaust memory.
Therefore, the function has been rewritten using placeholders and feed_dict to avoid allocating additional memory.
* Correction to the set_value() function
Change to the set_value() function that had a bug when the variable "value" was a float.
The *1. dummy multiplication was added to avoid having to deal with tf.float32_ref dtypes.
* update set_value() of the tensorflow backend
Removal of the *1. dummy multiplication, replacement with a split() to avoid creating a new operation in the graph.
* fix to have session.run() called once in batch_set_value()
Rewriting of the batch_set_value() to avoid multiple calls to session.run() to improve speed.
* Docker image for test and experiment Keras
- Docker image with CUDA support on Ubuntu 14.04
- nvidia-docker script to forward the GPU to the container
- Makefile to simplify docker commands for build, run, test, etc.
- Add useful tools like jupyter notebook, ipdb, sklearn for experiments
* update nvidia-docker plugin
* use .theanorc in Dockerfile
* Add tensorflow to the docker image
* update Docker image to cuDNN v5
* test fixes
* move docker to sub directory
* README for docker
* Fix typos
* Add visualization to Dockerfile
* theano backend now supports transposed convolutions
* working deconv
* new example file with deconv vae
* merged with #3273, fixed based on comments, pep8 tested
* test fix
* passes theano test
* start fixing deconv test
* fix deconv layer tests
* fix the right test
sorry, I "fixed" the wrong test last time
* clean up
* replace with_None with fixed_batch_size
* with_None --> fixed_batch_size
* comment edit
* fixed comments online
A number of changes:
1. Switch from Lambda to merge, otherwise code will not run.
2. Rename z_log_std to z_log_var in order for the objective function to make sense
3. Adjust reparameterization trick to reflect use of z_log_var, not z_log_std
4. Remove epsilon_std, since the (standard) VAE uses an isotropic Gaussian prior.
5. Re-balance the weighting of KL and reconstruction terms
6. Use adam instead of rmsprop
7. Increase hidden unit size to improve model
8. Increase batch size to speed up training
* make examples/pretrained_word_embeddings.py more memory efficient
* make examples/pretrained_word_embeddings.py more memory efficient
* rename NB_WORDS to nb_words as it is not a global constant
The get_uid method in common.py first checks whether a prefix is in the _UID_PREFIXED dict
and, if it is not, adds an entry for it.
With a defaultdict, this check is no longer necessary.
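A sketch of the defaultdict-based counter: missing prefixes start at 0 automatically, so no membership test is needed. The `_uid_counters` name is hypothetical (the commit refers to `_UID_PREFIXED`):

```python
from collections import defaultdict
from typing import DefaultDict

# Hypothetical module-level counter; missing prefixes start at 0 automatically.
_uid_counters: DefaultDict[str, int] = defaultdict(int)


def get_uid(prefix: str = "") -> int:
    """Return 1, 2, 3, ... on successive calls with the same prefix."""
    _uid_counters[prefix] += 1  # no "if prefix not in ..." check needed
    return _uid_counters[prefix]
```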
* Added 'max' operation to the Merge layer. This allows implementing convolutional maxout with two (or more) convolution layers and one Merge.
* Added 'max' to merge test
* Add multiprocessing for fit generator
* Change maxproc to nb_worker and update documentation
* Simplify multiprocessing test, clarify doc replace maxproc by nb_worker
* Replace maxproc by nb_worker in test
* Replace maxproc by nb_worker in test
* Update the doc: specify non picklable arguments should not be used with multiprocessing
* Add multiprocessing as an option with the pickle_safe argument
* New function signature for conv2d in backend
* Clean up stuff
* Touch-up TF deconv op
* More cleanup
* Support for TF 3D conv/pool
* Move pooling layers to their own file
* Update TF version in Travis config
* Fix conv3d tests
The documentation says that [1]:
> If [classes are] not provided, the list of classes will be automatically inferred (and the order of the classes, which will map to the label indices, will be alphanumeric).
However, the code was adding classes in the order `os.listdir` returned them. This commit alphanumerically sorts the sub-directories before mapping them to label indices.
[1] http://keras.io/preprocessing/image/
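The documented behaviour amounts to sorting the sub-directory names before enumerating them. A minimal sketch with a hypothetical `infer_class_indices` helper; note that `sorted` compares code points, so uppercase names sort before lowercase ones:

```python
import os
import tempfile
from typing import Dict


def infer_class_indices(directory: str) -> Dict[str, int]:
    """Map each sub-directory name to a label index in alphanumeric order."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    for name in ("dogs", "cats", "birds"):
        os.mkdir(os.path.join(root, name))
    print(infer_class_indices(root))  # {'birds': 0, 'cats': 1, 'dogs': 2}
```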
In on_epoch_end, before appending a value to the history dict, the method
first checks whether the key is missing and, if so, creates it with an
empty list as its value.
However, this looks the key up twice. The same behavior can be achieved in
a single step with the dict setdefault method.
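The single-lookup version can be sketched with a toy history recorder; the class below illustrates the `setdefault` pattern and is not the Keras `History` callback itself:

```python
from typing import Dict, List, Optional


class History:
    """Accumulate per-epoch metric values (sketch of the pattern only)."""

    def __init__(self) -> None:
        self.epoch: List[int] = []
        self.history: Dict[str, List[float]] = {}

    def on_epoch_end(self, epoch: int, logs: Optional[Dict[str, float]] = None) -> None:
        self.epoch.append(epoch)
        for key, value in (logs or {}).items():
            # One lookup: insert [] if the key is new, then append to the stored list.
            self.history.setdefault(key, []).append(value)
```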
An EarlyStopping callback object has internal state variables to tell it
when it has reached its stopping point. These were initialized in __init__(),
so attempting to re-use the same object resulted in immediate stopping. This
prevents (for example) performing early stopping during cross-validation with
the scikit-learn wrapper.
This patch initializes the variables in on_train_begin(), so they are reset
for each training fold. Tests included.
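A hypothetical sketch of the fix: per-run state lives in `on_train_begin()` rather than `__init__()`, so one callback object can be reused across folds. The patience rule and the `run_fold` loop are simplified assumptions, not Keras code:

```python
from typing import List


class EarlyStopping:
    """Stop when the loss has not improved for more than `patience` epochs."""

    def __init__(self, patience: int = 0) -> None:
        self.patience = patience
        # Per-run state is deliberately not created here.

    def on_train_begin(self) -> None:
        # Reset on every fit() so the same object can be reused across folds.
        self.wait = 0
        self.best = float("inf")
        self.stop_training = False

    def on_epoch_end(self, loss: float) -> None:
        if loss < self.best:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait > self.patience:
                self.stop_training = True


def run_fold(callback: EarlyStopping, losses: List[float]) -> int:
    """Hypothetical training loop: return how many epochs ran before stopping."""
    callback.on_train_begin()
    for epoch, loss in enumerate(losses, start=1):
        callback.on_epoch_end(loss)
        if callback.stop_training:
            return epoch
    return len(losses)
```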
* Resolve #2960
Introduce `K.var` so that the standard deviation computation can
be made numerically stable. Instead of
K.std(x)
the user is able to write
K.sqrt(K.var(x) + self.epsilon)
avoiding a division by zero in the gradient computation of `sqrt`.
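The gradient issue is easy to see numerically: d/dv sqrt(v) = 1 / (2 sqrt(v)) is unbounded as v approaches 0, while sqrt(v + epsilon) keeps it finite. A plain-Python sketch, assuming a population (biased) variance and epsilon = 1e-6 chosen only for illustration:

```python
import math
from typing import Sequence


def variance(xs: Sequence[float]) -> float:
    """Population (biased) variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def d_sqrt(v: float) -> float:
    """Derivative of sqrt at v, i.e. 1 / (2 * sqrt(v)); undefined at v == 0."""
    return 1.0 / (2.0 * math.sqrt(v))


epsilon = 1e-6                         # illustrative value
v = variance([3.0, 3.0, 3.0])          # constant input: zero variance
stable_std = math.sqrt(v + epsilon)    # about 1e-3 instead of exactly 0
stable_grad = d_sqrt(v + epsilon)      # finite, about 500
try:
    d_sqrt(v)  # 1 / (2 * 0.0) raises ZeroDivisionError
except ZeroDivisionError:
    print("the gradient of plain std is undefined at zero variance")
```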
* Fix typos
This issue is due to an unexpected loss of dimensionality when
composing the backend tensor operations "reshape" and "squeeze"
when there are dimensions of length 1.
For example, using a Theano backend the following fails with a
complaint about dimension mismatch:
UpSampling1D(2)(MaxPooling1D(2)(Reshape((2,1))(Input(shape=(2,)))))
The issue arises due to the conflict of two behaviors specific
to the Theano backend:
- Reshape uses Theano's reshape function. Theano's reshape
automatically makes dimensions with length 1 "broadcastable"
- MaxPooling1D's implementation class _Pooling1D has a call method
which uses a dummy dimension that it has to remove. It removes this
dummy dimension by calling "squeeze(x, axis)" from the backend.
The squeeze implementation tells Theano to make
the dummy dimension broadcastable, and then calls Theano's "squeeze",
which removes ALL the broadcastable dimensions: not just the dummy
dimension, but also the length-1 dimension flagged as broadcastable
by reshape. This causes the problem observed above. This behavior
differs from that of the TensorFlow backend, which
removes only the requested dimension.
This PR addresses this issue in two ways:
First, it introduces a test which checks the composition of "reshape"
and "squeeze" to make sure we get the same result using both Theano
and TensorFlow backends.
Second, it changes the implementation of squeeze(x,axis) so that the
Theano backend should behave similarly to the TensorFlow backend. With
this change the introduced test passes and the above example works.
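The difference between the two behaviours can be sketched on shapes alone. The shape `(4, 1, 1, 1)` with axis 2 as the dummy pooling axis is an assumed layout for illustration, not necessarily the one `_Pooling1D` uses:

```python
from typing import Tuple

Shape = Tuple[int, ...]


def squeeze_axis(shape: Shape, axis: int) -> Shape:
    """Drop only `axis`, which must have length 1 (TensorFlow-like behavior)."""
    if shape[axis] != 1:
        raise ValueError("cannot squeeze axis %d of length %d" % (axis, shape[axis]))
    axis %= len(shape)
    return shape[:axis] + shape[axis + 1:]


def squeeze_all_unit_dims(shape: Shape) -> Shape:
    """Drop every length-1 axis, like squeezing all broadcastable dimensions."""
    return tuple(dim for dim in shape if dim != 1)


pooled = (4, 1, 1, 1)  # assumed: batch, steps, dummy axis, channels
print(squeeze_axis(pooled, 2))         # (4, 1, 1): only the dummy axis removed
print(squeeze_all_unit_dims(pooled))   # (4,): steps and channels lost as well
```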
* Update regularizers.py
I added a new regularizer for deep learning practitioners, named Eigenvalue Decay, which aims at maximum-margin learning. This version approximates the dominant eigenvalue by a soft function given by the power method. For details, see:
Oswaldo Ludwig. "Deep learning with Eigenvalue Decay regularizer." ArXiv eprint arXiv:1604.06985 [cs.LG], (2016). https://www.researchgate.net/publication/301648136_Deep_Learning_with_Eigenvalue_Decay_Regularizer
The syntax for Eigenvalue Decay is similar to the other Keras weight regularizers, e.g.:
model.add(Dense(100, W_regularizer=EigenvalueRegularizer(0.0005)))
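The power method mentioned above can be sketched in plain Python. This assumes the penalty is `k` times the dominant eigenvalue of W^T W, estimated with a fixed number of iterations from an all-ones start vector and a final Rayleigh quotient; these are illustrative choices, so see the cited paper for the actual formulation. It also assumes W is non-zero:

```python
from typing import List

Matrix = List[List[float]]


def gram(w: Matrix) -> Matrix:
    """Return W^T W for a row-major n x m matrix W (result is m x m)."""
    cols = len(w[0])
    return [[sum(row[i] * row[j] for row in w) for j in range(cols)] for i in range(cols)]


def matvec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def dominant_eigenvalue(w: Matrix, iterations: int = 20) -> float:
    """Approximate the largest eigenvalue of W^T W by power iteration."""
    ww = gram(w)
    v = [1.0] * len(ww)  # start from the all-ones vector
    for _ in range(iterations):
        v = matvec(ww, v)
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    # Rayleigh quotient v^T (W^T W) v / v^T v of the converged direction.
    wv = matvec(ww, v)
    return sum(a * b for a, b in zip(wv, v)) / sum(x * x for x in v)


def eigenvalue_decay_penalty(w: Matrix, k: float = 0.0005) -> float:
    """Regularization term k * lambda_max(W^T W)."""
    return k * dominant_eigenvalue(w)
```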
* Example with Eigenvalue Decay regularization.
An example from Keras including regularization with Eigenvalue Decay. After training, you have to save the trained weights, create and compile a similar model without Eigenvalue Decay, and save this model. Then you can use your trained weights with this model; see lines 123-153 of CIFAR10_with_Eigenvalue_Decay.py (this is still an open issue).
In this example, using Eigenvalue Decay yields an accuracy gain of 2.71% (averaged over 10 runs).
* Update CIFAR10_with_Eigenvalue_Decay.py
* Update CIFAR10_with_Eigenvalue_Decay.py
* Update CIFAR10_with_Eigenvalue_Decay.py
* Update regularizers.py
* Update regularizers.py
* Delete CIFAR10_with_Eigenvalue_Decay.py
* Update test_regularizers.py
* Update regularizers.py
* Update test_regularizers.py
* Update regularizers.py
* Update regularizers.py
I needed another read of the Keras backend...
* Issue getting the shape of a tensor.
Issue getting the shape of a tensor in the EigenvalueRegularizer class: the type returned for shape differs between the Theano backend (Theano tensor type) and the TF backend (TF TensorShape).
* Update regularizers.py
* Update regularizers.py
* Update regularizers.py
* Update regularizers.py
* Update regularizers.py
* Update regularizers.py
* Update regularizers.py
* limit progress bar update rate
Limit the progress bar update rate in verbose=1 mode. This patch
reduces terminal I/O throughput while keeping a reasonably high visual
update rate (defaults to 100 refreshes per second). It helps greatly
when working with large but simple data sets with small batches, which
otherwise lead to millions of relatively useless screen updates per second. It
also helps keep network traffic at reasonable rates, which is
especially useful under laggy network conditions when using
Keras over telnet/ssh, and improves web browser responsiveness when
using Keras within Jupyter Notebook.
* add docstrings for 'interval' and 'force' arguments
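A hypothetical sketch of the throttling rule with an injectable clock, so it can be tested deterministically: redraws are skipped if less than `interval` seconds have passed, except when `force` is set or the final step is reached. The class is an illustration, not Keras's `Progbar`:

```python
import time
from typing import Callable, List


class RateLimitedProgress:
    """Redraw at most once per `interval` seconds (sketch)."""

    def __init__(self, total: int, interval: float = 0.01,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.total = total
        self.interval = interval  # 0.01 s means at most ~100 redraws per second
        self.clock = clock
        self.last_update = float("-inf")  # guarantees the first call draws
        self.rendered: List[str] = []     # stands in for terminal writes

    def update(self, current: int, force: bool = False) -> None:
        now = self.clock()
        too_soon = now - self.last_update < self.interval
        if not force and current < self.total and too_soon:
            return  # skip this redraw to save terminal / network I/O
        self.last_update = now
        self.rendered.append("%d/%d" % (current, self.total))
```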
* bug fixed: numpy randint only outputs positive numbers ranging from 1 to 10e6
* Update theano_backend.py
changed style and numpy randint range
* Update theano_backend.py
removed extra spaces
From the documentation it is not entirely clear that if mask_zero is set
to True, the input_dim argument should be equal to the size of the
vocabulary + 2, as index 0 cannot be used anymore.
(This behaviour seems a bit strange, since as a consequence the
first column of the embedding weights will never be used or
updated. The resulting network thus has a redundant set of parameters.)
* add a simple named entity recognition example
add a simple named entity recognition example
* add fast version of GRU
add fast version of GRU
* remove useless stuff
* Faster LSTM
* PEP8
* RNN dropout fix
* PEP
* PEP
* Less code duplication
* LSTM benchmark example
* PEP
* Test implementation modes
* Go through Keras backend
* Much better image data augmentor
* removed unnecessary functions
* shift origin to centre of the image for homographies
* init commit
* change to zoom_range
* Added scikit-image to extras_require in setup.py
* add zoom_range test, exception for invalid zoom_range
* add scikit-image to dependency
* fix fit and retain old functions for unit test
* use ndi instead of skimage in random_transform
* removed buggy code in random_rotations, shears etc and replaced it with todos.
* remove sci-image, implement ndimage based methods, refactor random_transform
* random_zoom, array_to_img consider dim_ordering
* add random_channel_shift, support fill_mode and cval
* image doc, update test_image, PEP8
* fix channel shift clip
* fix doc, refine code
* detailed explanation of zoom range
* check coding style
* adding a disable_b boolean to Dense
* changing 'disable_b' to 'bias'
Changing the name of the boolean & flipping its behavior so that the default is True and when set to False the bias is not used.
* integrating bias flag fully
changed the bias flag to affect the creation of the self.b variable as well as the output calculation
* fixing a blank line to appease pep8
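A toy sketch of what the `bias` flag controls: with `bias=False` no `b` variable is created and the output is just `x W`. Plain lists stand in for backend tensors; this is not the Keras `Dense` layer:

```python
from typing import List, Optional

Matrix = List[List[float]]


class Dense:
    """Fully connected layer y = x W (+ b); the bias term is optional."""

    def __init__(self, weights: Matrix, bias: bool = True) -> None:
        self.bias = bias
        self.W = weights
        # With bias=False no b variable is created, so it is not a parameter.
        self.b: Optional[List[float]] = [0.0] * len(weights[0]) if bias else None

    def trainable_weights(self) -> list:
        return [self.W, self.b] if self.bias else [self.W]

    def __call__(self, x: List[float]) -> List[float]:
        out = [sum(xi * row[j] for xi, row in zip(x, self.W)) for j in range(len(self.W[0]))]
        if self.bias:
            out = [o + bj for o, bj in zip(out, self.b)]
        return out
```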
* Max Over Time in imdb_cnn.py
Following issue https://github.com/fchollet/keras/issues/2296, I propose this PR.
The major optimizations, apart from Max Over Time, are:
- Dropout in the Embedding layer.
- Longer input sequences (400 instead of 100), made possible by the speedup from Max Over Time.
- Adam optimizer.
Overall it takes 90 to 100 seconds per epoch on my laptop CPU, and in two epochs it reaches 0.885 accuracy, a 5-point improvement over the previous implementation. Moreover, it requires less memory (300k parameters vs 3M+), since the number of parameters no longer depends on the length of the input sequence.
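Max over time reduces a (timesteps, features) activation to one value per feature, which is why the downstream parameter count no longer depends on the sequence length. A plain-Python sketch; the layer sizes in the usage lines are hypothetical:

```python
from typing import List

Matrix = List[List[float]]  # (timesteps, features)


def global_max_pool_over_time(x: Matrix) -> List[float]:
    """(timesteps, features) -> (features,): the max of each feature over time."""
    return [max(step[f] for step in x) for f in range(len(x[0]))]


def dense_params(input_size: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return input_size * units + units


pooled = global_max_pool_over_time([[1.0, 5.0], [3.0, 2.0], [0.0, 4.0]])  # [3.0, 5.0]
# Hypothetical sizes: 400 timesteps, 250 filters, 250 hidden units.
flatten_params = dense_params(400 * 250, 250)  # grows with sequence length
pooled_params = dense_params(250, 250)         # independent of sequence length
```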
* Update imdb_cnn.py
* added learning phase to callbacks (#2297)
* cleaned imports
* replaced tabs by spaces
* added case where uses_learning_phase is False
* fixed pep8 blank line bug
Previously, strides were required to be smaller than the convolution
kernel. Usually this is what a user wants, but there are edge
cases where one might want larger strides (for instance, projection
shortcuts in Residual Networks).
LeakyReLU returns a tensor with float64 dtype.
Surprisingly, this line actually produces a float64 array:
```
0.5*np.array(0.2, dtype=np.float32)
```
The theano nnet.relu function does something similar with the
LeakyReLU alpha parameter, which leads to a float64 tensor.
The solution is to not cast the alpha to float32.
Furthermore, I tightened `test_utils.layer_test`. It now
requires the layer's output dtype to equal the input dtype.
* add in predict_generator and tests
* fix PEP8 details
* Pre-allocate predictions
* make predictions return list if necessary
* reset batch_size for other tests, make less wonky generator
* Fix merge_dot tests
* Make batch_dot unique
batch_dot is not tensordot! It only accepts one reduce dimension at a
time. Other reduce dimensions should be done afterwards with K.sum.
This means that K.batch_dot will have the same behavior in both
TensorFlow and Theano. It also means fewer parentheses and
fewer nested lists.
New usage:
merge_mode = 'dot', dot_axes=[axis1, axis2]
Before:
merge_mode = 'dot', dot_axes=[[axis1], [axis2]]
* Backport sign by @the-moliver
* Fix docstrings
* Fix backend batch_dot tests
When saving the weights a TypeError is raised by h5py.
See this issue https://github.com/h5py/h5py/issues/289 for details.
As it is recommended in the issue, the strings are now encoded as utf8.
Move caches to properties so that containers can override the
implementation to ensure that the cache gets propagated correctly
to child layers when it is changed.
Reset instead of disabling layer and shape cache in __call__
Previously, __call__ did not get the speed benefits from caching
because it disabled it in order to feed the layer new input. This
meant that __call__ could be very slow on complicated structures.
Now, instead of disabling it, we temporarily empty it, then restore
the original when we're done.
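The "temporarily empty, then restore" pattern can be sketched with a property-backed cache and a context manager. The `Layer`/`Container` classes and `fresh_cache` are hypothetical; they only show why exposing the cache as a property lets a container propagate changes to its children:

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


class Layer:
    """Toy layer whose output cache is exposed through a property."""

    def __init__(self) -> None:
        self._cache: Dict[str, float] = {}

    @property
    def cache(self) -> Dict[str, float]:
        return self._cache

    @cache.setter
    def cache(self, value: Dict[str, float]) -> None:
        self._cache = value

    @contextmanager
    def fresh_cache(self) -> Iterator[None]:
        """Temporarily swap in an empty cache instead of disabling caching."""
        saved = self.cache
        self.cache = {}
        try:
            yield
        finally:
            self.cache = saved  # restore the original when done


class Container(Layer):
    """Overrides only the setter so cache changes reach every child layer."""

    def __init__(self, children: List[Layer]) -> None:
        super().__init__()
        self.children = children

    @Layer.cache.setter
    def cache(self, value: Dict[str, float]) -> None:
        self._cache = value
        for child in self.children:
            child.cache = value
```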
This refactor allows the inherited method to work properly for
Sequential and Graph (with single input) containers in addition
to normal layers, so there's no need to override the method.
Previously, __call__ did not work correctly for Graph containers.
Implement Graph.__call__ for multiple inputs
Add option to (re-)initialize weights in set_previous
This allows us to use set_previous in places where we previously
manually adjusted the previous layer, which means that layers
that have non-standard set_previous implementations (like Graph)
work properly when they are, for example, the first layer in a
Sequential model.
This commit also adds a clear_previous method.
Add input_shape property to Graph container
---------------------------------------
Squashed from the following commits
add Convolution3D and MaxPooling3D layers
fix 5D tensor in theano, add examples
update conv3d, pool3d, add resize_volumes and spatial_3d_padding
update Convolution3D, MaxPooling3D and AveragePooling3D, add UpSampling3D and ZeroPadding3D
add test functions for Convolution3D, MaxPooling3D, AveragePooling3D, ZeroPadding3D and UpSampling3D
small fix by changing pad_z to pad_t
update comment
skip some tests for tensorflow, @pytest.mark.skipif(K._BACKEND != 'theano', reason="Requires Theano backend")
use autopep8 to fix the code to match pep8 coding style
small fix (caused by autopep8)
small fix (caused by autopep8)
small fix (caused by autopep8)
fixed the document string for all newly added layers
remove the example and the dataset for 3d
add error message for tensorflow backend
support stride in pool3d
Rename "params" to "trainable_weights"
change notations and docstrings for 3D layers
fix pep8 error
change variable name in test code
small fix for pep8
add error message and docstring for strides in conv3d
fix test error caused by wrong strides in conv3d
support strides in conv3d by slicing the output
add if statement for stride (1,1,1)
fix get_config according to mdering, and other small fix
fix model_from_json issue by passing a 3d border_mode
fix according to jruales' review
change docstring in Convolution3D
delete docstring about TensorFlow
change docstring in Convolution3D and theano_backend
---------------------------------------
Author: Wei OUYANG <oeway007@gmail.com>
Have you noticed that GRUs with default settings usually work worse than LSTMs? "tanh" seems to be a more sensible activation choice. For GRUs, tanh also seems to be the default:
see http://arxiv.org/pdf/1412.3555v1.pdf Section 3.2
Squashed commit of the following:
commit 39a59192e96fe4098f1d663384b79b10e3bcc979
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 02:15:29 2016 +0000
Squashed commit of the following:
commit 88faa440d02df8ff356011258e3e89ce44a13e1d
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 02:13:24 2016 +0000
Clean up
commit f55245199a11a202857efb1413ffa3b97c1dcfaf
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 01:57:50 2016 +0000
Ported dropout for LSTM, GRU, SimpleRNN, and Embedding layer to latest Keras (turned off by default).
Squashed commit of the following:
commit 574c4549da69f8c0831f02dce1ad05331d8b38ed
Merge: 19ef51c bdb149d
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 01:23:54 2016 +0000
Merge branch 'BRNN_latest' of https://github.com/yaringal/keras into BRNN_latest
commit 19ef51c633544f847cddebeb7a3add0936051f19
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 01:12:23 2016 +0000
implemented dropout in GRU and SimpleRNN
commit bdb149d1bbff64cc6b4d694090b905153d28e33a
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 01:12:23 2016 +0000
implemented dropout in GRU and SimpleLSTM
commit 72ade3f493dd725fb414cbc65a847259360be138
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 00:52:01 2016 +0000
clean up
commit 9f3d213c91906b3be5c876d539819a8577bc438c
Author: Yarin <yaringal@gmail.com>
Date: Sat Feb 20 00:42:58 2016 +0000
Model test callback
commit d4ffffc26cf24c8b7927209caad4379aac3db9c5
Author: Yarin <yaringal@gmail.com>
Date: Fri Feb 19 23:47:40 2016 +0000
removed dependence on theano
commit 89a4e6576278564ffb882032d5a7ec5758fe00e4
Author: Yarin <yg279@cam.ac.uk>
Date: Fri Feb 19 23:25:13 2016 +0000
working BayesianLSTM and embedding dropout for theano backend
commit 1ab4e19dfe9d49defd5575a5c2b0b880b5c46eb5
Author: Yarin <yg279@cam.ac.uk>
Date: Fri Feb 19 16:41:48 2016 +0000
working BayesianLSTM with dependence on theano
commit 672c27401ee345a69592771cfc9ab017642b6af3
Merge: 9360ea6 b8a9f84
Author: Yarin <yaringal@gmail.com>
Date: Fri Feb 19 00:30:44 2016 +0000
Merge https://github.com/fchollet/keras into BRNN_latest
commit 9360ea6c25eab90e83aebb32eb187c65ed63c01d
Author: Yarin <yaringal@gmail.com>
Date: Thu Feb 18 23:28:35 2016 +0000
work in progress on BayesianLSTM
commit b8a9f84fad
Merge: a154495 0f3f563
Author: François Chollet <francois.chollet@gmail.com>
Date: Thu Feb 18 11:24:42 2016 -0800
Merge pull request #1756 from gw0/fix-for-refactor-callbacks
Fix missing callback refactoring.
commit 0f3f56327b
Author: gw0 [http://gw.tnode.com/] <gw.2016@tnode.com>
Date: Thu Feb 18 17:01:45 2016 +0100
Fix missing callback refactoring.
`Sequential.set_weights` would fail for nested `Sequential`
containers. Borrow the implementation of `Graph.set_weights` to
get it working. Also add tests for triply-nested `Sequential`
models.
If sample_weights is to be used as a mask as well as for re-weighting
then it's important that, at least when used as a mask, the output be
rescaled. Otherwise the order of magnitude of your objective changes
purely based on the number of masked entries in your training data.
- Categorical_crossentropy was taking an extra mean, the function
already removes the final dimension of your input, so you don't need
to take a mean as you would with, say, L2 loss.
- The RNN backend call can now take a mask with or without the same
number of dimensions as the input data
- Fix Masking layer for Tensorflow
- Add some tests to confirm objective function shapes
Currently, polynomial interpolation of 3rd order is done when shifting. However, that is not needed because the images are shifted by integer values (crop_left_pixels, crop_top_pixels), and there is nothing to interpolate.
Setting ```order=0``` will speed up random shifts significantly.
The change to the dimshuffle/transpose call to support >3d inputs was
correct for the inputs array but did not apply to the mask array. This
fixes that.
Currently, only 3D input is supported by the rnn function.
Update theano_backend.py
Fix tf too
Avoid slicing
assert ndim>=3
Update theano_backend.py
typo
Update theano_backend.py
This commit fixes the DisconnectedInputError described in issue
the `get_output` method. Before this commit, the `updates` member
could use a different input than the `get_output` method if the
input was changed.
The `params`, `regularizers`, `constraints` and `updates` member of the
AutoEncoder were set in the `__init__` method.
When set_previous was called, the mentioned members were not updated.
This behavior resulted in a DisconnectedInputError.
Now the mentioned members are set in the `build` method and the
`set_previous` method calls the `build` method every time the
input changes. This commit fixes issue #1275.
test image preprocessing
add PIL to enable testing of preprocessing code
try a different way to install PIL on travis
include PIL only in python 2.7
fall back to Pillow for python 3 image processing
In order to propagate state through _predictions_, I created a new
property of the model, `state_updates` that returns any model step
updates that are needed when doing a stateful prediction. These updates
are identified as *any updates defined by a stateful layer*.
thresholded activations
parametric softplus
some bugfixes
fix error caused by calling layer.build on PReLU
seed the rng in every test individually to make them deterministic
Summary of changes:
- py.test is configured to display test profiling information that shows 10 slowest tests. This would allow additional speed ups if anyone has ideas on some particular test. The slowest test is usually cifar dataset test and tensorflow convolutions. It seems that there are some other IT tests that could be sped up.
- py.test is configured to run with pytest-xdist with 2 processes in parallel because travis does provide multicore support (1.5 cores) and because the slowest cifar test spends time on download which can run in parallel with other tests.
- travis is configured to split backend tests into a test matrix, so that the Theano and TensorFlow tests run in parallel instead of rerunning all the tests twice for python 2.7.
- pickle filenames in tests are renamed to avoid clashes during multiprocessing
As the graph container was not using each individual layer's get/set
weights, but rather the super class layer.get_weights, which works on
self.params(), it was missing some weights in the process, i.e., the
BatchNormalizationLayer has custom get_weights which allows to save the
running mean/std. However, these running computations are not added to
BatchNormalizationLayer.params(), resulting in losing these weights
after serializing a graph model utilizing a BatchNormalizationLayer.
Fixed to use each node's get/set weights.
When mode='ave', and the dtype of the input is float32, dividing the sum
by shape[1], which is of dtype int64, results in an output of dtype
float64, which is wrong.
fixed to use theano.tensor.mean instead.
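NumPy applies a comparable type-promotion rule, so it can illustrate the problem outside Theano. This is a minimal sketch assuming NumPy arrays as stand-ins for the symbolic tensors; `n` is an int64 array playing the role of `shape[1]`:

```python
import numpy as np

x = np.ones((4, 3), dtype=np.float32)

# Dividing the float32 sum by an int64 array promotes the result to float64.
n = np.full(x.shape[0], x.shape[1], dtype=np.int64)
ave_by_division = x.sum(axis=1) / n

# Taking the mean directly keeps the input's float32 precision.
ave_by_mean = x.mean(axis=1)

print(ave_by_division.dtype, ave_by_mean.dtype)
```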
When loading regularizers/constraints from config, and the object isn't
found, don't consume the 'name' key.
This enables expansions to keras to be saved/loaded with dictionaries as
some of their parameters.
Signed-off-by: Amit Beka <amit.beka@gmail.com>
This should fix the problem `Exception: Invalid layer: LRN2D` raised while loading a model that includes LRN2D.
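```py
from keras.models import Sequential, model_from_yaml
from keras.layers.convolutional import Convolution2D
from keras.layers.normalization import LRN2D

model = Sequential()
model.add(Convolution2D(30, 3, 3, input_shape=(1, 28, 28)))
model.add(LRN2D())
model_def = model.to_yaml()
# this line raises Exception: Invalid layer: LRN2D
model_from_yaml(model_def)
```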
The code above reproduces the problem.
“TypeError: Cannot cast ufunc subtract output from dtype('float64') to
dtype('uint8') with casting rule 'same_kind'” in
keras/preprocessing/image.py, line 239, when using data augmentation.
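The exact line in `image.py` isn't reproduced here, but the same class of error can be triggered, and avoided, in plain NumPy. This sketch assumes the image arrives as `uint8` and is modified in place by a float64 array (the `offset` name is illustrative):

```python
import numpy as np

img = np.full((2, 2), 100, dtype=np.uint8)
offset = np.full((2, 2), 50.5)  # float64, e.g. a mean or a random shift

cast_failed = False
try:
    img -= offset  # in-place: the float64 result can't be stored back into uint8
except TypeError as err:
    cast_failed = True
    print("fails:", err)

# Converting to a float dtype first makes the in-place arithmetic legal.
img = img.astype(np.float32)
img -= offset
```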
I was a bit surprised that keras was using globals() to access layers (this doesn't
work across modules). A hacky solution was to pass a dict mapping name -> class,
which I called `custom_layers`.
Is there a better way of doing this that I'm not seeing?
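A minimal sketch of the name -> class lookup described above. `BUILTIN_LAYERS`, `get_layer_class` and the placeholder classes are hypothetical; they only show how a user-supplied `custom_layers` dict can resolve classes that live in other modules:

```python
class Dense(object):
    """Placeholder for a built-in layer class."""


BUILTIN_LAYERS = {"Dense": Dense}  # hypothetical registry of built-in layers


def get_layer_class(name, custom_layers=None):
    """Resolve a layer class by name, letting user-supplied classes override."""
    registry = dict(BUILTIN_LAYERS)
    if custom_layers:
        registry.update(custom_layers)  # classes may come from any module
    if name not in registry:
        raise ValueError("Invalid layer: " + name)
    return registry[name]


class MyLayer(object):
    """A user-defined layer living outside the library's module."""


layer_cls = get_layer_class("MyLayer", custom_layers={"MyLayer": MyLayer})
```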
This allows you to do nice things like save JSON models so that they're human
readable & editable. For example:
>>> with open('output.json', 'w') as f:
... f.write(model.to_json(indent=4, sort_keys=True))
...
This makes merge_mode='join' compliant with the Keras API. Also, the OrderedDict
allows the user to simply call .values() and use it as a list if they know in which
order the inputs were merged.
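A small illustration of why an `OrderedDict` works here: insertion order is preserved, so `.values()` can stand in for a positional list. The branch names and values are made up:

```python
from collections import OrderedDict

# Hypothetical outputs of three merged branches, keyed by input name.
merged = OrderedDict()
merged["left"] = [1, 2]
merged["middle"] = [3]
merged["right"] = [4, 5, 6]

# Values come back in the order the inputs were merged.
outputs = list(merged.values())
```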
By allowing sum_values[k] to be things other than lists, it makes it easier for child classes to print "any value" (in my case, a timedelta object).
enable string formatted filenames (e.g. weights.{epoch:02d}.hdf5), so
every epoch will be saved to a different file without overwriting.
Signed-off-by: Amit Beka <amit.beka@gmail.com>
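The template uses Python's standard `str.format` syntax. A minimal sketch of how a filename like the one above expands per epoch, assuming the epoch number is passed as the keyword `epoch`:

```python
# The epoch number is substituted into the template, zero-padded to 2 digits.
filepath = "weights.{epoch:02d}.hdf5"
names = [filepath.format(epoch=epoch) for epoch in range(3)]
```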
We used nonzero() on the weights in order to ensure that if there
happened to be a NaN or an Inf in the output that was going to be masked
out by the weights anyway, it wouldn't propagate (because 0*inf = NaN).
However, this was causing interaction issues if you also used a mask,
because the mask wasn't using nonzero() properly.
This fixes that, and also fixes what I believe was an issue where I was
calling mean() instead of dividing by the sum of the sample weights.
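A NumPy sketch of the two ideas above: selecting nonzero-weight entries so an Inf in a masked position can't turn the result into NaN, and dividing by the sum of the weights rather than taking a plain mean. `masked_weighted_mean` is a hypothetical helper, not the Keras code:

```python
import numpy as np


def masked_weighted_mean(losses, weights):
    # Keep only entries with nonzero weight, so a NaN/Inf sitting in a
    # masked-out position never enters the product (0 * inf is NaN).
    idx = np.nonzero(weights)
    # Normalize by the total weight rather than by the number of entries.
    return np.sum(losses[idx] * weights[idx]) / np.sum(weights[idx])


losses = np.array([1.0, 3.0, np.inf])
weights = np.array([1.0, 1.0, 0.0])

with np.errstate(invalid="ignore"):
    naive = np.sum(losses * weights)  # 0 * inf -> NaN poisons the sum
robust = masked_weighted_mean(losses, weights)
```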
With lr and momentum being scalars we can change their values without
needing to recompile the model. This PR also includes a Callback called
LrSetter that takes a dict of epoch -> lr pairs and sets the learning rate to
the associated value at the beginning of that epoch.
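The callback's own code isn't shown, so here is a framework-free sketch of the described behavior. The `on_epoch_begin(epoch, optimizer)` signature and the `SimpleNamespace` stand-in for the optimizer are assumptions for illustration; in the PR, `lr` is a scalar variable on the compiled model instead:

```python
from types import SimpleNamespace


class LrSetter(object):
    """Sketch: set the learning rate at the start of scheduled epochs."""

    def __init__(self, schedule):
        self.schedule = schedule  # dict mapping epoch -> learning rate

    def on_epoch_begin(self, epoch, optimizer):
        if epoch in self.schedule:
            optimizer.lr = self.schedule[epoch]


optimizer = SimpleNamespace(lr=0.1)  # stand-in for a model's optimizer
setter = LrSetter({0: 0.1, 5: 0.01})
lrs = []
for epoch in range(8):
    setter.on_epoch_begin(epoch, optimizer)
    lrs.append(optimizer.lr)  # epochs without an entry keep the previous lr
```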
`refs` is a class attribute, not an instance attribute. If you make `refs` an instance attribute, this will cause `HDF5Matrix` to open the same HDF5 file more than once (which should never happen).
Calling sequences_to_matrix results in an IndexError when nb_words = None. This is caused by a 1-indexed word_index, since sequences_to_matrix expects 0-indexing. Converts word_index to 0-based indexing.
As far as I can tell there is no reason not to support class_weight with
time distributed data, rewriting the standardize_weights function with
that in mind.
urlretrieve will blindly swallow any 4xx and 5xx responses
and then save the HTML error response in the local file. This
is probably exactly what we don't want: after a network hiccup,
not only will the program crash because the cached error page
cannot be opened as the expected file, but it will keep doing so
on every rerun until the corrupt cached file is found and manually removed.
Luckily, urlretrieve is just a thin wrapper around
FancyURLopener, so we can make our own thin wrapper
that throws an exception instead of caching the
wrong file.
Tested to be working as before when running cached and
uncached datasets, and also verified to fail loudly
when asked to fetch http://httpstat.us/500
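A Python 3 sketch of the same goal. Note that it swaps in `urllib.request.urlopen`, which raises `urllib.error.HTTPError` on 4xx/5xx responses, rather than the `FancyURLopener` subclass the commit describes. `safe_urlretrieve` is a hypothetical name, and the demo uses `file://` URLs so it runs without network access:

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def safe_urlretrieve(url, filename):
    """Download `url` to `filename`, raising instead of caching error pages."""
    # urlopen is entered before the output file is opened, so a failed
    # request raises before anything is written to `filename`.
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename


# Local demonstration with file:// URLs (no network needed).
tmpdir = tempfile.mkdtemp()
source = Path(tmpdir, "source.txt")
source.write_bytes(b"dataset contents")
copied = safe_urlretrieve(source.as_uri(), os.path.join(tmpdir, "copy.txt"))
```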
Updated the Adam solver to v8 of the paper. The kappa (lambda) parameter has no
practical use and has been removed.
Also fixed the calculations for beta_1_t and beta_2_t, which were wrong.
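For reference, a scalar sketch of one Adam update in the reordered form given in the paper, where the bias corrections use beta_1 and beta_2 raised to the power t and are folded into the step size. This is not the Keras optimizer code; the defaults are the paper's suggested hyperparameters:

```python
import math


def adam_step(p, g, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """One Adam update for a scalar parameter (t counts steps from 1)."""
    m = beta_1 * m + (1 - beta_1) * g      # biased first-moment estimate
    v = beta_2 * v + (1 - beta_2) * g * g  # biased second-moment estimate
    # beta_1_t = beta_1 ** t and beta_2_t = beta_2 ** t correct the bias.
    lr_t = lr * math.sqrt(1 - beta_2 ** t) / (1 - beta_1 ** t)
    p = p - lr_t * m / (math.sqrt(v) + epsilon)
    return p, m, v


# With bias correction, the first step moves the parameter by roughly lr.
p, m, v = adam_step(p=0.0, g=1.0, m=0.0, v=0.0, t=1)
```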
Modify to use proper multinomial sampling, with temperature to control diversity. This seems to generate qualitatively better results and is technically more correct.
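A sketch of temperature-controlled multinomial sampling: the log-probabilities are divided by the temperature and renormalized, then a single multinomial draw picks the index. Low temperatures sharpen the distribution toward the most likely entry; high temperatures flatten it. The `sample` helper and its `rng` argument are illustrative:

```python
import numpy as np


def sample(preds, temperature=1.0, rng=None):
    """Draw an index from `preds` after rescaling it by `temperature`."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.log(np.asarray(preds, dtype=np.float64)) / temperature
    logits -= logits.max()  # stabilize exp()
    probs = np.exp(logits)
    probs /= probs.sum()
    # One multinomial draw; argmax recovers the index that was drawn.
    return int(np.argmax(rng.multinomial(1, probs)))


rng = np.random.default_rng(0)
```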
When applying a Convolution2D with border_mode='full', images grow in
size; this layer allows shrinking them back to their original size (or any
other size).
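The commit doesn't show the layer's code, so here is a NumPy sketch of the operation, assuming channels-first arrays with the spatial axes last; `crop_to` is a hypothetical name. A 'full' convolution of an n x n image with a k x k kernel yields (n + k - 1) x (n + k - 1), and a center crop restores the original size:

```python
import numpy as np


def crop_to(images, height, width):
    """Center-crop the last two axes of `images` to (height, width)."""
    h, w = images.shape[-2:]
    top = (h - height) // 2
    left = (w - width) // 2
    return images[..., top:top + height, left:left + width]


# A 'full' convolution of a 28x28 image with a 3x3 kernel yields 28 + 3 - 1 = 30.
full_output = np.zeros((1, 30, 30))
restored = crop_to(full_output, 28, 28)
```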
Standard deviation values were being passed as scale values for uniform distributions.
But the relationship is: scale = standard deviation * sqrt(3).
So, the s values in glorot_uniform, lecun_uniform, and he_uniform should have been multiplied by sqrt(3) before being passed into uniform() function. Now it is fixed.
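The reasoning: a uniform distribution on [-a, a] has variance a**2 / 3, so matching a target standard deviation requires a = sqrt(3) * std. A NumPy sketch for the Glorot case, assuming the target std is sqrt(2 / (fan_in + fan_out)) (the function name is hypothetical), with an empirical check:

```python
import numpy as np


def glorot_uniform_limit(fan_in, fan_out):
    std = np.sqrt(2.0 / (fan_in + fan_out))  # target standard deviation
    # U(-a, a) has variance a**2 / 3, so the bound is a = sqrt(3) * std.
    return np.sqrt(3.0) * std


rng = np.random.default_rng(0)
a = glorot_uniform_limit(256, 128)
w = rng.uniform(-a, a, size=200000)
```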
The scan in the `get_output` method of TimeDistributedDense leaked memory like crazy. Changing it to match `get_output` in Dense seems to have fixed the problem, and the layer behaves identically.
This changes objective functions to no longer return scalars, but
rather tensors of dimension one less than y, representing the loss for
each datapoint in y, on which it is expected you will calculate a weighted mean.
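A NumPy sketch of this convention, assuming mean squared error as the objective: the loss is averaged over the last axis only, giving one value per datapoint, and the caller then takes the weighted mean. The data and weights are made up:

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    # Average over the last axis only: one loss value per datapoint.
    return np.mean((y_pred - y_true) ** 2, axis=-1)


y_true = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y_pred = np.array([[0.0, 2.0], [1.0, 1.0], [2.0, 4.0]])
sample_weight = np.array([1.0, 1.0, 0.0])

per_sample = mean_squared_error(y_true, y_pred)  # shape (3,)
loss = np.sum(per_sample * sample_weight) / np.sum(sample_weight)
```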
There is no reason to have two different functions for this! The softmax
function can just be configured to always perform the softmax across the
trailing dimension (i.e. nb_dimensions)
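A NumPy sketch of a softmax over the trailing axis; the same function handles both `(nb_samples, nb_dims)` and `(nb_samples, nb_timesteps, nb_dims)` inputs:

```python
import numpy as np


def softmax(x):
    """Softmax over the trailing axis, for inputs of any rank."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))  # subtract max for stability
    return e / np.sum(e, axis=-1, keepdims=True)


x2d = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])      # (nb_samples, nb_dims)
x3d = np.random.default_rng(0).normal(size=(2, 4, 3))  # (nb_samples, nb_timesteps, nb_dims)
```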
Both the training features and labels can be represented as numpy
booleans instead of float32 / float64. This enables standard low RAM
machines to scale up to large datasets. Especially important if you
either have many characters (ASCII), long sequences, or a large dataset.
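To put a number on the saving: NumPy stores each bool in one byte versus four bytes per float32. The shapes below are illustrative, for one-hot encoded character data (samples x timesteps x vocabulary):

```python
import numpy as np

# Hypothetical one-hot encoded character data: samples x timesteps x vocabulary.
nb_samples, maxlen, nb_chars = 1000, 40, 128

X_float = np.zeros((nb_samples, maxlen, nb_chars), dtype=np.float32)
X_bool = np.zeros((nb_samples, maxlen, nb_chars), dtype=bool)

print(X_float.nbytes // X_bool.nbytes)  # 4
```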
I realized that it makes more sense to have _step *apply* a mask, but
then to set the masked entries to mask_value outside of step. This
should be more efficient, but more importantly should make
implementations easier to understand.
Another nice effect: an alternative masking scheme can be introduced
without changing _step at all.
This led me to realize that I also was not properly passing masks out of
recurrent layers, nor were my tests properly checking for this. I've
resolved this here.
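A NumPy sketch of this split, under simplifying assumptions (a weightless `tanh` recurrence with hidden size equal to input size; `step` and `set_masked` are hypothetical names): the step applies the mask by carrying the previous state forward for masked samples, and the masked outputs are set to `mask_value` afterwards, outside the step:

```python
import numpy as np


def step(x_t, h_prev, mask_t):
    # The step *applies* the mask: masked samples carry their state forward.
    h_new = np.tanh(x_t + h_prev)
    return np.where(mask_t[:, None], h_new, h_prev)


def set_masked(outputs, mask, mask_value=0.0):
    # Outside the step: overwrite outputs at masked timesteps with mask_value.
    return np.where(mask[..., None], outputs, mask_value)


x = np.ones((2, 3, 4))  # (nb_samples, nb_timesteps, input_dim)
mask = np.array([[True, True, False], [True, False, False]])

h = np.zeros((2, 4))
states = []
for t in range(x.shape[1]):
    h = step(x[:, t], h, mask[:, t])
    states.append(h)
outputs = set_masked(np.stack(states, axis=1), mask)
```

Because the masking scheme lives in `set_masked`, a different scheme could be swapped in without touching `step`.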
Found a bug? Have a new feature to suggest? Want to contribute changes to the codebase? Make sure to read this first.
## Bug reporting
Your code doesn't work, and you have determined that the issue lies with Keras? Follow these steps to report a bug.
1. Your bug may already be fixed. Make sure to update to the current Keras master branch, as well as the latest Theano/TensorFlow master branch.
To easily update Theano: `pip install git+git://github.com/Theano/Theano.git --upgrade`
2. Search for similar issues. Make sure to delete `is:open` on the issue search to find solved tickets as well. It's possible somebody has encountered this bug already. Also remember to check out Keras' [FAQ](http://keras.io/faq/). Still having a problem? Open an issue on Github to let us know.
3. Make sure you provide us with useful information about your configuration: What OS are you using? What Keras backend are you using? Are you running on GPU? If so, what are your versions of CUDA and cuDNN? What is your GPU?
4. Provide us with a script to reproduce the issue. This script should be runnable as-is and should not require external data download (use randomly generated data if you need to run a model on some test data). We recommend that you use Github Gists to post your code. Any issue that cannot be reproduced is likely to be closed.
5. If possible, take a stab at fixing the bug yourself --if you can!
The more information you provide, the easier it is for us to validate that there is a bug and the faster we'll be able to take action. If you want your issue to be resolved quickly, following the steps above is crucial.
## Requesting a Feature
You can also use Github issues to request features you would like to see in Keras, or changes in the Keras API.
1. Provide a clear and detailed explanation of the feature you want and why it's important to add. Keep in mind that we want features that will be useful to the majority of our users and not just a small subset. If you're just targeting a minority of users, consider writing an add-on library for Keras. It is crucial for Keras to avoid bloating the API and codebase.
2. Provide code snippets demonstrating the API you have in mind and illustrating the use cases of your feature. Of course, you don't need to write any real code at this point!
3. After discussing the feature you may choose to attempt a Pull Request. If you're at all able, start writing some code. We always have more work to do than time to do it. If you can write some code then that will speed the process along.
## Pull Requests
We love pull requests. Here's a quick guide:
1. If your PR introduces a change in functionality, make sure you start by opening an issue to discuss whether the change should be made, and how to handle it. This will save you from having your PR closed down the road! Of course, if your PR is a simple bug fix, you don't need to do that.
2. Write the code. This is the hard part!
3. Make sure any new function or class you introduce has proper docstrings. Make sure any code you touch still has up-to-date docstrings and documentation.
4. Write tests. Your code should have full unit test coverage. If you want to see your PR merged promptly, this is crucial.
5. Run our test suite locally. It's easy: from the Keras folder, simply run: `py.test tests/`.
- You will need to install the test requirements (`pytest`, `pytest-cov`, `python-coveralls`, `pytest-xdist`, `pep8`, `pytest-pep8`): `pip install pytest pytest-cov python-coveralls pytest-xdist pep8 pytest-pep8`
6. Make sure all tests are passing:
- with the Theano backend, on Python 2.7 and Python 3.5
- with the TensorFlow backend, on Python 2.7
7. We use PEP8 syntax conventions, but we aren't dogmatic when it comes to line length. Make sure your lines stay reasonably sized, though. To make your life easier, we recommend running a PEP8 linter:
- Run a standalone PEP8 check: `py.test --pep8 -m pep8`
- You can automatically fix some PEP8 errors by running: `autopep8 -i --select <errors> <FILENAME>` for example: `autopep8 -i --select E128 tests/keras/backend/test_backends.py`
8. When committing, use appropriate, descriptive commit messages. Make sure that your branch history is not a string of "bug fix", "fix", "oops", etc. When submitting your PR, squash your commits into a single commit with an appropriate commit message, to make sure the project history stays clean and readable. See ['rebase and squash'](http://rebaseandsqua.sh/) for technical help on how to squash your commits.
9. Update the documentation. If introducing new functionality, make sure you include code snippets demonstrating the usage of your new feature.
10. Submit your PR. If your changes have been approved in a previous discussion, and if you have complete (and passing) unit tests, your PR is likely to be merged promptly. Otherwise, well...
## Adding new examples
Even if you don't contribute to the Keras source code, if you have an application of Keras that is concise and powerful, please consider adding it to our collection of examples. [Existing examples](https://github.com/fchollet/keras/tree/master/examples) show idiomatic Keras code: make sure to keep your own script in the same spirit.
## You have just found Keras.
Keras is a high-level neural networks library, written in Python and capable of running on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. *Being able to go from idea to result with the least possible delay is key to doing good research.*
Use Keras if you need a deep learning library that:
- Allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
- Supports both convolutional networks and recurrent networks, as well as combinations of the two.
- Supports arbitrary connectivity schemes (including multi-input and multi-output training).
- Runs seamlessly on CPU and GPU.
Read the documentation at [Keras.io](http://keras.io).
Keras is compatible with: __Python 2.7-3.5__.
------------------
## Guiding principles
- __Modularity.__ A model is understood as a sequence or a graph of standalone, fully-configurable modules that can be plugged together with as few restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions, and regularization schemes are all standalone modules that you can combine to create new models.
- __Minimalism.__ Each module should be kept short and simple. Every piece of code should be transparent upon first reading. No black magic: it hurts iteration speed and ability to innovate.
- __Easy extensibility.__ New modules are dead simple to add (as new classes and functions), and existing modules provide ample examples. Being able to easily create new modules allows for total expressiveness, making Keras suitable for advanced research.
- __Work with Python__. No separate models configuration files in a declarative format. Models are described in Python code, which is compact, easier to debug, and allows for ease of extensibility.
## Examples
### Multilayer Perceptron (MLP):
------------------
## Getting started: 30 seconds to Keras
The core data structure of Keras is a __model__, a way to organize layers. The main type of model is the [`Sequential`](http://keras.io/getting-started/sequential-model-guide) model, a linear stack of layers. For more complex architectures, you should use the [Keras functional API](http://keras.io/getting-started/functional-api-guide).
If you need to, you can further configure your optimizer. A core principle of Keras is to make things reasonably simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code).
Building a question answering system, an image classification model, a Neural Turing Machine, a word2vec embedder or any other model is just as fast. The ideas behind deep learning are simple, so why should their implementation be painful?
For a more in-depth tutorial about Keras, you can check out:
- [Getting started with the Sequential model](http://keras.io/getting-started/sequential-model-guide)
- [Getting started with the functional API](http://keras.io/getting-started/functional-api-guide)
In the [examples folder](https://github.com/fchollet/keras/tree/master/examples) of the repository, you will find more advanced models: question-answering with memory networks, text generation with stacked LSTMs, etc.
## Current capabilities
------------------
For complete coverage of the API, check out [the Keras documentation](http://keras.io).
A few highlights: convnets, LSTM, GRU, word2vec-style embeddings, PReLU, batch normalization...
## Installation
Keras uses the following dependencies:
- numpy, scipy
- Theano
- See installation instructions: http://deeplearning.net/software/theano/install.html#install
- pyyaml
- HDF5 and h5py (optional, required if you use model saving/loading functions)
- Optional but recommended if you use CNNs: cuDNN.
To install Keras, `cd` to the Keras folder and run the install command:
```sh
sudo python setup.py install
```
You can also install Keras from PyPI:
```sh
sudo pip install keras
```
------------------
## Switching from TensorFlow to Theano
By default, Keras will use TensorFlow as its tensor manipulation library. [Follow these instructions](http://keras.io/backend/) to configure the Keras backend.
------------------
## Support
You can ask questions and join the development discussion:
- On the [Keras Google group](https://groups.google.com/forum/#!forum/keras-users).
- On the [Keras Gitter channel](https://gitter.im/Keras-io/Lobby).
You can also post bug reports and feature requests in [Github issues](https://github.com/fchollet/keras/issues). Make sure to read [our guidelines](https://github.com/fchollet/keras/blob/master/CONTRIBUTING.md) first.
------------------
## Why this name, Keras?
Keras (κέρας) means _horn_ in Greek. It is a reference to a literary image from ancient Greek and Latin literature, first found in the _Odyssey_, where dream spirits (_Oneiroi_, singular _Oneiros_) are divided between those who deceive men with false visions, who arrive to Earth through a gate of ivory, and those who announce a future that will come to pass, who arrive through a gate of horn. It's a play on the words κέρας (horn) / κραίνω (fulfill), and ἐλέφας (ivory) / ἐλεφαίρομαι (deceive).
Keras was initially developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System).
>_"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them."_ Homer, Odyssey 19. 562 ff (Shewring translation).
- __softmax__: Should only be applied to 2D layers (expected shape: `(nb_samples, nb_dims)`).
- __time_distributed_softmax__: Softmax applied to every sample at every timestep of a layer of shape `(nb_samples, nb_timesteps, nb_dims)`.
- __softplus__
- __relu__
- __tanh__
- __sigmoid__
- __hard_sigmoid__
- __linear__
## On Advanced Activations
Activations that are more complex than a simple Theano function (e.g. learnable activations, configurable activations, etc.) are available as [Advanced Activation layers](layers/advanced_activations.md), and can be found in the module `keras.layers.advanced_activations`. These include PReLU and LeakyReLU.
A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training. You can pass a list of callbacks (as the keyword argument `callbacks`) to the `.fit()` method of the `Sequential` model. The relevant methods of the callbacks will then be called at each stage of the training.
---
## Base class
```python
keras.callbacks.Callback()
```
- __Properties__:
- __params__: dict. Training parameters (eg. verbosity, batch size, number of epochs...).
    - __model__: `keras.models.Model`. Reference to the model being trained.
- __Methods__:
- __on_train_begin__(logs={}): Method called at the beginning of training.
- __on_train_end__(logs={}): Method called at the end of training.
- __on_epoch_begin__(epoch, logs={}): Method called at the beginning of epoch `epoch`.
- __on_epoch_end__(epoch, logs={}): Method called at the end of epoch `epoch`.
- __on_batch_begin__(batch, logs={}): Method called at the beginning of batch `batch`.
- __on_batch_end__(batch, logs={}): Method called at the end of batch `batch`.
The `logs` dictionary will contain keys for quantities relevant to the current batch or epoch. Currently, the `.fit()` method of the `Sequential` model class will include the following quantities in the `logs` that it passes to its callbacks:
- __on_epoch_end__: logs optionally include `val_loss` (if validation is enabled in `fit`), and `val_accuracy` (if validation and accuracy monitoring are enabled).
- __on_batch_begin__: logs include `size`, the number of samples in the current batch.
- __on_batch_end__: logs include `loss`, and optionally `accuracy` (if accuracy monitoring is enabled).
---
## Create a callback
You can create a custom callback by extending the base class `keras.callbacks.Callback`. A callback has access to its associated model through the class property `self.model`.
Here's a simple example saving a list of losses over each batch during training:
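A minimal sketch of such a callback. So that it runs without Keras installed, it defines a stand-in `Callback` base class with the methods listed above; in real code you would subclass `keras.callbacks.Callback` instead, and `fit()` would make the calls that are simulated at the end:

```python
class Callback(object):
    """Stand-in for keras.callbacks.Callback so the sketch runs on its own."""

    def on_train_begin(self, logs=None):
        pass

    def on_batch_end(self, batch, logs=None):
        pass


class LossHistory(Callback):
    def on_train_begin(self, logs=None):
        self.losses = []  # reset at the start of every training run

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get('loss'))  # per the docs, logs include 'loss'


# Simulate what fit() would do for three batches.
history = LossHistory()
history.on_train_begin()
for batch, loss in enumerate([0.9, 0.7, 0.6]):
    history.on_batch_end(batch, {'size': 32, 'loss': loss})
```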
Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses [Theano](http://deeplearning.net/software/theano/) under the hood for fast tensor manipulation on GPU and CPU. It was developed with a focus on enabling fast experimentation.
Use Keras if you need a deep learning library that:
- allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
- supports both __convolutional networks__ and __recurrent networks__ (LSTM, GRU, etc.), as well as combinations of the two.
- runs seamlessly on the CPU and the GPU.
## Guiding principles
- __Modularity.__ A model is understood as a sequence of standalone, fully-configurable modules that can be plugged together with as few restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions and dropout are all standalone modules that you can combine to create new models.
- __Minimalism.__ Each module should be kept short and simple (<100 lines of code). Every piece of code should be transparent upon first reading. No black magic: it hurts iteration speed and ability to innovate.
- __Easy extensibility.__ New features (a new module, per the above definition, or a new way to combine modules together) are dead simple to add (as new classes/functions), and existing modules provide ample examples.
- __Work with Python__. No separate models configuration files in a declarative format (like in Caffe or PyLearn2). Models are described in Python code, which is compact, easier to debug, benefits from syntax highlighting, and most of all, allows for ease of extensibility.
## Code
Find the code on Github: [fchollet/keras](https://github.com/fchollet/keras).
## License
Keras is licensed under the [MIT license](http://opensource.org/licenses/MIT).
## Getting started: 30 seconds to Keras
The core data structure of Keras is a __model__, a way to organize layers. Here's a sequential model (a linear pile of layers).
If you need to, you can further configure your optimizer. A core principle of Keras is to make things reasonably simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code).
Building a network of LSTMs, a deep CNN, a word2vec embedder or any other model is just as fast. The ideas behind deep learning are simple, so why should their implementation be painful?
Have a look at the [examples](examples.md).
## Installation
Keras uses the following dependencies:
- numpy, scipy
- Theano
- See [installation instructions](http://deeplearning.net/software/theano/install.html#install).
- HDF5 and h5py (optional, required if you use model saving/loading functions)
- Optional but recommended if you use CNNs: cuDNN.
Once you have the dependencies installed, clone the repo:
```bash
git clone https://github.com/fchollet/keras.git
```
Go to the Keras folder and run the install command:
```bash
cd keras
sudo python setup.py install
```
## Support
You can ask questions and join the development discussion on the [Keras Google group](https://groups.google.com/forum/#!forum/keras-users).
## Contribution Guidelines
Keras welcomes all contributions from the community.
- Keep a pragmatic mindset and avoid bloat. Only add to the source if that is the only path forward.
- New features should be documented. Make sure you update the documentation along with your Pull Request.
- The documentation for every new feature should include a usage example in the form of a code snippet.
- All changes should be tested. A formal test process will be introduced very soon.
- Even if you don't contribute to the Keras source code, if you have an application of Keras that is concise and powerful, please consider adding it to our collection of [examples](https://github.com/fchollet/keras/tree/master/examples).
## Why this name, Keras?
Keras (κέρας) means _horn_ in Greek. It is a reference to a literary image from ancient Greek and Latin literature, first found in the _Odyssey_, where dream spirits (_Oneiroi_, singular _Oneiros_) are divided between those who deceive men with false visions, who arrive to Earth through a gate of ivory, and those who announce a future that will come to pass, who arrive through a gate of horn. It's a play on the words κέρας (horn) / κραίνω (fulfill), and ἐλέφας (ivory) / ἐλεφαίρομαι (deceive).
Keras was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System).
> _"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them."_
Parametric rectified linear unit (PReLU). Similar to a LeakyReLU, except that each input unit has its own alpha coefficient, and these coefficients are learned during training.
- __Input shape__: Same as `input_shape`. This layer cannot be used as the first layer in a model.
- __Output shape__: Same as input.
- __Arguments__:
- __input_shape__: tuple.
- __References__:
- [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://arxiv.org/pdf/1502.01852v1.pdf)
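The PReLU forward pass can be sketched in NumPy as `f(x) = max(0, x) + alpha * min(0, x)`, with one `alpha` per input unit. This is an illustration of the formula, not the Keras layer: the `prelu` helper and the alpha values are hypothetical, and in the real layer the alphas are trainable parameters.

```python
import numpy as np


def prelu(x, alpha):
    """Hypothetical NumPy sketch of PReLU.

    Positive inputs pass through unchanged; negative inputs are scaled by
    alpha. `alpha` holds one coefficient per input unit and broadcasts
    over the batch (first) dimension.
    """
    return np.maximum(0.0, x) + alpha * np.minimum(0.0, x)


x = np.array([[-2.0, -1.0, 0.5, 3.0]])     # (nb_samples=1, 4 input units)
alpha = np.array([0.25, 0.1, 0.25, 0.1])   # illustrative per-unit coefficients
y = prelu(x, alpha)                        # values: -0.5, -0.1, 0.5, 3.0
```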
Containers are ensembles of layers that can be interacted with through the same API as `Layer` objects.
## Sequential
```python
keras.layers.containers.Sequential(layers=[])
```
The Sequential container is a linear stack of layers. Apart from the `add` methods and the `layers` constructor argument, the API is identical to that of the `Layer` class.
This class is also the basis for the `keras.models.Sequential` architecture.
The `layers` constructor argument is a list of Layer instances.
Connect the input of the current layer to the output of the argument layer.
- __Return__: None.
- __Arguments__:
- __previous_layer__: Layer object.
```python
output(train)
```
Get the output of the layer.
- __Return__: Theano tensor.
- __Arguments__:
- __train__: Boolean. Specifies whether the output is computed in training mode or in testing mode, which can change the logic, for instance if there are any `Dropout` layers in the network.
```python
get_input(train)
```
Get the input of the layer.
- __Return__: Theano tensor.
- __Arguments__:
- __train__: Boolean. Specifies whether the input is computed in training mode or in testing mode, which can change the logic, for instance if there are any `Dropout` layers in the network.
```python
get_weights()
```
Get the weights of the parameters of the layer.
- __Return__: List of numpy arrays (one per layer parameter).
```python
set_weights(weights)
```
Set the weights of the parameters of the layer.
- __Arguments__:
- __weights__: List of numpy arrays (one per layer parameter). Should be in the same order as what `get_weights(self)` returns.
- __Input shape__: 2D tensor with shape: `(nb_samples, input_dim)`.
- __Output shape__: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: int >= 0.
- __output_dim__: int >= 0.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of the [regularizers](../regularizers.md) module (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of the [regularizers](../regularizers.md) module, applied to the bias.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
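A fully-connected layer computes `activation(dot(X, W) + b)`. The NumPy sketch below shows the shapes involved; `dense_forward` is a hypothetical helper, and the bias `b` is included because the layer documents a bias regularizer and constraint.

```python
import numpy as np


def dense_forward(X, W, b, activation=None):
    """Hypothetical NumPy sketch of a Dense layer's forward pass.

    X: (nb_samples, input_dim), W: (input_dim, output_dim), b: (output_dim,).
    With activation=None the layer is linear, a(x) = x.
    """
    Z = X.dot(W) + b
    return Z if activation is None else activation(Z)


rng = np.random.RandomState(0)
X = rng.rand(8, 5)    # nb_samples=8, input_dim=5
W = rng.rand(5, 3)    # input_dim=5, output_dim=3
b = np.zeros(3)
Y = dense_forward(X, W, b, activation=np.tanh)   # shape (8, 3)
```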
Fully-connected layer distributed over the time dimension. Useful after a recurrent network set to `return_sequences=True`.
- __Input shape__: 3D tensor with shape: `(nb_samples, nb_timesteps, input_dim)`.
- __Output shape__: 3D tensor with shape: `(nb_samples, nb_timesteps, output_dim)`.
- __Arguments__:
- __input_dim__: int >= 0.
- __output_dim__: int >= 0.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of the [regularizers](../regularizers.md) module (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of the [regularizers](../regularizers.md) module, applied to the bias.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
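"Distributed over the time dimension" means the same weights are applied independently at every timestep. A NumPy sketch (hypothetical helper and random data) that relies on `np.dot` contracting the last axis of a 3D array with the first axis of a 2D matrix:

```python
import numpy as np


def time_distributed_dense(X, W, b):
    """Hypothetical sketch: apply one shared (W, b) at every timestep.

    X: (nb_samples, nb_timesteps, input_dim) -> (nb_samples, nb_timesteps, output_dim).
    np.dot contracts the last axis of X with the first axis of the 2D matrix W,
    so every timestep is projected with the same weights.
    """
    return np.dot(X, W) + b


rng = np.random.RandomState(1)
X = rng.rand(4, 6, 10)   # 4 sequences, 6 timesteps, input_dim=10
W = rng.rand(10, 3)      # output_dim=3
b = rng.rand(3)
Y = time_distributed_dense(X, W, b)
```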
A customizable autoencoder model. If `output_reconstruction = True`, then dim(input) = dim(output); otherwise dim(output) = dim(hidden).
- __Input shape__: The layer shape is defined by the encoder definitions
- __Output shape__: The layer shape is defined by the decoder definitions
- __Arguments__:
- __encoder__: A [layer](./) or [layer container](./containers.md).
- __decoder__: A [layer](./) or [layer container](./containers.md).
- __output_reconstruction__: If this is False, then when `.predict()` is called the output is the deepest hidden layer's activation. Otherwise, the output of the final decoder layer is presented. Be sure your validation data conforms to this logic if you decide to use any.
- __tie_weights__: If True, then the encoder bias is tied to the decoder bias. **Note**: This requires the encoder layer corresponding to this decoder layer to be of the same type, eg: Dense:Dense.
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
A denoising autoencoder model that inherits the base features from autoencoder.
Since this layer uses logic similar to Dropout, it cannot be the first layer in a pipeline.
- __Input shape__: The layer shape is defined by the encoder definitions
- __Output shape__: The layer shape is defined by the decoder definitions
- __Arguments__:
- __encoder__: A [layer](./) or [layer container](./containers.md).
- __decoder__: A [layer](./) or [layer container](./containers.md).
- __output_reconstruction__: If this is False, then when `.predict()` is called the output is the deepest hidden layer's activation. Otherwise, the output of the final decoder layer is presented. Be sure your validation data conforms to this logic if you decide to use any.
- __tie_weights__: If True, then the encoder bias is tied to the decoder bias. **Note**: This requires the encoder layer corresponding to this decoder layer to be of the same type, eg: Dense:Dense.
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __corruption_level__: the amount of binomial noise added to the input layer of the model.
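`corruption_level` is described only as "binomial noise". One common reading, assumed here, is masking noise: each input entry is kept with probability `1 - corruption_level` and zeroed otherwise. A NumPy sketch with a hypothetical `corrupt` helper:

```python
import numpy as np


def corrupt(X, corruption_level, rng):
    """Hypothetical masking-noise sketch (assumed interpretation of
    'binomial noise'): zero each entry independently with probability
    corruption_level, keep it unchanged otherwise."""
    mask = rng.binomial(n=1, p=1.0 - corruption_level, size=X.shape)
    return X * mask


rng = np.random.RandomState(0)
X = rng.rand(1000, 20) + 1.0      # strictly positive, so zeros come only from the mask
X_noisy = corrupt(X, corruption_level=0.3, rng=rng)
dropped = np.mean(X_noisy == 0)   # roughly 0.3
```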
- __Input shape__: This layer does not assume a specific input shape. As a result, it cannot be used as the first layer in a model.
- __Output shape__: Same as input.
- __Arguments__:
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function.
---
## Dropout
```python
keras.layers.core.Dropout(p)
```
Apply dropout to the input. Dropout consists of randomly setting a fraction `p` of input units to 0 at each update during training time, which helps prevent overfitting. Reference: [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf)
- __Input shape__: This layer does not assume a specific input shape.
- __Output shape__: Same as input.
- __Arguments__:
- __p__: float (0 <= p < 1). Fraction of the input that gets dropped out at training time.
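A NumPy sketch of training-time dropout. It uses "inverted" dropout, rescaling the kept units by `1 / (1 - p)` so the expected activation is unchanged and the layer is the identity at test time. Whether a given Keras version rescales at training time or at test time is an assumption here, and `dropout` is a hypothetical helper.

```python
import numpy as np


def dropout(X, p, rng, train=True):
    """Hypothetical inverted-dropout sketch.

    At training time, each unit is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p). At test time X is returned as-is.
    """
    if not train or p == 0:
        return X
    retain = 1.0 - p
    mask = rng.binomial(n=1, p=retain, size=X.shape)
    return X * mask / retain


rng = np.random.RandomState(42)
X = np.ones((500, 200))
X_train = dropout(X, p=0.3, rng=rng)              # ~30% zeros, survivors equal 1 / 0.7
X_test = dropout(X, p=0.3, rng=rng, train=False)  # unchanged
```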
---
## Reshape
```python
keras.layers.core.Reshape(*dims)
```
Reshape the input to a new shape containing the same number of units.
- __Input shape__: This layer does not assume a specific input shape.
A dense maxout layer. A `MaxoutDense` layer takes the element-wise maximum of `nb_feature` `Dense(input_dim, output_dim)` linear layers. This allows the layer to learn a convex, piecewise linear activation function over the inputs. See [this paper](http://arxiv.org/pdf/1302.4389.pdf) for more details. Note that this is a *linear* layer: if you wish to apply an activation function (you shouldn't need to, since maxout units are universal function approximators), an `Activation` layer must be added after.
- __Input shape__: 2D tensor with shape: `(nb_samples, input_dim)`.
- __Output shape__: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: int >= 0.
- __output_dim__: int >= 0.
- __nb_feature__: int >= 0. The number of features to create for the maxout. This is equivalent to the number of piecewise elements to be allowed for the activation function.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of the [regularizers](../regularizers.md) module (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of the [regularizers](../regularizers.md) module, applied to the bias.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
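The maxout computation can be sketched in NumPy. The weight layout used here, `W` of shape `(nb_feature, input_dim, output_dim)` with one bias row per feature, is an assumption for illustration and may not match how the Keras layer stores its weights. With two pieces, `x` and `-x`, maxout recovers `|x|`, a simple convex piecewise-linear function.

```python
import numpy as np


def maxout_dense(X, W, b):
    """Hypothetical sketch: element-wise max over nb_feature linear pieces.

    X: (nb_samples, input_dim)
    W: (nb_feature, input_dim, output_dim), b: (nb_feature, output_dim)
    returns: (nb_samples, output_dim)
    """
    # pieces[f] = X . W[f] + b[f], shape (nb_feature, nb_samples, output_dim)
    pieces = np.einsum('ni,fio->fno', X, W) + b[:, None, :]
    return pieces.max(axis=0)


# Two pieces, y = x and y = -x, whose maximum is |x|.
W_abs = np.array([[[1.0]], [[-1.0]]])   # (nb_feature=2, input_dim=1, output_dim=1)
b_abs = np.zeros((2, 1))
x = np.array([[-3.0], [0.0], [2.5]])
y = maxout_dense(x, W_abs, b_abs)       # values: 3.0, 0.0, 2.5
```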
Merge the output of a list of models into a single tensor, following one of two modes: `sum` or `concat`.
- __Arguments__:
- __models__: List of `Sequential` models.
- __mode__: String, one of `{'sum', 'concat'}`. `sum` will simply sum the outputs of the models (therefore all models should have an output with the same shape). `concat` will concatenate the outputs along the last dimension (therefore all models should have outputs that differ only along the last dimension).
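The shape rules of the two modes can be sketched on plain NumPy arrays standing in for model outputs; `merge_outputs` is a hypothetical helper, not the Keras `Merge` layer.

```python
import numpy as np


def merge_outputs(outputs, mode='sum'):
    """Hypothetical sketch of the two merge modes on NumPy arrays."""
    if mode == 'sum':
        # every output must have exactly the same shape
        return np.sum(np.stack(outputs), axis=0)
    if mode == 'concat':
        # outputs may differ only along the last dimension
        return np.concatenate(outputs, axis=-1)
    raise ValueError("mode must be 'sum' or 'concat', got %r" % (mode,))


a = np.ones((4, 3))
b = 2 * np.ones((4, 3))
c = np.zeros((4, 5))
summed = merge_outputs([a, b], mode='sum')      # shape (4, 3), all 3.0
joined = merge_outputs([a, c], mode='concat')   # shape (4, 8)
```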
Turn positive integers (indexes) into dense vectors of fixed size,
eg. `[[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]`
- __Input shape__: 2D tensor with shape: `(nb_samples, maxlen)`.
- __Output shape__: 3D tensor with shape: `(nb_samples, maxlen, output_dim)`.
- __Arguments__:
- __input_dim__: int >= 0. Size of the vocabulary, ie. 1+maximum integer index occurring in the input data.
- __output_dim__: int >= 0. Dimension of the dense embedding.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of the [regularizers](../regularizers.md) module (eg. L1 or L2 regularization), applied to the embedding matrix.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the embedding matrix.
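An embedding is a table lookup: row `i` of the weight matrix is the vector for index `i`. The NumPy sketch below (hypothetical helper, random weights) reproduces the shape of the example above and shows why `input_dim` must be 1 + the largest index: index 20 needs a matrix with 21 rows.

```python
import numpy as np


def embedding_lookup(X, W):
    """Hypothetical sketch: X holds integer indexes (nb_samples, maxlen),
    W is the (input_dim, output_dim) embedding matrix. Fancy indexing
    returns an array of shape (nb_samples, maxlen, output_dim)."""
    return W[X]


X = np.array([[4], [20]])                          # nb_samples=2, maxlen=1
input_dim = X.max() + 1                            # 21 rows so that index 20 is valid
W = np.random.RandomState(0).randn(input_dim, 2)   # output_dim=2
E = embedding_lookup(X, W)                         # shape (2, 1, 2)
```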
This layer turns a pair of words (a pivot word + a context word, ie. a word from the same context as a pivot, or a random, out-of-context word), identified by their indices in a vocabulary, into two dense representations (word representation and context representation).
Then it returns `activation(dot(pivot_embedding, context_embedding))`, which can be trained to encode the probability of finding the context word in the context of the pivot word (or reciprocally depending on your training procedure).
For more context, see Mikolov et al.: [Efficient Estimation of Word Representations in Vector Space](http://arxiv.org/pdf/1301.3781v3.pdf)
- __Input shape__: 2D tensor with shape: `(nb_samples, 2)`.
- __Output shape__: 2D tensor with shape: `(nb_samples, 1)`.
- __Arguments__:
- __input_dim__: int >= 0. Size of the vocabulary, ie. 1+maximum integer index occurring in the input data.
- __proj_dim__: int >= 0. Dimension of the dense embedding used internally.
- __init__: name of initialization function for the embeddings (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function.
- __weights__: list of numpy arrays to set as initial weights. The list should have 2 elements, both of shape `(input_dim, proj_dim)`. The first element is the word embedding weights, the second one is the context embedding weights.
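The computation `activation(dot(pivot_embedding, context_embedding))` can be sketched in NumPy with the two embedding matrices described above. The sigmoid activation, the random weights, and the helper names are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def word_context_product(pairs, W_word, W_context, activation=sigmoid):
    """Hypothetical sketch of activation(dot(pivot_embedding, context_embedding)).

    pairs: integer array (nb_samples, 2) of (pivot index, context index)
    W_word, W_context: embedding matrices, each (input_dim, proj_dim)
    returns: (nb_samples, 1)
    """
    pivot = W_word[pairs[:, 0]]        # (nb_samples, proj_dim)
    context = W_context[pairs[:, 1]]   # (nb_samples, proj_dim)
    scores = np.sum(pivot * context, axis=1, keepdims=True)
    return activation(scores)


rng = np.random.RandomState(0)
input_dim, proj_dim = 50, 8
W_word = rng.randn(input_dim, proj_dim) * 0.1
W_context = rng.randn(input_dim, proj_dim) * 0.1
pairs = np.array([[3, 17], [3, 42], [9, 1]])
probs = word_context_product(pairs, W_word, W_context)   # shape (3, 1), values in (0, 1)
```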
Fully connected RNN where the output is fed back to the input. Not a particularly useful model, included for demonstration purposes.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: dimension of the input.
- __output_dim__: dimension of the internal projections and the final output.
- __init__: weight initialization function. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __activation__: activation function. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __weights__: list of numpy arrays to set as initial weights. The list should have 3 elements, of shapes: `[(input_dim, output_dim), (output_dim, output_dim), (output_dim,)]`.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
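The recurrence can be sketched in NumPy using the three weight arrays listed above, `W` `(input_dim, output_dim)`, `U` `(output_dim, output_dim)` and `b` `(output_dim,)`. The all-zeros initial state and the `simple_rnn` helper are assumptions; the sketch also shows what `return_sequences` changes.

```python
import numpy as np


def simple_rnn(X, W, U, b, activation=np.tanh, return_sequences=False):
    """Hypothetical sketch of h_t = activation(x_t . W + h_{t-1} . U + b).

    X: (nb_samples, timesteps, input_dim). The initial state is assumed
    to be all zeros.
    """
    nb_samples, timesteps, _ = X.shape
    h = np.zeros((nb_samples, U.shape[0]))
    outputs = []
    for t in range(timesteps):
        h = activation(X[:, t, :].dot(W) + h.dot(U) + b)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)   # (nb_samples, timesteps, output_dim)
    return h                               # (nb_samples, output_dim)


rng = np.random.RandomState(0)
X = rng.randn(2, 5, 4)                                    # 2 sequences, 5 timesteps, input_dim=4
W, U, b = rng.randn(4, 3), rng.randn(3, 3), np.zeros(3)   # output_dim=3
seq = simple_rnn(X, W, U, b, return_sequences=True)
last = simple_rnn(X, W, U, b)
```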
Not a particularly useful model, included for demonstration purposes.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: dimension of the input.
- __output_dim__: dimension of the internal projections and the final output.
- __depth__: int >= 1. Lookback depth (eg. depth=1 is equivalent to SimpleRNN).
- __init__: weight initialization function for the output cell. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __inner_init__: weight initialization function for the inner cells.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have depth+2 elements.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: dimension of the input.
- __output_dim__: dimension of the internal projections and the final output.
- __init__: weight initialization function for the output cell. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __inner_init__: weight initialization function for the inner cells.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have 9 elements.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __References__:
- [On the Properties of Neural Machine Translation: Encoder–Decoder Approaches](http://www.aclweb.org/anthology/W14-4012)
- [Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling](http://arxiv.org/pdf/1412.3555v1.pdf)
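A single gated recurrent unit (GRU) step in NumPy, with one `(W, U, b)` triple per gate, which accounts for the 9 weight arrays listed above. The update `h_t = z * h_{t-1} + (1 - z) * h_candidate` follows Cho et al.; other papers swap the roles of `z` and `1 - z`. Which convention a given Keras version uses, and the order of its 9 weights, is an assumption here, and all helper names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(x, h, p):
    """Hypothetical single GRU step.

    x: (nb_samples, input_dim), h: (nb_samples, output_dim).
    p maps 'W_z', 'U_z', 'b_z', 'W_r', 'U_r', 'b_r', 'W_h', 'U_h', 'b_h'
    to arrays: 3 gates x (W, U, b) = 9 weight arrays.
    """
    z = sigmoid(x.dot(p['W_z']) + h.dot(p['U_z']) + p['b_z'])               # update gate
    r = sigmoid(x.dot(p['W_r']) + h.dot(p['U_r']) + p['b_r'])               # reset gate
    h_tilde = np.tanh(x.dot(p['W_h']) + (r * h).dot(p['U_h']) + p['b_h'])   # candidate state
    return z * h + (1.0 - z) * h_tilde


def init_gru(input_dim, output_dim, rng):
    p = {}
    for gate in ('z', 'r', 'h'):
        p['W_' + gate] = rng.randn(input_dim, output_dim) * 0.1
        p['U_' + gate] = rng.randn(output_dim, output_dim) * 0.1
        p['b_' + gate] = np.zeros(output_dim)
    return p


rng = np.random.RandomState(0)
params = init_gru(input_dim=4, output_dim=3, rng=rng)
x = rng.randn(2, 4)
h0 = np.zeros((2, 3))
h1 = gru_step(x, h0, params)
```

A saturated update gate (`z` close to 1) copies the previous state through unchanged, which is how the GRU carries information across many timesteps.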
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __input_dim__: dimension of the input.
- __output_dim__: dimension of the internal projections and the final output.
- __init__: weight initialization function for the output cell. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __inner_init__: weight initialization function for the inner cells.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have 12 elements.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
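Likewise, one LSTM step in NumPy: four blocks (input gate, forget gate, cell candidate, output gate), each with its own `(W, U, b)`, giving the 12 weight arrays listed above. This is the standard formulation without peephole connections; the ordering of the 12 arrays in Keras is not assumed, and the helper names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, p):
    """Hypothetical single LSTM step (no peepholes).

    x: (nb_samples, input_dim); h, c: (nb_samples, output_dim).
    p holds W_k, U_k, b_k for k in 'i', 'f', 'c', 'o' (12 arrays).
    """
    def affine(k):
        return x.dot(p['W_' + k]) + h.dot(p['U_' + k]) + p['b_' + k]

    i = sigmoid(affine('i'))                    # input gate
    f = sigmoid(affine('f'))                    # forget gate
    c_new = f * c + i * np.tanh(affine('c'))    # new cell state
    o = sigmoid(affine('o'))                    # output gate
    h_new = o * np.tanh(c_new)
    return h_new, c_new


rng = np.random.RandomState(0)
input_dim, output_dim = 4, 3
params = {}
for k in ('i', 'f', 'c', 'o'):
    params['W_' + k] = rng.randn(input_dim, output_dim) * 0.1
    params['U_' + k] = rng.randn(output_dim, output_dim) * 0.1
    params['b_' + k] = np.zeros(output_dim)

x = rng.randn(2, input_dim)
h, c = np.zeros((2, output_dim)), np.zeros((2, output_dim))
h, c = lstm_step(x, h, c, params)
```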
- __optimizer__: str (name of optimizer) or optimizer object. See [optimizers](optimizers.md).
- __loss__: str (name of objective function) or objective function. See [objectives](objectives.md).
- __class_mode__: one of "categorical", "binary". This is only used for computing classification accuracy or using the predict_classes method.
- __theano_mode__: A `theano.compile.mode.Mode` ([reference](http://deeplearning.net/software/theano/library/compile/mode.html)) instance controlling specifying compilation options.
- __fit__(X, y, batch_size=128, nb_epoch=100, verbose=1, validation_split=0., validation_data=None, shuffle=True, show_accuracy=False, callbacks=[]): Train a model for a fixed number of epochs.
- __Return__: a history dictionary with a record of training loss values at successive epochs, as well as validation loss values (if applicable), accuracy (if applicable), etc.
- __Arguments__:
- __X__: data.
- __y__: labels.
- __batch_size__: int. Number of samples per gradient update.
- __nb_epoch__: int.
- __verbose__: 0 for no logging to stdout, 1 for progress bar logging, 2 for one log line per epoch.
- __validation_split__: float (0. < x < 1). Fraction of the data to use as held-out validation data.
- __validation_data__: tuple (X, y) to be used as held-out validation data. Will override validation_split.
- __shuffle__: boolean. Whether to shuffle the samples at each epoch.
- __show_accuracy__: boolean. Whether to display class accuracy in the logs to stdout at each epoch.
- __callbacks__: `keras.callbacks.Callback` list. List of callbacks to apply during training. See [callbacks](callbacks.md).
- __evaluate__(X, y, batch_size=128, show_accuracy=False, verbose=1): Show performance of the model over some validation data.
- __Return__: The loss score over the data.
- __Arguments__: Same meaning as fit method above. verbose is used as a binary flag (progress bar or nothing).
- __predict__(X, batch_size=128, verbose=1):
- __Return__: An array of predictions for some test data.
- __Arguments__: Same meaning as fit method above.
- __predict_classes__(X, batch_size=128, verbose=1): Return an array of class predictions for some test data.
- __Return__: An array of labels for some test data.
- __Arguments__: Same meaning as fit method above. verbose is used as a binary flag (progress bar or nothing).
- __train__(X, y, accuracy=False): Single gradient update on one batch. If accuracy==True, return tuple (loss_on_batch, accuracy_on_batch). Else, return loss_on_batch.
- __Return__: loss over the data, or tuple `(loss, accuracy)` if `accuracy=True`.
- __test__(X, y, accuracy=False): Single performance evaluation on one batch. If accuracy==True, return tuple (loss_on_batch, accuracy_on_batch). Else, return loss_on_batch.
- __Return__: loss over the data, or tuple `(loss, accuracy)` if `accuracy=True`.
- __save_weights__(fname, overwrite=False): Store the weights of all layers to an HDF5 file. If overwrite==False and the file already exists, an exception will be thrown.
- __load_weights__(fname): Sets the weights of a model, based on weights stored by __save_weights__. You can only call __load_weights__ on a save file from a model with an identical architecture. __load_weights__ can be called either before or after the __compile__ step.
You can either pass the name of an existing objective, or pass a Theano symbolic function that returns a scalar and takes the following two arguments:
- __y_true__: True labels. Theano tensor.
- __y_pred__: Predictions. Theano tensor of the same shape as y_true.
For a few examples of such functions, check out the [objectives source](https://github.com/fchollet/keras/blob/master/keras/objectives.py).
## Available objectives
- __mean_squared_error__ / __mse__
- __mean_absolute_error__ / __mae__
- __squared_hinge__
- __hinge__
- __binary_crossentropy__: Also known as logloss.
- __categorical_crossentropy__: Also known as multiclass logloss. __Note__: using this objective requires that your labels are binary arrays of shape `(nb_samples, nb_classes)`.
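NumPy sketches of the objectives above, each reduced to a scalar by averaging over samples. They follow the usual textbook definitions; the clipping constant `EPS` and the function bodies are assumptions, not copies of `keras/objectives.py`. The hinge losses assume labels in {-1, 1}.

```python
import numpy as np

EPS = 1e-7  # assumed clipping constant that keeps log() finite


def mean_squared_error(y_true, y_pred):
    return np.mean(np.square(y_pred - y_true))


def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true))


def hinge(y_true, y_pred):
    # labels in {-1, 1}
    return np.mean(np.maximum(1.0 - y_true * y_pred, 0.0))


def squared_hinge(y_true, y_pred):
    return np.mean(np.square(np.maximum(1.0 - y_true * y_pred, 0.0)))


def binary_crossentropy(y_true, y_pred):
    y_pred = np.clip(y_pred, EPS, 1.0 - EPS)
    return np.mean(-(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)))


def categorical_crossentropy(y_true, y_pred):
    # y_true: binary (one-hot) array of shape (nb_samples, nb_classes)
    y_pred = np.clip(y_pred, EPS, 1.0 - EPS)
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=-1))
```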
You can either instantiate an optimizer before passing it to `model.compile()`, as in the above example, or you can call it by its name. In the latter case, the default parameters for the optimizer will be used.
```python
# pass optimizer by name: default parameters will be used
model.compile(loss='mean_squared_error', optimizer='sgd')
```
Adam optimizer, proposed by Kingma and Lei Ba in [Adam: A Method For Stochastic Optimization](http://arxiv.org/pdf/1412.6980v4.pdf). Default parameters are those suggested in the paper. The parameter "lambda" from the paper has been renamed kappa, for syntactic reasons.
__Arguments__:
- __lr__: float >= 0. Learning rate.
- __beta_1__, __beta_2__: floats, 0 < beta < 1. Generally close to 1.
- __epsilon__: float >= 0. Fuzz factor.
- __kappa__: float 0 < kappa < 1. Lambda parameter in the original paper.
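One Adam update in NumPy, using bias-corrected moment estimates and the default values suggested in the paper (`lr=0.001`, `beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-8`). This sketch deliberately leaves out the `kappa` (lambda) decay of `beta_1` described above, so it shows the commonly used form of Adam rather than this optimizer's exact update; `adam_step` is a hypothetical helper.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """Hypothetical sketch of one Adam update (no kappa/lambda decay)."""
    t += 1
    m = beta_1 * m + (1 - beta_1) * grad          # first-moment estimate
    v = beta_2 * v + (1 - beta_2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta_1 ** t)                 # bias corrections
    v_hat = v / (1 - beta_2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return param, m, v, t


# Usage: minimize f(x) = (x - 3)^2 starting from x = 0.
x = np.array(0.0)
m, v, t = np.zeros_like(x), np.zeros_like(x), 0
for _ in range(1000):
    grad = 2.0 * (x - 3.0)
    x, m, v, t = adam_step(x, grad, m, v, t, lr=0.1)
```

Because of the bias correction, the very first step has magnitude close to `lr` regardless of the gradient's scale.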
- __fit(X)__: Required if featurewise_center or featurewise_std_normalization or zca_whitening. Compute necessary quantities on some sample data.
- __Arguments__:
- __X__: sample data.
- __augment__: Boolean (default: False). Whether to fit on randomly augmented samples.
- __rounds__: int (default: 1). If augment, how many augmentation passes over the data to use.
- __flow(X, y)__:
- __Arguments__:
- __X__: data.
- __y__: labels.
- __batch_size__: int (default: 32).
- __shuffle__: boolean (default: False).
- __save_to_dir__: None or str. This allows you to optionally specify a directory to which to save the augmented pictures being generated (useful for visualizing what you are doing).
- __save_prefix__: str. Prefix to use for filenames of saved pictures.
Activations can either be used through an `Activation` layer, or through the `activation` argument supported by all forward layers:
```python
from keras.layers.core import Activation, Dense

model.add(Dense(64))
model.add(Activation('tanh'))
```
is equivalent to:
```python
model.add(Dense(64, activation='tanh'))
```
You can also pass an element-wise Theano/TensorFlow function as an activation:
```python
from keras import backend as K


def tanh(x):
    return K.tanh(x)


model.add(Dense(64, activation=tanh))
model.add(Activation(tanh))
```
## Available activations
- __softmax__: Softmax applied across the input's last dimension. Expects shape either `(nb_samples, nb_timesteps, nb_dims)` or `(nb_samples, nb_dims)`.
- __softplus__
- __softsign__
- __relu__
- __tanh__
- __sigmoid__
- __hard_sigmoid__
- __linear__
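Two of these activations sketched in NumPy: a softmax over the last axis, which handles both the 2D and 3D shapes mentioned for `softmax`, and a piecewise-linear `hard_sigmoid`. The slope 0.2 and offset 0.5 used for `hard_sigmoid` are an assumption (a common choice), not taken from this document.

```python
import numpy as np


def softmax(x):
    """Softmax over the last axis, so it works for both
    (nb_samples, nb_dims) and (nb_samples, nb_timesteps, nb_dims) inputs.
    Subtracting the per-row max avoids overflow in exp() without
    changing the result."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)


def hard_sigmoid(x):
    """Piecewise-linear approximation of the sigmoid
    (assumed slope 0.2, offset 0.5, clipped to [0, 1])."""
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)


p2d = softmax(np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
p3d = softmax(np.random.RandomState(0).randn(2, 4, 5))
```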
## On Advanced Activations
Activations that are more complex than a simple Theano/TensorFlow function (eg. learnable activations, configurable activations, etc.) are available as [Advanced Activation layers](layers/advanced-activations.md), and can be found in the module `keras.layers.advanced_activations`. These include PReLU and LeakyReLU.
Keras Applications are deep learning models that are made available alongside pre-trained weights.
These models can be used for prediction, feature extraction, and fine-tuning.
Weights are downloaded automatically when instantiating a model. They are stored at `~/.keras/models/`.
## Available models
### Models for image classification with weights trained on ImageNet:
- [Xception](#xception)
- [VGG16](#vgg16)
- [VGG19](#vgg19)
- [ResNet50](#resnet50)
- [InceptionV3](#inceptionv3)
All of these architectures (except Xception) are compatible with both TensorFlow and Theano, and upon instantiation the models will be built according to the image dimension ordering set in your Keras configuration file at `~/.keras/keras.json`. For instance, if you have set `image_dim_ordering=tf`, then any model loaded from this repository will get built according to the TensorFlow dimension ordering convention, "Width-Height-Depth".
The Xception model is only available for TensorFlow, due to its reliance on `SeparableConvolution` layers.
### Model for music audio file auto-tagging (taking as input Mel-spectrograms):
VGG16 model, with weights pre-trained on ImageNet.
This model is available for both the Theano and TensorFlow backends, and can be built either
with "th" dim ordering (channels, width, height) or "tf" dim ordering (width, height, channels).
The default input size for this model is 224x224.
### Arguments
- include_top: whether to include the 3 fully-connected layers at the top of the network.
- weights: one of `None` (random initialization) or "imagenet" (pre-training on ImageNet).
- input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) to use as image input for the model.
### Returns
A Keras model instance.
### References
- [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556): please cite this paper if you use the VGG models in your work.
### License
These weights are ported from the ones [released by VGG at Oxford](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) under the [Creative Commons Attribution License](https://creativecommons.org/licenses/by/4.0/).
VGG19 model, with weights pre-trained on ImageNet.
This model is available for both the Theano and TensorFlow backends, and can be built either
with "th" dim ordering (channels, width, height) or "tf" dim ordering (width, height, channels).
The default input size for this model is 224x224.
### Arguments
- include_top: whether to include the 3 fully-connected layers at the top of the network.
- weights: one of `None` (random initialization) or "imagenet" (pre-training on ImageNet).
- input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) to use as image input for the model.
### Returns
A Keras model instance.
### References
- [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)
### License
These weights are ported from the ones [released by VGG at Oxford](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) under the [Creative Commons Attribution License](https://creativecommons.org/licenses/by/4.0/).
ResNet50 model, with weights pre-trained on ImageNet.
This model is available for both the Theano and TensorFlow backends, and can be built either
with "th" dim ordering (channels, width, height) or "tf" dim ordering (width, height, channels).
The default input size for this model is 224x224.
### Arguments
- include_top: whether to include the fully-connected layer at the top of the network.
- weights: one of `None` (random initialization) or "imagenet" (pre-training on ImageNet).
- input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) to use as image input for the model.
### Returns
A Keras model instance.
### References
- [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
### License
These weights are ported from the ones [released by Kaiming He](https://github.com/KaimingHe/deep-residual-networks) under the [MIT license](https://github.com/KaimingHe/deep-residual-networks/blob/master/LICENSE).
A convolutional-recurrent model taking as input a vectorized representation of the MelSpectrogram of a music track and capable of outputting the musical genre of the track. You can use `keras.applications.music_tagger_crnn.preprocess_input` to convert a sound file to a vectorized spectrogram. This requires the [Librosa](http://librosa.github.io/librosa/) library to be installed. See [the usage example](#music-tagging-and-feature-extraction-with-musictaggercrnn).
### Arguments
- weights: one of `None` (random initialization) or "msd" (pre-training on [Million Song Dataset](http://labrosa.ee.columbia.edu/millionsong/)).
- input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) to use as input for the model.
- include_top: whether to include the 1 fully-connected layer (output layer) at the top of the network. If False, the network outputs 32-dim features.
### Returns
A Keras model instance.
### References
- [Convolutional Recurrent Neural Networks for Music Classification](https://arxiv.org/abs/1609.04243)
### License
These weights are ported from the ones [released by Keunwoo Choi](https://github.com/keunwoochoi/music-auto_tagging-keras) under the [MIT license](https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/LICENSE.md).
### Examples: music tagging and audio feature extraction
Keras is a model-level library, providing high-level building blocks for developing deep learning models. It does not itself handle low-level operations such as tensor products, convolutions and so on. Instead, it relies on a specialized, well-optimized tensor manipulation library to do so, serving as the "backend engine" of Keras. Rather than picking one single tensor library and making the implementation of Keras tied to that library, Keras handles the problem in a modular way, and several different backend engines can be plugged seamlessly into Keras.
At this time, Keras has two backend implementations available: the **TensorFlow** backend and the **Theano** backend.
- [TensorFlow](http://www.tensorflow.org/) is an open-source symbolic tensor manipulation framework developed by Google, Inc.
- [Theano](http://deeplearning.net/software/theano/) is an open-source symbolic tensor manipulation framework developed by LISA/MILA Lab at Université de Montréal.
In the future, we are likely to add more backend options. If you are interested in developing a new backend, get in touch!
----
## Switching from one backend to another
If you have run Keras at least once, you will find the Keras configuration file at:
`~/.keras/keras.json`
If it isn't there, you can create it.
The default configuration file looks like this:
```json
{
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
```
Simply change the field `backend` to either `"theano"` or `"tensorflow"`, and Keras will use the new configuration next time you run any Keras code.
You can also define the environment variable `KERAS_BACKEND`, which will override what is defined in your config file:
```bash
KERAS_BACKEND=tensorflow python -c "from keras import backend"
Using TensorFlow backend.
```
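The precedence described above (the `KERAS_BACKEND` environment variable wins over the `backend` field of `keras.json`, which wins over the default) can be sketched in plain Python. `load_config` and `resolve_backend` are hypothetical helpers for illustration, not functions Keras exposes, and the fallback to `"tensorflow"` assumes the default configuration shown earlier:

```python
import json
import os


def load_config(path="~/.keras/keras.json"):
    """Hypothetical helper: read the Keras config file, or return {} if it is absent."""
    path = os.path.expanduser(path)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def resolve_backend(config, environ=None):
    """Hypothetical helper: pick the backend, letting KERAS_BACKEND override the config."""
    environ = os.environ if environ is None else environ
    backend = config.get("backend", "tensorflow")  # assumed default
    backend = environ.get("KERAS_BACKEND", backend)  # the environment variable wins
    if backend not in ("theano", "tensorflow"):
        raise ValueError("Unknown backend: %r" % backend)
    return backend
```

For example, `resolve_backend(load_config())` returns whatever the current shell and config file select.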
----
## Using the abstract Keras backend to write new code
If you want the Keras modules you write to be compatible with both Theano and TensorFlow, you have to write them via the abstract Keras backend API. Here's an intro.
You can import the backend module via:
```python
from keras import backend as K
```
The code below instantiates an input placeholder. It's equivalent to `tf.placeholder()` or `T.matrix()`, `T.tensor3()`, etc.
```python
input = K.placeholder(shape=(2, 4, 5))
# also works:
input = K.placeholder(shape=(None, 4, 5))
# also works:
input = K.placeholder(ndim=3)
```
The code below instantiates a shared variable. It's equivalent to `tf.Variable()` or `theano.shared()`.
```python
import numpy as np

val = np.random.random((3, 4, 5))
var = K.variable(value=val)
# all-zeros variable:
var = K.zeros(shape=(3, 4, 5))
# all-ones:
var = K.ones(shape=(3, 4, 5))
```
Most tensor operations you will need can be done as you would in TensorFlow or Theano:
A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training. You can pass a list of callbacks (as the keyword argument `callbacks`) to the `.fit()` method of the `Sequential` model. The relevant methods of the callbacks will then be called at each stage of the training.
---
{{autogenerated}}
---
# Create a callback
You can create a custom callback by extending the base class `keras.callbacks.Callback`. A callback has access to its associated model through the class property `self.model`.
Here's a simple example saving a list of losses over each batch during training:
Functions from the `constraints` module allow setting constraints (e.g. non-negativity) on network parameters during optimization.
The keyword arguments used for passing constraints to parameters in a layer will depend on the layer.
Constraints are applied on a per-layer basis. The exact API will depend on the layer, but the layers `Dense`, `TimeDistributedDense`, `MaxoutDense`, `Convolution1D` and `Convolution2D` have a unified API.
In the `Dense` layer it is simply `W_constraint` for the main weights matrix, and `b_constraint` for the bias.
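Conceptually, a constraint such as non-negativity acts as a projection applied to the parameters after each optimizer update. Below is a NumPy sketch assuming plain SGD; `non_neg` and `sgd_step` are hypothetical helpers for illustration, not the `keras.constraints` API:

```python
import numpy as np


def non_neg(w):
    """Project weights onto the non-negative orthant by zeroing negative entries."""
    return w * (w >= 0)


def sgd_step(w, grad, lr, constraint=None):
    """One plain SGD update, followed by the optional constraint projection."""
    w = w - lr * grad
    if constraint is not None:
        w = constraint(w)  # push the updated weights back into the allowed set
    return w
```

With `W_constraint` set on a `Dense` layer the idea is the same: the update is computed as usual, then the constraint is applied to the result.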
## IMDB Movie reviews sentiment classification
`keras.datasets.imdb`
Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative). Reviews have been preprocessed, and each review is encoded as a [sequence](preprocessing/sequence.md) of word indexes (integers). For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data. This allows for quick filtering operations such as: "only consider the top 10,000 most common words, but eliminate the top 20 most common words".
As a convention, "0" does not stand for a specific word, but instead is used to encode any unknown word.
- __nb_words__: integer or None. Top most frequent words to consider. Any less frequent word will appear as 0 in the sequence data.
- __skip_top__: integer. Top most frequent words to ignore (they will appear as 0s in the sequence data).
- __maxlen__: int. Maximum sequence length. Any longer sequence will be truncated.
- __test_split__: float. Fraction of the dataset to be used as test data.
- __seed__: int. Seed for reproducible data shuffling.
- __start_char__: char. The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character.
- __oov_char__: char. Words that were cut out because of the `nb_words` or `skip_top` limit will be replaced with this character.
- __index_from__: int. Index actual words with this index and higher.
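To make these conventions concrete, here is a simplified pure-Python sketch that turns one review, given as 1-based frequency ranks (rank 3 is the 3rd most frequent word, as above), into an encoded sequence. `encode_review` is hypothetical and its default values are assumptions; the real loader may apply the `nb_words`/`skip_top` thresholds slightly differently (for example after the `index_from` shift), so treat it as an illustration of the idea only:

```python
def encode_review(ranks, nb_words=None, skip_top=0,
                  start_char=1, oov_char=2, index_from=3):
    """Hypothetical helper: encode a review given as 1-based word frequency ranks."""
    encoded = [start_char]  # mark the start of the sequence
    for r in ranks:
        too_rare = nb_words is not None and r > nb_words
        too_common = r <= skip_top
        if too_rare or too_common:
            encoded.append(oov_char)  # every cut-out word collapses to one marker
        else:
            encoded.append(r + index_from)  # shift past the reserved indices
    return encoded
```

For instance, `encode_review([1, 5, 50, 20000], nb_words=10000, skip_top=2)` gives `[1, 2, 8, 53, 2]`: the most common word and the very rare word both become `oov_char`.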
---
## Reuters newswire topics classification
`keras.datasets.reuters`
Dataset of 11,228 newswires from Reuters, labeled over 46 topics. As with the IMDB dataset, each wire is encoded as a sequence of word indexes (same conventions).
- [How should I cite Keras?](#how-should-i-cite-keras)
- [How can I run Keras on GPU?](#how-can-i-run-keras-on-gpu)
- [How can I save a Keras model?](#how-can-i-save-a-keras-model)
- [Why is the training loss much higher than the testing loss?](#why-is-the-training-loss-much-higher-than-the-testing-loss)
- [How can I visualize the output of an intermediate layer?](#how-can-i-visualize-the-output-of-an-intermediate-layer)
- [How can I use Keras with datasets that don't fit in memory?](#how-can-i-use-keras-with-datasets-that-dont-fit-in-memory)
- [How can I interrupt training when the validation loss isn't decreasing anymore?](#how-can-i-interrupt-training-when-the-validation-loss-isnt-decreasing-anymore)
- [How is the validation split computed?](#how-is-the-validation-split-computed)
- [Is the data shuffled during training?](#is-the-data-shuffled-during-training)
- [How can I record the training / validation loss / accuracy at each epoch?](#how-can-i-record-the-training-validation-loss-accuracy-at-each-epoch)
- [How can I "freeze" layers?](#how-can-i-freeze-keras-layers)
- [How can I use stateful RNNs?](#how-can-i-use-stateful-rnns)
- [How can I remove a layer from a Sequential model?](#how-can-i-remove-a-layer-from-a-sequential-model)
- [How can I use pre-trained models in Keras?](#how-can-i-use-pre-trained-models-in-keras)
---
### How should I cite Keras?
Please cite Keras in your publications if it helps your research. Here is an example BibTeX entry:
The name 'gpu' might have to be changed depending on your device's identifier (e.g. `gpu0`, `gpu1`, etc.).
Method 2: set up your `.theanorc`: [Instructions](http://deeplearning.net/software/theano/library/config.html)
Method 3: manually set `theano.config.device`, `theano.config.floatX` at the beginning of your code:
```python
import theano
theano.config.device = 'gpu'
theano.config.floatX = 'float32'
```
---
### How can I save a Keras model?
*It is not recommended to use pickle or cPickle to save a Keras model.*
You can use `model.save(filepath)` to save a Keras model into a single HDF5 file which will contain:
- the architecture of the model, allowing you to re-create the model
- the weights of the model
- the training configuration (loss, optimizer)
- the state of the optimizer, allowing you to resume training exactly where you left off.
You can then use `keras.models.load_model(filepath)` to reinstantiate your model.
`load_model` will also take care of compiling the model using the saved training configuration
(unless the model was never compiled in the first place).
Example:
```python
from keras.models import load_model

model.save('my_model.h5')  # creates an HDF5 file 'my_model.h5'
del model  # deletes the existing model

# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
```
If you only need to save the **architecture of a model**, and not its weights or its training configuration, you can do:
```python
# save as JSON
json_string = model.to_json()

# save as YAML
yaml_string = model.to_yaml()
```
The generated JSON / YAML files are human-readable and can be manually edited if needed.
You can then build a fresh model from this data:
```python
# model reconstruction from JSON:
from keras.models import model_from_json
model = model_from_json(json_string)

# model reconstruction from YAML
from keras.models import model_from_yaml
model = model_from_yaml(yaml_string)
```
If you need to save the **weights of a model**, you can do so in HDF5 with the code below.
Note that you will first need to install HDF5 and the Python library h5py, which do not come bundled with Keras.
```python
model.save_weights('my_model_weights.h5')
```
Assuming you have code for instantiating your model, you can then load the weights you saved into a model with the *same* architecture:
```python
model.load_weights('my_model_weights.h5')
```
If you need to load weights into a *different* architecture (with some layers in common), for instance for fine-tuning or transfer-learning, you can load weights by *layer name*:
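```python
from keras.models import Sequential
from keras.layers import Dense

# new model: only layers whose names match a saved layer receive weights
model = Sequential()
model.add(Dense(2, input_dim=3, name="dense_1"))  # will be loaded
model.add(Dense(10, name="new_dense"))  # will not be loaded

# load weights from first model; will only affect the first layer, dense_1.
# `fname` is the path of the weights file saved from the first model
model.load_weights(fname, by_name=True)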
```
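The *by name* matching can be pictured with plain dictionaries. In this hypothetical sketch (not Keras internals), `saved` maps layer names to the stored weight arrays and `layers` maps the new model's layer names to its current weight arrays; only layers whose names appear in both receive saved weights:

```python
import numpy as np


def load_weights_by_name(saved, layers):
    """Hypothetical helper: copy saved weights into same-named layers, return their names."""
    loaded = []
    for name, current in layers.items():
        if name not in saved:
            continue  # e.g. "new_dense" keeps its fresh initialization
        weights = saved[name]
        if len(weights) != len(current):
            raise ValueError("Layer %s expects %d weight arrays, got %d"
                             % (name, len(current), len(weights)))
        current[:] = [np.array(w, copy=True) for w in weights]  # replace in place
        loaded.append(name)
    return loaded
```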
---
### Why is the training loss much higher than the testing loss?
A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off at testing time.
Besides, the training loss is the average of the losses over each batch of training data. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
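The first point can be seen with a NumPy sketch of standard inverted dropout: units are randomly zeroed (and the survivors rescaled) only in training mode, while at test time the input passes through unchanged. The `dropout` function is a hypothetical illustration, not the code Keras runs:

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: random masking in training mode, identity at test time."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate  # keep each unit with probability 1 - rate
    return x * keep / (1.0 - rate)  # rescale so the expected value is unchanged
```

Because the training loss is computed with these random perturbations and the test loss is not, the two numbers are not directly comparable.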
---
### How can I visualize the output of an intermediate layer?
You can build a Keras function that will return the output of a certain layer given a certain input, for example:
Another more flexible way of getting output from intermediate layers is to use the [functional API](/getting-started/functional-api-guide). For example, if you have created an autoencoder for MNIST:
```python
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
encoded = Dense(32, activation='relu')(inputs)
decoded = Dense(784)(encoded)
model = Model(input=inputs, output=decoded)
```
After compiling and training the model, you can get the output of the data from the encoder like this:
```python
encoder = Model(input=inputs, output=encoded)
X_encoded = encoder.predict(X)
```
---
### How can I use Keras with datasets that don't fit in memory?
You can do batch training using `model.train_on_batch(X, y)` and `model.test_on_batch(X, y)`. See the [models documentation](/models/sequential).
Alternatively, you can write a generator that yields batches of training data and use the method `model.fit_generator(data_generator, samples_per_epoch, nb_epoch)`.
You can see batch training in action in our [CIFAR10 example](https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py).
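A minimal generator for `fit_generator` could look like the sketch below. `batch_generator` is a hypothetical helper; it only relies on `X` and `y` supporting `len()` and slicing, so they can be lists, NumPy arrays, or arrays memory-mapped from disk. It loops over the data indefinitely, since training keeps drawing batches across epochs:

```python
def batch_generator(X, y, batch_size):
    """Hypothetical helper: yield (X_batch, y_batch) tuples forever."""
    n = len(X)
    while True:  # restart from the beginning after each full pass
        for start in range(0, n, batch_size):
            yield X[start:start + batch_size], y[start:start + batch_size]
```

For instance, with arrays saved by `np.save`, `X = np.load('X_train.npy', mmap_mode='r')` reads only the slices that are accessed (the file name is illustrative), and the generator can then be passed as `model.fit_generator(batch_generator(X, y, 32), samples_per_epoch=len(X), nb_epoch=10)`.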
---
### How can I interrupt training when the validation loss isn't decreasing anymore?
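You can use the `EarlyStopping` callback, which stops training once a monitored quantity (such as `val_loss`) has stopped improving. Find out more in the [callbacks documentation](/callbacks).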
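The patience-based logic behind early stopping can be sketched in plain Python. `EarlyStoppingSketch` and `epochs_run` are hypothetical; Keras's own `EarlyStopping` callback may count patience epochs slightly differently:

```python
class EarlyStoppingSketch:
    """Hypothetical: stop once val_loss has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0  # epochs since the last improvement
        self.stop_training = False

    def on_epoch_end(self, epoch, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.stop_training = True


def epochs_run(val_losses, patience):
    """Return how many epochs a training loop would run before stopping."""
    es = EarlyStoppingSketch(patience=patience)
    for epoch, loss in enumerate(val_losses):
        es.on_epoch_end(epoch, loss)
        if es.stop_training:
            return epoch + 1
    return len(val_losses)
```

In Keras itself you would instead pass an instance such as `EarlyStopping(monitor='val_loss', patience=2)` through the `callbacks` argument of `model.fit`, together with validation data.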
---
### How is the validation split computed?
If you set the `validation_split` argument in `model.fit` to e.g. 0.1, then the validation data used will be the *last 10%* of the data. If you set it to 0.25, it will be the last 25% of the data, etc.
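In code, the split amounts to slicing off the tail of the data before training. `split_validation` is a hypothetical helper mirroring the behavior described above; since nothing is shuffled before the split, data sorted by label (for example) should be shuffled beforehand:

```python
def split_validation(x, y, validation_split):
    """Hypothetical helper: hold out the *last* fraction of the samples."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])
```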
---
### Is the data shuffled during training?
Yes, if the `shuffle` argument in `model.fit` is set to `True` (which is the default), the training data will be randomly shuffled at each epoch.
Validation data is never shuffled.
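Taken together with the validation split, the per-epoch sample order could be sketched as follows: validation indices are fixed and never reordered, while training indices get a fresh random order each epoch. `epoch_indices` is a hypothetical helper for illustration only:

```python
import random


def epoch_indices(n_samples, validation_split, epoch):
    """Hypothetical helper: (train_order, val_order) for one epoch."""
    split_at = int(n_samples * (1.0 - validation_split))
    train_order = list(range(split_at))
    random.Random(epoch).shuffle(train_order)  # a different order for each epoch
    val_order = list(range(split_at, n_samples))  # always in the original order
    return train_order, val_order
```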
---
### How can I record the training / validation loss / accuracy at each epoch?
The `model.fit` method returns a `History` callback, which has a `history` attribute containing the lists of successive losses and other metrics.
```python
hist = model.fit(X, y, validation_split=0.2)
print(hist.history)
```
---
### How can I "freeze" Keras layers?
To "freeze" a layer means to exclude it from training, i.e. its weights will never be updated. This is useful in the context of fine-tuning a model, or using fixed embeddings for a text input.
You can pass a `trainable` argument (boolean) to a layer constructor to set a layer to be non-trainable:
```python
frozen_layer = Dense(32, trainable=False)
```
Additionally, you can set the `trainable` property of a layer to `True` or `False` after instantiation. For this to take effect, you will need to call `compile()` on your model after modifying the `trainable` property. Here's an example:
```python
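from keras.layers import Input, Dense
from keras.models import Model
x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)
frozen_model = Model(x, y)
# in the model below, the weights of `layer` will not be updated during training
frozen_model.compile(optimizer='rmsprop', loss='mse')
layer.trainable = True
trainable_model = Model(x, y)
# with this model the weights of `layer` will be updated during training
# (this also affects `frozen_model`, since both share the same layer instance)
trainable_model.compile(optimizer='rmsprop', loss='mse')
frozen_model.fit(data, labels)  # this does NOT update the weights of `layer`
trainable_model.fit(data, labels)  # this updates the weights of `layer`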
```
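Why recompiling is needed can be pictured with a toy model (hypothetical, and much simpler than how Keras tracks weights) that records which layers to update when `compile()` is called, so flipping `trainable` afterwards has no effect until the next `compile()`:

```python
class ToyLayer:
    """Hypothetical single-weight layer with a `trainable` flag."""

    def __init__(self, weight, trainable=True):
        self.weight = weight
        self.trainable = trainable


class ToyModel:
    """Hypothetical model that decides what to update when compile() is called."""

    def __init__(self, layers):
        self.layers = layers
        self._to_update = []

    def compile(self):
        # snapshot of the trainable layers, taken at compile time
        self._to_update = [layer for layer in self.layers if layer.trainable]

    def train_step(self, grad, lr=0.1):
        # plain gradient step on the layers captured by the last compile()
        for layer in self._to_update:
            layer.weight -= lr * grad
```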
---
### How can I use stateful RNNs?
Making an RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.
When using stateful RNNs, it is therefore assumed that:
- all batches have the same number of samples
- if `X1` and `X2` are successive batches of samples, then `X2[i]` is the follow-up sequence to `X1[i]`, for every `i`.
To use statefulness in RNNs, you need to:
- explicitly specify the batch size you are using, by passing a `batch_input_shape` argument to the first layer in your model. It should be a tuple of integers, e.g. `(32, 10, 16)` for a 32-samples batch of sequences of 10 timesteps with 16 features per timestep.
- set `stateful=True` in your RNN layer(s).
To reset the states accumulated:
- use `model.reset_states()` to reset the states of all layers in the model
- use `layer.reset_states()` to reset the states of a specific stateful RNN layer
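To see why the follow-up requirement matters, here is a toy NumPy RNN (hypothetical, not a Keras layer) that keeps its last hidden state between calls, like a layer with `stateful=True`. Feeding a sequence in two consecutive chunks then yields the same final state as feeding it in one go, and `reset_states()` returns the state to zeros:

```python
import numpy as np


class ToyStatefulRNN:
    """Hypothetical tanh RNN that carries its hidden state across calls."""

    def __init__(self, input_dim, units, batch_size, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(input_dim, units))  # input weights
        self.U = rng.normal(scale=0.1, size=(units, units))  # recurrent weights
        self.batch_size = batch_size
        self.units = units
        self.reset_states()

    def reset_states(self):
        """Zero the hidden state, like `layer.reset_states()`."""
        self.h = np.zeros((self.batch_size, self.units))

    def __call__(self, x):
        """Run over x of shape (batch_size, timesteps, input_dim); return the last state."""
        if x.shape[0] != self.batch_size:
            raise ValueError("stateful RNNs need a fixed batch size")
        for t in range(x.shape[1]):
            # row i of the state only ever depends on row i of each batch
            self.h = np.tanh(x[:, t, :] @ self.W + self.h @ self.U)
        return self.h
```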
Example:
```python
X  # this is our input data, of shape (32, 21, 16)
# we will feed it to our model in sequences of length 10
```
Note that the methods `predict`, `fit`, `train_on_batch`, `predict_classes`, etc. will *all* update the states of the stateful layers in a model. This allows you to do not only stateful training, but also stateful prediction.
---
### How can I remove a layer from a Sequential model?
You can remove the last added layer in a Sequential model by calling `.pop()`:
For a few simple usage examples, see [the documentation for the Applications module](/applications).
For a detailed example of how to use such a pre-trained model for feature extraction or for fine-tuning, see [this blog post](http://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html).
The VGG16 model is also the basis for several Keras example scripts:
The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.
This guide assumes that you are already familiar with the `Sequential` model.
Let's start with something simple.
-----
## First example: fully connected network
The `Sequential` model is probably a better choice to implement such a network, but it helps to start with something really simple.
- A layer instance is callable (on a tensor), and it returns a tensor
- Input tensor(s) and output tensor(s) can then be used to define a `Model`
- Such a model can be trained just like Keras `Sequential` models.
```python
from keras.layers import Input, Dense
from keras.models import Model

# this returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# this creates a model that includes
# the Input layer and three Dense layers
model = Model(input=inputs, output=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels)  # starts training
```
-----
## All models are callable, just like layers
With the functional API, it is easy to re-use trained models: you can treat any model as if it were a layer, by calling it on a tensor. Note that by calling a model you aren't just re-using the *architecture* of the model, you are also re-using its weights.
```python
x = Input(shape=(784,))
# this works, and returns the 10-way softmax we defined above.
y = model(x)
```
This makes it possible, for instance, to quickly create models that can process *sequences* of inputs. You could turn an image classification model into a video classification model in just one line.
```python
from keras.layers import TimeDistributed

# input tensor for sequences of 20 timesteps,
# each containing a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# this applies our previous model to every timestep in the input sequences.
# the output of the previous model was a 10-way softmax,
# so the output of the layer below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)
```
Here's a good use case for the functional API: models with multiple inputs and outputs. The functional API makes it easy to manipulate a large number of intertwined datastreams.
Let's consider the following model. We seek to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc.
The model will also be supervised via two loss functions. Adding an auxiliary loss earlier in the model, in addition to the main loss at the output, is a good regularization mechanism for deep models.
```python
# an LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
```
Here we insert the auxiliary loss, allowing the LSTM and Embedding layer to be trained smoothly even though the main loss will be much higher in the model.
Another good use for the functional API is building models that use shared layers. Let's take a look at shared layers.
Let's consider a dataset of tweets. We want to build a model that can tell whether two tweets are from the same person or not (this can allow us to compare users by the similarity of their tweets, for instance).
One way to achieve this is to build a model that encodes two tweets into two vectors, concatenates the vectors and adds a logistic regression on top, outputting a probability that the two tweets share the same author. The model would then be trained on positive tweet pairs and negative tweet pairs.
Because the problem is symmetric, the mechanism that encodes the first tweet should be reused (weights and all) to encode the second tweet. Here we use a shared LSTM layer to encode the tweets.
Let's build this with the functional API. We will take as input for a tweet a binary matrix of shape `(140, 256)`, i.e. a sequence of 140 vectors of size 256, where each dimension in the 256-dimensional vector encodes the presence/absence of a character (out of an alphabet of 256 frequent characters).
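One possible way to produce such matrices is sketched below. The alphabet is an assumption (by default the first 256 Unicode code points, rather than the 256 most frequent characters of a real corpus), and `encode_tweet` is a hypothetical helper: characters outside the alphabet leave an all-zero row, and tweets shorter than 140 characters are zero-padded at the end:

```python
import numpy as np

DEFAULT_ALPHABET = [chr(i) for i in range(256)]  # assumed alphabet


def encode_tweet(text, alphabet=DEFAULT_ALPHABET, maxlen=140):
    """Hypothetical helper: one-hot encode a tweet as a (maxlen, len(alphabet)) matrix."""
    char_to_index = {c: i for i, c in enumerate(alphabet)}
    matrix = np.zeros((maxlen, len(alphabet)), dtype=np.uint8)
    for position, char in enumerate(text[:maxlen]):  # characters past maxlen are dropped
        index = char_to_index.get(char)
        if index is not None:
            matrix[position, index] = 1  # presence of this character at this position
    return matrix
```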
```python
from keras.layers import Input, LSTM, Dense, merge
from keras.models import Model

tweet_a = Input(shape=(140, 256))
tweet_b = Input(shape=(140, 256))
```
To share a layer across different inputs, simply instantiate the layer once, then call it on as many inputs as you want:
Let's pause to take a look at how to read the shared layer's output or output shape.
-----
## The concept of layer "node"
Whenever you are calling a layer on some input, you are creating a new tensor (the output of the layer), and you are adding a "node" to the layer, linking the input tensor to the output tensor. When you are calling the same layer multiple times, that layer owns multiple nodes indexed as 0, 1, 2...
In previous versions of Keras, you could obtain the output tensor of a layer instance via `layer.get_output()`, or its output shape via `layer.output_shape`. You still can (except `get_output()` has been replaced by the property `output`). But what if a layer is connected to multiple inputs?
As long as a layer is only connected to one input, there is no confusion, and `.output` will return the one output of the layer:
```python
a = Input(shape=(140, 256))

lstm = LSTM(32)
encoded_a = lstm(a)

assert lstm.output == encoded_a
```
Not so if the layer has multiple inputs:
```python
a = Input(shape=(140, 256))
b = Input(shape=(140, 256))

lstm = LSTM(32)
encoded_a = lstm(a)
encoded_b = lstm(b)

lstm.output
```
```
>> AssertionError: Layer lstm_1 has multiple inbound nodes,
hence the notion of "layer output" is ill-defined.
Use `get_output_at(node_index)` instead.
```
Okay then. The following works:
```python
assert lstm.get_output_at(0) == encoded_a
assert lstm.get_output_at(1) == encoded_b
```
Simple enough, right?
The same is true for the properties `input_shape` and `output_shape`: as long as the layer has only one node, or as long as all nodes have the same input/output shape, then the notion of "layer output/input shape" is well defined, and that one shape will be returned by `layer.output_shape`/`layer.input_shape`. But if, for instance, you apply the same `Convolution2D` layer to an input of shape `(3, 32, 32)`, and then to an input of shape `(3, 64, 64)`, the layer will have multiple input/output shapes, and you will have to fetch them by specifying the index of the node they belong to:
```python
a = Input(shape=(3, 32, 32))
b = Input(shape=(3, 64, 64))

conv = Convolution2D(16, 3, 3, border_mode='same')
conved_a = conv(a)

# only one input so far, the following will work:
assert conv.input_shape == (None, 3, 32, 32)

conved_b = conv(b)
# now the `.input_shape` property wouldn't work, but this does:
assert conv.get_input_shape_at(0) == (None, 3, 32, 32)
assert conv.get_input_shape_at(1) == (None, 3, 64, 64)
```
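The node bookkeeping can be mimicked in a few lines of plain Python. `NodeTrackingLayer` is a hypothetical toy, not Keras's `Layer` class: each call appends an (input, output) node, `get_output_at(i)` looks up node `i`, and the `output` property refuses to answer once there is more than one node:

```python
class NodeTrackingLayer:
    """Hypothetical toy layer that records one node per call."""

    def __init__(self, fn, name):
        self.fn = fn
        self.name = name
        self._nodes = []  # (input, output) pairs, indexed 0, 1, 2...

    def __call__(self, inputs):
        outputs = self.fn(inputs)
        self._nodes.append((inputs, outputs))  # each call adds a new node
        return outputs

    @property
    def num_nodes(self):
        return len(self._nodes)

    def get_output_at(self, node_index):
        return self._nodes[node_index][1]

    @property
    def output(self):
        # only well defined when the layer has exactly one node
        if len(self._nodes) != 1:
            raise RuntimeError("Layer %s has %d nodes; use get_output_at(node_index)"
                               % (self.name, len(self._nodes)))
        return self._nodes[0][1]
```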
-----
## More examples
Code examples are still the best way to get started, so here are a few more.
### Inception module
For more information about the Inception architecture, see [Going Deeper with Convolutions](http://arxiv.org/abs/1409.4842).
```python
# the vision model will be shared, weights and all
out_a = vision_model(digit_a)
out_b = vision_model(digit_b)

concatenated = merge([out_a, out_b], mode='concat')
out = Dense(1, activation='sigmoid')(concatenated)

classification_model = Model([digit_a, digit_b], out)
```
### Visual question answering model
This model can select the correct one-word answer when asked a natural-language question about a picture.
It works by encoding the question into a vector, encoding the image into a vector, concatenating the two, and training on top a logistic regression over some vocabulary of potential answers.
```python
# the next stage would be training this model on actual data.
```
### Video question answering model
Now that we have trained our image QA model, we can quickly turn it into a video QA model. With appropriate training, you will be able to show it a short video (e.g. 100-frame human action) and ask a natural language question about the video (e.g. "what sport is the boy playing?" -> "football").
```python
from keras.layers import TimeDistributed

video_input = Input(shape=(100, 3, 224, 224))
# this is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

# this is a model-level representation of the question encoder, reusing the same weights as before:
```
The `Sequential` model is a linear stack of layers.
You can create a `Sequential` model by passing a list of layer instances to the constructor:
```python
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_dim=784),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])
```
You can also simply add layers via the `.add()` method:
```python
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
```
----
## Specifying the input shape
The model needs to know what input shape it should expect. For this reason, the first layer in a `Sequential` model (and only the first, because following layers can do automatic shape inference) needs to receive information about its input shape. There are several possible ways to do this:
- pass an `input_shape` argument to the first layer. This is a shape tuple (a tuple of integers or `None` entries, where `None` indicates that any positive integer may be expected). In `input_shape`, the batch dimension is not included.
- pass instead a `batch_input_shape` argument, where the batch dimension is included. This is useful for specifying a fixed batch size (e.g. with stateful RNNs).
- some 2D layers, such as `Dense`, support the specification of their input shape via the argument `input_dim`, and some 3D temporal layers support the arguments `input_dim` and `input_length`.
As such, the following three snippets are strictly equivalent:
```python
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
```
```python
model = Sequential()
model.add(Dense(32, batch_input_shape=(None, 784)))
# note that batch dimension is "None" here,
# so the model will be able to process batches of any size.
```
```python
model = Sequential()
model.add(Dense(32, input_dim=784))
```
Multiple `Sequential` instances can be merged into a single output via a `Merge` layer. The output is a layer that can be added as first layer in a new `Sequential` model. For instance, here's a model with two separate input branches getting merged:
Now you know enough to be able to define *almost* any model with Keras. For complex models that cannot be expressed via `Sequential` and `Merge`, you can use [the functional API](/getting-started/functional-api-guide).
----
## Compilation
Before training a model, you need to configure the learning process, which is done via the `compile` method. It receives three arguments:
- an optimizer. This could be the string identifier of an existing optimizer (such as `rmsprop` or `adagrad`), or an instance of the `Optimizer` class. See: [optimizers](/optimizers).
- a loss function. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as `categorical_crossentropy` or `mse`), or it can be an objective function. See: [objectives](/objectives).
- a list of metrics. For any classification problem you will want to set this to `metrics=['accuracy']`. A metric could be the string identifier of an existing metric or a custom metric function. Custom metric functions should return either a single tensor value or a dict `metric_name -> metric_value`. See: [metrics](/metrics).
```python
# for a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# for a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# for a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# for custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

def false_rates(y_true, y_pred):
    false_neg = ...
    false_pos = ...
    return {
        'false_neg': false_neg,
        'false_pos': false_pos,
    }

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred, false_rates])
```
----
## Training
Keras models are trained on Numpy arrays of input data and labels. For training a model, you will typically use the `fit` function. [Read its documentation here](/models/sequential).
```python
# for a single-input model with 2 classes (binary):
```
# Keras: Deep Learning library for Theano and TensorFlow
## You have just found Keras.
Keras is a high-level neural networks library, written in Python and capable of running on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. *Being able to go from idea to result with the least possible delay is key to doing good research.*
Use Keras if you need a deep learning library that:
- Allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
- Supports both convolutional networks and recurrent networks, as well as combinations of the two.
- Supports arbitrary connectivity schemes (including multi-input and multi-output training).
- Runs seamlessly on CPU and GPU.
Read the documentation at [Keras.io](http://keras.io).
Keras is compatible with: __Python 2.7-3.5__.
------------------
## Guiding principles
- __Modularity.__ A model is understood as a sequence or a graph of standalone, fully-configurable modules that can be plugged together with as few restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions, regularization schemes are all standalone modules that you can combine to create new models.
- __Minimalism.__ Each module should be kept short and simple. Every piece of code should be transparent upon first reading. No black magic: it hurts iteration speed and ability to innovate.
- __Easy extensibility.__ New modules are dead simple to add (as new classes and functions), and existing modules provide ample examples. Being able to easily create new modules allows for total expressiveness, making Keras suitable for advanced research.
- __Work with Python__. No separate model configuration files in a declarative format. Models are described in Python code, which is compact, easier to debug, and allows for ease of extensibility.
------------------
## Getting started: 30 seconds to Keras
The core data structure of Keras is a __model__, a way to organize layers. The main type of model is the [`Sequential`](http://keras.io/getting-started/sequential-model-guide) model, a linear stack of layers. For more complex architectures, you should use the [Keras functional API](http://keras.io/getting-started/functional-api-guide).
Here's the `Sequential` model:
```python
from keras.models import Sequential

model = Sequential()
```
Stacking layers is as easy as `.add()`:
```python
from keras.layers import Dense, Activation

model.add(Dense(output_dim=64, input_dim=100))
model.add(Activation("relu"))
model.add(Dense(output_dim=10))
model.add(Activation("softmax"))
```
Once your model looks good, configure its learning process with `.compile()`:
If you need to, you can further configure your optimizer. A core principle of Keras is to make things reasonably simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code).
Building a question answering system, an image classification model, a Neural Turing Machine, a word2vec embedder or any other model is just as fast. The ideas behind deep learning are simple, so why should their implementation be painful?
For a more in-depth tutorial about Keras, you can check out:
- [Getting started with the Sequential model](http://keras.io/getting-started/sequential-model-guide)
- [Getting started with the functional API](http://keras.io/getting-started/functional-api-guide)
In the [examples folder](https://github.com/fchollet/keras/tree/master/examples) of the repository, you will find more advanced models: question-answering with memory networks, text generation with stacked LSTMs, etc.
------------------
## Installation
Keras uses the following dependencies:
- numpy, scipy
- pyyaml
- HDF5 and h5py (optional, required if you use model saving/loading functions)
- Optional but recommended if you use CNNs: cuDNN.
To install Keras, `cd` to the Keras folder and run the install command:
```sh
sudo python setup.py install
```
You can also install Keras from PyPI:
```sh
sudo pip install keras
```
------------------
## Switching from TensorFlow to Theano
By default, Keras will use TensorFlow as its tensor manipulation library. [Follow these instructions](http://keras.io/backend/) to configure the Keras backend.
------------------
## Support
You can ask questions and join the development discussion:
- On the [Keras Google group](https://groups.google.com/forum/#!forum/keras-users).
- On the [Keras Gitter channel](https://gitter.im/Keras-io/Lobby).
You can also post bug reports and feature requests in [GitHub issues](https://github.com/fchollet/keras/issues). Make sure to read [our guidelines](https://github.com/fchollet/keras/blob/master/CONTRIBUTING.md) first.
------------------
## Why this name, Keras?
Keras (κέρας) means _horn_ in Greek. It is a reference to a literary image from ancient Greek and Latin literature, first found in the _Odyssey_, where dream spirits (_Oneiroi_, singular _Oneiros_) are divided between those who deceive men with false visions, who arrive on Earth through a gate of ivory, and those who announce a future that will come to pass, who arrive through a gate of horn. It's a play on the words κέρας (horn) / κραίνω (fulfill), and ἐλέφας (ivory) / ἐλεφαίρομαι (deceive).
Keras was initially developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System).
>_"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them."_ Homer, Odyssey 19. 562 ff (Shewring translation).
If a layer has a single node (i.e. if it isn't a shared layer), you can get its input tensor, output tensor, input shape and output shape via:
- `layer.input`
- `layer.output`
- `layer.input_shape`
- `layer.output_shape`
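If the layer has multiple nodes (see: [the concept of layer node and shared layers](/getting-started/functional-api-guide/#the-concept-of-layer-node)), you can use the following methods:
- `layer.get_input_at(node_index)`
- `layer.get_output_at(node_index)`
- `layer.get_input_shape_at(node_index)`
- `layer.get_output_shape_at(node_index)`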
For simple, stateless custom operations, you are probably better off using `layers.core.Lambda` layers. But for any custom operation that has trainable weights, you should implement your own layer.
Here is the skeleton of a Keras layer. There are only three methods you need to implement:
- `build(input_shape)`: this is where you will define your weights. Trainable weights should be added to the list `self.trainable_weights`. Other attributes of note are: `self.non_trainable_weights` (list) and `self.updates` (list of update tuples (tensor, new_tensor)). For an example of how to use `non_trainable_weights` and `updates`, see the code for the `BatchNormalization` layer. This method must set `self.built = True`, which can be done by calling `super([Layer], self).build()`.
- `call(x)`: this is where the layer's logic lives. Unless you want your layer to support masking, you only have to care about the first argument passed to `call`: the input tensor.
- `get_output_shape_for(input_shape)`: in case your layer modifies the shape of its input, you should specify here the shape transformation logic. This allows Keras to do automatic shape inference.
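To make the three-method contract concrete without depending on a backend, here is a plain-Python mock of the lifecycle. `ToyLayer` and `ToyScale` are hypothetical stand-ins, not Keras classes: the mock base class calls `build()` once, on first use, and the subclass implements `build`, `call` and `get_output_shape_for` around a single trainable weight vector. Unlike Keras, the mock takes the input shape as an explicit argument.

```python
class ToyLayer(object):
    """Hypothetical stand-in for a Keras base layer (not the real class)."""

    def __init__(self):
        self.built = False
        self.trainable_weights = []

    def build(self, input_shape):
        # Subclasses create their weights here; the base version only marks the layer as built.
        self.built = True

    def __call__(self, x, input_shape):
        # Weights are created lazily, the first time the layer is applied.
        if not self.built:
            self.build(input_shape)
        return self.call(x)


class ToyScale(ToyLayer):
    """Multiplies each feature by a trainable per-feature scale, initialized to 1."""

    def build(self, input_shape):
        self.scale = [1.0] * input_shape[-1]
        self.trainable_weights = [self.scale]
        super(ToyScale, self).build(input_shape)  # sets self.built = True

    def call(self, x):
        # x is a batch: a list of samples, each a list of features.
        return [[v * s for v, s in zip(row, self.scale)] for row in x]

    def get_output_shape_for(self, input_shape):
        # Element-wise scaling does not change the shape.
        return input_shape
```

`ToyScale()([[1.0, 2.0]], (None, 2))` builds the layer on first use and returns the input unchanged, since every scale starts at 1.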
A metric is a function that is used to judge the performance of your model. Metric functions are to be supplied in the `metrics` parameter when a model is compiled.
A metric function is similar to an [objective function](/objectives), except that the results from evaluating a metric are not used when training the model.
You can either pass the name of an existing metric, or pass a Theano/TensorFlow symbolic function (see [Custom metrics](#custom-metrics)).
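A custom metric takes `(y_true, y_pred)` and returns a value to report. The sketch below mirrors that signature with plain Python lists so the computation is easy to follow; in an actual Keras model the same logic would be written with backend tensor operations. `binary_accuracy_sketch` and its `threshold` parameter are hypothetical.

```python
def binary_accuracy_sketch(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that fall on the correct side of `threshold`.

    y_true: list of 0/1 labels; y_pred: list of predicted probabilities.
    """
    hits = [(p > threshold) == bool(t) for t, p in zip(y_true, y_pred)]
    return sum(hits) / float(len(hits))
```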
There are two types of models available in Keras: [the Sequential model](/models/sequential) and [the Model class used with functional API](/models/model).
These models have a number of methods in common:
- `model.summary()`: prints a summary representation of your model.
- `model.get_config()`: returns a dictionary containing the configuration of the model. The model can be reinstantiated from its config via:
```python
from keras.models import Model, Sequential

config = model.get_config()
model = Model.from_config(config)
# or, for Sequential:
model = Sequential.from_config(config)
```
- `model.get_weights()`: returns a list of all weight tensors in the model, as Numpy arrays.
- `model.set_weights(weights)`: sets the values of the weights of the model, from a list of Numpy arrays. The arrays in the list should have the same shape as those returned by `get_weights()`.
- `model.to_json()`: returns a representation of the model as a JSON string. Note that the representation does not include the weights, only the architecture. You can reinstantiate the same model (with reinitialized weights) from the JSON string via:
```python
from keras.models import model_from_json

json_string = model.to_json()
model = model_from_json(json_string)
```
- `model.to_yaml()`: returns a representation of the model as a YAML string. Note that the representation does not include the weights, only the architecture. You can reinstantiate the same model (with reinitialized weights) from the YAML string via:
```python
from keras.models import model_from_yaml

yaml_string = model.to_yaml()
model = model_from_yaml(yaml_string)
```
- `model.save_weights(filepath)`: saves the weights of the model as an HDF5 file.
- `model.load_weights(filepath, by_name=False)`: loads the weights of the model from an HDF5 file (created by `save_weights`). By default, the architecture is expected to be unchanged. To load weights into a different architecture (with some layers in common), use `by_name=True` to load only those layers with the same name.
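The difference between the two loading modes can be sketched with plain data structures. This is a conceptual illustration, not the Keras implementation: `load_weights_sketch` is hypothetical, saved weights are modelled as an ordered list of `(layer_name, weights)` pairs, and layers as dicts with a `'name'` key.

```python
def load_weights_sketch(saved, layers, by_name=False):
    """Copy saved weights into `layers` (each layer is a dict with a 'name' key)."""
    if by_name:
        lookup = dict(saved)
        for layer in layers:
            # Layers whose name is not in the file keep their current weights.
            if layer['name'] in lookup:
                layer['weights'] = lookup[layer['name']]
    else:
        # Default mode: the architecture must match, so layers are paired by position.
        if len(saved) != len(layers):
            raise ValueError('Model has %d layers but the file has %d.'
                             % (len(layers), len(saved)))
        for layer, (_, weights) in zip(layers, saved):
            layer['weights'] = weights
    return layers
```

With `by_name=True`, a new classification head named differently from anything in the file is simply skipped, which is what makes partial reuse of a trained model possible.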
You can either pass the name of an existing objective, or pass a Theano/TensorFlow symbolic function that returns a scalar for each data-point and takes the following two arguments:
- __y_true__: True labels. Theano/TensorFlow tensor.
- __y_pred__: Predictions. Theano/TensorFlow tensor of the same shape as y_true.
The actual optimized objective is the mean of the output array across all datapoints.
For a few examples of such functions, check out the [objectives source](https://github.com/fchollet/keras/blob/master/keras/objectives.py).
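The two-step contract (one scalar per data point, then a mean over data points) can be sketched in plain Python. `mean_squared_error` and `batch_objective` below are hand-written illustrations on nested lists, not the Keras source.

```python
def mean_squared_error(y_true, y_pred):
    """Per-sample objective: mean of squared differences over the output values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / float(len(y_true))


def batch_objective(objective, Y_true, Y_pred):
    """The value actually optimized: the mean of per-sample scores over all data points."""
    scores = [objective(t, p) for t, p in zip(Y_true, Y_pred)]
    return sum(scores) / float(len(scores))
```

For targets `[[0.0, 1.0], [1.0, 1.0]]` and predictions `[[0.0, 0.0], [1.0, 1.0]]`, the per-sample scores are 0.5 and 0.0, so the optimized value is 0.25.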
## Available objectives
- __mean_squared_error__ / __mse__
- __mean_absolute_error__ / __mae__
- __mean_absolute_percentage_error__ / __mape__
- __mean_squared_logarithmic_error__ / __msle__
- __squared_hinge__
- __hinge__
- __binary_crossentropy__: Also known as logloss.
- __categorical_crossentropy__: Also known as multiclass logloss. __Note__: using this objective requires that your labels are binary arrays of shape `(nb_samples, nb_classes)`.
- __sparse_categorical_crossentropy__: As above but accepts sparse labels. __Note__: this objective still requires that your labels have the same number of dimensions as your outputs; you may need to add a length-1 dimension to the shape of your labels, e.g. with `np.expand_dims(y, -1)`.
- __kullback_leibler_divergence__ / __kld__: Information gain from a predicted probability distribution Q to a true probability distribution P. Gives a measure of difference between both distributions.
- __poisson__: Mean of `(predictions - targets * log(predictions))`
- __cosine_proximity__: The opposite (negative) of the mean cosine proximity between predictions and targets.
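To see how the categorical and sparse variants relate, here is a plain-Python sketch of both, plus the Kullback-Leibler divergence. The function names are hypothetical, each works on a single sample given as lists, and the clipping a real implementation applies to avoid `log(0)` is left out.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """y_true: one-hot list; y_pred: probability list of the same length."""
    # Zero entries of the one-hot target contribute nothing, so they are skipped.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


def sparse_categorical_crossentropy(y_true, y_pred):
    """Same loss, but y_true is the class index wrapped in a length-1 list,
    mirroring the extra trailing dimension mentioned above."""
    return -math.log(y_pred[y_true[0]])


def kullback_leibler_divergence(p, q):
    """KL(P || Q) for a true distribution P and a predicted distribution Q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For the same sample, the one-hot target `[0, 1, 0]` and the sparse target `[1]` give identical losses; only the label format differs.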
**Note**: when using the `categorical_crossentropy` objective, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample). In order to convert *integer targets* into *categorical targets*, you can use the Keras utility `to_categorical`.
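In Keras 1.x that utility lives in `keras.utils.np_utils`. The plain-Python function below is only a sketch of the conversion it performs, returning nested lists rather than a Numpy array; the name `one_hot` is hypothetical.

```python
def one_hot(int_labels, nb_classes=None):
    """Convert integer class indices into one-hot rows.

    If nb_classes is None, it is inferred as max(label) + 1.
    """
    if nb_classes is None:
        nb_classes = max(int_labels) + 1
    rows = []
    for label in int_labels:
        row = [0] * nb_classes
        row[label] = 1  # a single 1 at the index of the sample's class
        rows.append(row)
    return rows
```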
You can either instantiate an optimizer before passing it to `model.compile()`, as in the above example, or you can call it by its name. In the latter case, the default parameters for the optimizer will be used.
```python
# pass optimizer by name: default parameters will be used
model.compile(loss='mean_squared_error', optimizer='sgd')
```
- __rotation_range__: Int. Degree range for random rotations.
- __width_shift_range__: Float (fraction of total width). Range for random horizontal shifts.
- __height_shift_range__: Float (fraction of total height). Range for random vertical shifts.
- __shear_range__: Float. Shear intensity (shear angle in counter-clockwise direction, in radians).
- __zoom_range__: Float or [lower, upper]. Range for random zoom. If a float, `[lower, upper] = [1-zoom_range, 1+zoom_range]`.
- __channel_shift_range__: Float. Range for random channel shifts.
- __fill_mode__: One of {"constant", "nearest", "reflect" or "wrap"}. Points outside the boundaries of the input are filled according to the given mode.
- __cval__: Float or Int. Value used for points outside the boundaries when `fill_mode = "constant"`.
- __rescale__: Rescaling factor. Defaults to None. If None or 0, no rescaling is applied; otherwise the data is multiplied by the value provided (before applying any other transformation).
- __dim_ordering__: One of {"th", "tf"}. "tf" mode means that the images should have shape `(samples, height, width, channels)`; "th" mode means that the images should have shape `(samples, channels, height, width)`. It defaults to the `image_dim_ordering` value found in your Keras config file at `~/.keras/keras.json`. If you never set it, then it will be "tf".
- __Methods__:
- __fit(X)__: Compute the internal data stats related to the data-dependent transformations, based on an array of sample data.
Only required if `featurewise_center`, `featurewise_std_normalization` or `zca_whitening` is enabled.
- __Arguments__:
- __X__: sample data.
- __augment__: Boolean (default: False). Whether to fit on randomly augmented samples.
- __rounds__: int (default: 1). If augment, how many augmentation passes over the data to use.
- __seed__: int (default: None). Random seed.
- __flow(X, y)__: Takes numpy data & label arrays, and generates batches of augmented/normalized data. Yields batches indefinitely, in an infinite loop.
- __Arguments__:
- __X__: data.
- __y__: labels.
- __batch_size__: int (default: 32).
- __shuffle__: boolean (default: True).
- __seed__: int (default: None).
- __save_to_dir__: None or str (default: None). This allows you to optionally specify a directory to which to save the augmented pictures being generated (useful for visualizing what you are doing).
- __save_prefix__: str (default: `''`). Prefix to use for filenames of saved pictures (only relevant if `save_to_dir` is set).
- __save_format__: one of "png", "jpeg" (only relevant if `save_to_dir` is set). Default: "jpeg".
- __yields__: Tuples of `(x, y)` where `x` is a numpy array of image data and `y` is a numpy array of corresponding labels. The generator loops indefinitely.
- __flow_from_directory(directory)__: Takes the path to a directory, and generates batches of augmented/normalized data. Yields batches indefinitely, in an infinite loop.
- __Arguments__:
- __directory__: path to the target directory. It should contain one subdirectory per class, and the subdirectories should contain PNG or JPG images. See [this script](https://gist.github.com/fchollet/0830affa1f7f19fd47b06d4cf89ed44d) for more details.
- __target_size__: tuple of integers, default: `(256, 256)`. The dimensions to which all images found will be resized.
- __color_mode__: one of "grayscale", "rgb". Default: "rgb". Whether the images will be converted to have 1 or 3 color channels.
- __classes__: optional list of class subdirectories (e.g. `['dogs', 'cats']`). Default: None. If not provided, the list of classes will be automatically inferred (and the order of the classes, which will map to the label indices, will be alphanumeric).
- __class_mode__: one of "categorical", "binary", "sparse" or None. Default: "categorical". Determines the type of label arrays that are returned: "categorical" will be 2D one-hot encoded labels, "binary" will be 1D binary labels, "sparse" will be 1D integer labels. If None, no labels are returned (the generator will only yield batches of image data, which is useful with `model.predict_generator()`, `model.evaluate_generator()`, etc.).
- __batch_size__: size of the batches of data (default: 32).
- __shuffle__: whether to shuffle the data (default: True).
- __seed__: optional random seed for shuffling and transformations.
- __save_to_dir__: None or str (default: None). This allows you to optionally specify a directory to which to save the augmented pictures being generated (useful for visualizing what you are doing).
- __save_prefix__: str. Prefix to use for filenames of saved pictures (only relevant if `save_to_dir` is set).
- __save_format__: one of "png", "jpeg" (only relevant if `save_to_dir` is set). Default: "jpeg".
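What `fit` and `flow` do can be sketched without images: `fit` boils down to computing statistics over the sample axis, and `flow` is an endless generator of shuffled batches. The helpers `fit_featurewise` and `flow` below are hypothetical simplifications that work on flat feature lists, skip the random transformations entirely, and add a small epsilon to avoid dividing by zero; they are not the `ImageDataGenerator` implementation.

```python
import random


def fit_featurewise(X):
    """Per-feature mean and standard deviation over the sample axis.

    X is a list of samples, each a flat list of feature values.
    """
    n = len(X)
    columns = list(zip(*X))
    mean = [sum(col) / float(n) for col in columns]
    std = [(sum((v - m) ** 2 for v in col) / float(n)) ** 0.5
           for col, m in zip(columns, mean)]
    return mean, std


def flow(X, y, mean, std, batch_size=32, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) tuples forever, reshuffling before every pass."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    while True:  # infinite loop: the caller decides when to stop
        if shuffle:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Center and scale each feature; the epsilon guards against a zero std.
            x_batch = [[(v - m) / (s + 1e-7) for v, m, s in zip(X[i], mean, std)]
                       for i in batch]
            yield x_batch, [y[i] for i in batch]
```

Because the generator never terminates, the consumer has to decide how many batches to draw; this is why `fit_generator` is told how many samples make up an epoch.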
Transform a list of `nb_samples` sequences (lists of scalars) into a 2D Numpy array of shape `(nb_samples, nb_timesteps)`. `nb_timesteps` is either the `maxlen` argument if provided, or the length of the longest sequence otherwise. Sequences that are shorter than `nb_timesteps` are padded with `value` at the beginning by default (`padding='pre'`), or at the end with `padding='post'`.
- __Return__: 2D Numpy array of shape `(nb_samples, nb_timesteps)`.
- __Arguments__:
- __sequences__: List of lists of int or float.
- __maxlen__: None or int. Maximum sequence length. Longer sequences are truncated and shorter sequences are padded with `value`, according to the `truncating` and `padding` arguments.
- __dtype__: datatype of the Numpy array returned.
- __padding__: 'pre' or 'post', pad either before or after each sequence.
- __truncating__: 'pre' or 'post'. Remove values from sequences longer than `maxlen` either at the beginning ('pre') or at the end ('post') of the sequence.
- __value__: float. Value used to pad the sequences.
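The padding and truncation rules can be sketched in plain Python. `pad_sequences_sketch` is a hypothetical stand-in that mirrors the arguments above (with `'pre'` as the default for both `padding` and `truncating`, and `0` as the default `value`) but returns nested lists instead of a Numpy array and ignores `dtype`.

```python
def pad_sequences_sketch(sequences, maxlen=None, padding='pre', truncating='pre', value=0):
    """Pad or truncate every sequence to the same length (nested lists, no dtype handling)."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' drops values from the beginning, 'post' drops them from the end.
            seq = seq[len(seq) - maxlen:] if truncating == 'pre' else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # 'pre' puts the padding before the sequence, 'post' after it.
        padded.append(fill + seq if padding == 'pre' else seq + fill)
    return padded
```

For example, `pad_sequences_sketch([[1, 2], [3]])` gives `[[1, 2], [0, 3]]`: with the default `'pre'` padding, the short sequence is padded at the start.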
- __Return__: tuple `(couples, labels)`.
- `couples` is a list of 2-elements lists of int: `[word_index, other_word_index]`.
- `labels` is a list of 0 and 1, where 1 indicates that `other_word_index` was found in the same window as `word_index`, and 0 indicates that `other_word_index` was random.
- If `categorical` is set to True, the labels are categorical, i.e. 1 becomes [0, 1], and 0 becomes [1, 0].
- __negative_samples__: float >= 0. 0 for no negative (i.e. random) samples, 1 for as many negative samples as positive ones, etc.
- __shuffle__: boolean. Whether to shuffle the samples.
- __categorical__: boolean. Whether to make the returned labels categorical.
- __sampling_table__: Numpy array of shape `(vocabulary_size,)` where `sampling_table[i]` is the probability of sampling the word with index i (assumed to be i-th most common word in the dataset).
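The pair generation described above can be sketched in plain Python. `skipgrams_sketch` is hypothetical and simplified: it builds positive pairs from every word within `window_size` of each position, then draws `negative_samples` times as many random negative pairs. It assumes index 0 is reserved (for padding) and never used as a word, and it ignores `shuffle`, `categorical` and `sampling_table`.

```python
import random


def skipgrams_sketch(sequence, window_size, vocabulary_size, negative_samples=1.0, seed=None):
    """Return (couples, labels) for one sequence of word indices."""
    rng = random.Random(seed)
    couples, labels = [], []
    for i, wi in enumerate(sequence):
        if not wi:  # index 0 is assumed to be padding
            continue
        start = max(0, i - window_size)
        end = min(len(sequence), i + window_size + 1)
        for j in range(start, end):
            if j != i and sequence[j]:
                couples.append([wi, sequence[j]])  # word and a true context word
                labels.append(1)
    # Negative pairs: a word from the positive pairs with a random vocabulary index.
    nb_negative = int(len(labels) * negative_samples)
    words = [c[0] for c in couples]
    for _ in range(nb_negative):
        couples.append([rng.choice(words), rng.randint(1, vocabulary_size - 1)])
        labels.append(0)
    return couples, labels
```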
Used for generating the `sampling_table` argument for `skipgrams`. `sampling_table[i]` is the probability of sampling the i-th most common word in a dataset (more common words should be sampled less frequently, for balance).
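One way to build such a table, assuming word frequencies follow Zipf's law, is sketched below. It uses the word2vec-style subsampling rule `p = min(1, sqrt(sampling_factor / frequency))`. Both the rule and the `sampling_factor` default of `1e-5` are assumptions for illustration, and Keras's own formula may differ in detail; `make_sampling_table_sketch` is hypothetical.

```python
import math


def make_sampling_table_sketch(size, sampling_factor=1e-5):
    """Sampling probability for the word of rank i (index 0 = most common word).

    Assumes Zipf frequencies, f(rank) = 1 / (rank * H_size), where H_size is the
    harmonic number, and keeps each word with probability min(1, sqrt(factor / f)).
    """
    harmonic = sum(1.0 / r for r in range(1, size + 1))
    table = []
    for i in range(size):
        freq = 1.0 / ((i + 1) * harmonic)
        table.append(min(1.0, math.sqrt(sampling_factor / freq)))
    return table
```

Because estimated frequency falls with rank, the probabilities never decrease along the table: frequent words are sampled rarely and rare words almost always.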
Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are incorporated in the loss function that the network optimizes.
The penalties are applied on a per-layer basis. The exact API will depend on the layer, but the layers `Dense`, `TimeDistributedDense`, `MaxoutDense`, `Convolution1D` and `Convolution2D` have a unified API.
These layers expose 3 keyword arguments:
- `W_regularizer`: instance of `keras.regularizers.WeightRegularizer`
- `b_regularizer`: instance of `keras.regularizers.WeightRegularizer`
- `activity_regularizer`: instance of `keras.regularizers.ActivityRegularizer`
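As an illustration of what gets added to the loss, the sketch below computes an L1/L2 weight penalty by hand. `weight_penalty` and `regularized_loss` are hypothetical helpers operating on plain lists, assuming a regularizer with `l1` and `l2` coefficients that contributes `l1 * sum|w| + l2 * sum w^2` per regularized weight tensor.

```python
def weight_penalty(weights, l1=0.0, l2=0.0):
    """L1/L2 penalty on a flat list of weights."""
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)


def regularized_loss(data_loss, layer_weights, l1=0.0, l2=0.0):
    """Data loss plus one penalty term per regularized weight list."""
    return data_loss + sum(weight_penalty(w, l1, l2) for w in layer_weights)
```

With `l1=0.5`, the weights `[1.0, -2.0]` add `0.5 * 3 = 1.5` to the loss; large weights are therefore discouraged even when they fit the training data.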
You can use `Sequential` Keras models (single-input only) as part of your Scikit-Learn workflow via the wrappers found at `keras.wrappers.scikit_learn.py`.
There are two wrappers available:
- `keras.wrappers.scikit_learn.KerasClassifier(build_fn=None, **sk_params)`, which implements the Scikit-Learn classifier interface;
- `keras.wrappers.scikit_learn.KerasRegressor(build_fn=None, **sk_params)`, which implements the Scikit-Learn regressor interface.
### Arguments
- __build_fn__: callable function or class instance
- __sk_params__: model parameters & fitting parameters
`build_fn` should construct, compile and return a Keras model, which
will then be used to fit/predict. One of the following
three values could be passed to build_fn:
1. A function
2. An instance of a class that implements the `__call__` method
3. None. This means you implement a class that inherits from either
`KerasClassifier` or `KerasRegressor`. The `__call__` method of the
present class will then be treated as the default build_fn.
`sk_params` takes both model parameters and fitting parameters. Legal model
parameters are the arguments of `build_fn`. Note that like all other
estimators in scikit-learn, `build_fn` should provide default values for
its arguments, so that you could create the estimator without passing any
values to `sk_params`.
`sk_params` could also accept parameters for calling `fit`, `predict`,
`predict_proba`, and `score` methods (e.g., `nb_epoch`, `batch_size`).
Fitting (predicting) parameters are selected in the following order:
1. Values passed to the dictionary arguments of
`fit`, `predict`, `predict_proba`, and `score` methods
2. Values passed to `sk_params`
3. The default values of the `keras.models.Sequential`
`fit`, `predict`, `predict_proba` and `score` methods
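The precedence rule above can be sketched as a small dictionary merge. `resolve_params`, the default values and the `hidden_units` parameter in the usage below are hypothetical illustrations, not the wrapper's actual code; here, only names present in `method_defaults` are taken from `sk_params`, as a stand-in for checking which arguments the target method accepts.

```python
def resolve_params(call_kwargs, sk_params, method_defaults):
    """Merge parameters for fit/predict: call kwargs > sk_params > method defaults."""
    resolved = dict(method_defaults)
    # sk_params also holds build_fn arguments, so keep only what the method accepts.
    resolved.update({k: v for k, v in sk_params.items() if k in method_defaults})
    # Values passed directly to the method call win over everything else.
    resolved.update(call_kwargs)
    return resolved
```

For instance, with `sk_params={'batch_size': 64, 'hidden_units': 128}` and a call passing `nb_epoch=3`, `batch_size` comes from `sk_params`, `nb_epoch` from the call, and everything else from the defaults, while `hidden_units` is left for `build_fn`.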
When using scikit-learn's `grid_search` API, legal tunable parameters are
those you could pass to `sk_params`, including fitting parameters.
In other words, you could use `grid_search` to search for the best
`batch_size` or `nb_epoch` as well as the model parameters.
Trains a Hierarchical RNN (HRNN) to classify MNIST digits.
[mnist_irnn.py](mnist_irnn.py)
Reproduction of the IRNN experiment with pixel-by-pixel sequential MNIST in "A Simple Way to Initialize Recurrent Networks of Rectified Linear Units" by Le et al.
[mnist_mlp.py](mnist_mlp.py)
Trains a simple deep multi-layer perceptron on the MNIST dataset.
[mnist_net2net.py](mnist_net2net.py)
Reproduction of the Net2Net experiment with MNIST in "Net2Net: Accelerating Learning via Knowledge Transfer".
[mnist_siamese_graph.py](mnist_siamese_graph.py)
Trains a Siamese multi-layer perceptron on pairs of digits from the MNIST dataset.
Loads pre-trained word embeddings (GloVe embeddings) into a frozen Keras Embedding layer, and uses it to train a text classification model on the 20 Newsgroups dataset.
[reuters_mlp.py](reuters_mlp.py)
Trains and evaluates a simple MLP on the Reuters newswire topic classification task.
[stateful_lstm.py](stateful_lstm.py)
Demonstrates how to use stateful RNNs to model long sequences efficiently.