Welcome to cottoncandy! ======================= *sugar for s3* https://github.com/gallantlab/cottoncandy What is cottoncandy? -------------------- A python scientific library for storing and accessing numpy array data on S3. This is achieved by reading arrays from memory and downloading arrays directly into memory. This means that you don't have to download your array to disk, and then load it from disk into your python session. This library relies heavily on boto3 (https://github.com/boto/boto3) Installation ------------ Clone the repo from GitHub and do the usual python install :: git clone https://github.com/gallantlab/cottoncandy.git cd cottoncandy sudo python setup.py install or install from PyPI :: pip install cottoncandy A configuration file will be saved under ``~/.config/cottoncandy/options.cfg``. Upon installation cottoncandy will try to find your AWS keys and store them in this file. See the `default file `_ for more configuration options. Object and bucket permissions are set to ``authenticated-read`` by default. If you wish to keep all your objects private, modify the configuration file and set ``default_acl = private``. See `AWS ACL overview `_ for more information on S3 permissions. Getting started --------------- Setup the connection (endpoint, access and secret keys can be specified in the configuration file instead):: >>> import cottoncandy as cc >>> cci = cc.get_interface('my_bucket', ACCESS_KEY='FAKEACCESSKEYTEXT', SECRET_KEY='FAKESECRETKEYTEXT', endpoint_url='https://s3.amazonaws.com') Storing numpy arrays ******************** :: >>> import numpy as np >>> arr = np.random.randn(100) >>> s3_response = cci.upload_raw_array('myarray', arr) >>> arr_down = cci.download_raw_array('myarray') >>> assert np.allclose(arr, arr_down) Storing dask arrays ******************* :: >>> arr = np.random.randn(100,600,1000) >>> s3_response = cci.upload_dask_array('test_dim', arr, axis=-1) >>> dask_object = cci.download_dask_array('test_dim') >>> dask_object dask.array >>> dask_slice = dask_object[..., :200] >>> dask_slice dask.array >>> downloaded_data = np.asarray(dask_slice) # this downloads the array >>> downloaded_data.shape (100, 600, 200) Command-line search ******************* :: >>> cci.glob('/path/to/*/file01*.grp/image_data') ['/path/to/my/file01a.grp/image_data', '/path/to/my/file01b.grp/image_data', '/path/to/your/file01a.grp/image_data', '/path/to/your/file01b.grp/image_data'] >>> cci.glob('/path/to/my/file02*.grp/*') ['/path/to/my/file02a.grp/image_data', '/path/to/my/file02a.grp/text_data', '/path/to/my/file02b.grp/image_data', '/path/to/my/file02b.grp/text_data'] Filesystem-like object browsing ******************************* :: >>> import cottoncandy as cc >>> browser = cc.get_browser('my_bucket_name', ACCESS_KEY='FAKEACCESSKEYTEXT', SECRET_KEY='FAKESECRETKEYTEXT', endpoint_url='https://s3.amazonaws.com') >>> browser.sweet_project.sub browser.sweet_project.sub01_awesome_analysis_DOT_grp browser.sweet_project.sub02_awesome_analysis_DOT_grp >>> browser.sweet_project.sub01_awesome_analysis_DOT_grp (sub01_awesome_analysis.grp: 3 keys)> >>> browser.sweet_project.sub01_awesome_analysis_DOT_grp.result_model01 Learn more: ----------- .. toctree:: :maxdepth: 2 api/index.rst .. only:: html :Release: |version| :Date: |today| .. only:: html * :ref:`genindex` * :ref:`modindex` * :ref:`search`