URI strings
Blaze uses strings to specify data resources. This is purely for ease of use.
Example
Interact with a set of CSV files or a SQL database
>>> from blaze import *
>>> from blaze.utils import example
>>> t = data(example('accounts_*.csv'))
>>> t.peek()
   id     name  amount
0   1    Alice     100
1   2      Bob     200
2   3  Charlie     300
3   4      Dan     400
4   5    Edith     500
>>> t = data('sqlite:///%s::iris' % example('iris.db'))
>>> t.peek()
   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
5           5.4          3.9           1.7          0.4  Iris-setosa
6           4.6          3.4           1.4          0.3  Iris-setosa
7           5.0          3.4           1.5          0.2  Iris-setosa
8           4.4          2.9           1.4          0.2  Iris-setosa
9           4.9          3.1           1.5          0.1  Iris-setosa
...
Migrate CSV files into a SQL database
>>> from odo import odo
>>> odo(example('iris.csv'), 'sqlite:///myfile.db::iris')
Table('iris', MetaData(bind=Engine(sqlite:///myfile.db)), ...)
What sorts of URIs does Blaze support?
- Paths to files on disk, including the following extensions
  - .csv
  - .json
  - .csv.gz/json.gz
  - .hdf5 (uses h5py)
  - .hdf5::/datapath
  - hdfstore://filename.hdf5 (uses special pandas.HDFStore format)
  - .bcolz
  - .xls(x)
- SQLAlchemy strings like the following
  - sqlite:////absolute/path/to/myfile.db::tablename
  - sqlite:////absolute/path/to/myfile.db (specify a particular table)
  - postgresql://username:password@hostname:port
  - impala://hostname (uses impyla)
  - anything supported by SQLAlchemy
- MongoDB Connection strings of the following form
  - mongodb://username:password@hostname:port/database_name::collection_name
- Blaze server strings of the following form
  - blaze://hostname:port (port defaults to 6363)
In all cases, when a location or table name is required in addition to the traditional URI (e.g. a data path within an HDF5 file or a table/collection name within a database), that information follows at the end of the URI after a separator of two colons (::).
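The two-colon convention can be illustrated with a small helper. This is only a sketch: split_uri is a hypothetical name introduced here, not a function from Blaze or Odo, and it assumes the first :: in the string is the separator.

```python
def split_uri(uri):
    """Split 'base::inner' into (base, inner).

    Hypothetical helper for illustration only; not part of Blaze or Odo.
    `inner` is None when the URI carries no '::' suffix.
    """
    base, sep, inner = uri.partition('::')
    return base, (inner if sep else None)


# A SQLAlchemy-style URI followed by a table name
print(split_uri('sqlite:////absolute/path/to/myfile.db::tablename'))
# An HDF5 file followed by a data path inside the file
print(split_uri('myfile.hdf5::/datapath'))
# A URI with no extra location information
print(split_uri('blaze://hostname:6363'))
```

The single colon in a scheme such as sqlite:// or in hostname:port is left alone, because only a run of two colons is treated as the separator.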
How it works
Blaze depends on the Odo library to handle URIs.
URIs are managed through the resource function, which is dispatched based on regular expressions. For example, a simple resource function to handle .json files might look like the following (although Blaze’s actual solution is a bit more comprehensive):
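from blaze import resource
import json

@resource.register(r'.+\.json')
def resource_json(uri, **kwargs):
    # Extra keyword arguments passed to resource() are accepted and ignored
    with open(uri) as f:
        # Parse from the open file object, not from the path string
        data = json.load(f)
    return data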
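To make "dispatched based on regular expressions" concrete, here is a toy dispatcher written from scratch. It is a simplified sketch and not Odo's actual implementation: the names ToyRegexDispatcher and toy_resource are invented for this example, and the matching and priority rules are assumptions chosen for clarity.

```python
import re


class ToyRegexDispatcher:
    """Toy illustration: choose a handler by matching a URI against regexes."""

    def __init__(self):
        # Each entry is (compiled pattern, priority, handler function)
        self._handlers = []

    def register(self, pattern, priority=10):
        """Return a decorator that registers a handler for `pattern`."""
        def decorator(func):
            self._handlers.append((re.compile(pattern), priority, func))
            return func
        return decorator

    def dispatch(self, uri):
        """Return the highest-priority handler whose pattern matches all of `uri`."""
        matches = [(priority, func)
                   for regex, priority, func in self._handlers
                   if regex.fullmatch(uri)]
        if not matches:
            raise NotImplementedError("no handler registered for %r" % uri)
        return max(matches, key=lambda match: match[0])[1]

    def __call__(self, uri, **kwargs):
        # Look up the handler, then call it with the original URI
        return self.dispatch(uri)(uri, **kwargs)


toy_resource = ToyRegexDispatcher()


@toy_resource.register(r'.+\.csv')
def resource_csv(uri, **kwargs):
    return ('csv', uri)


@toy_resource.register(r'sqlite:///.+', priority=11)
def resource_sqlite(uri, **kwargs):
    return ('sqlite', uri)


print(toy_resource('accounts.csv'))
print(toy_resource('sqlite:///myfile.db::iris'))
```

Calling toy_resource with a string selects a handler purely from the shape of the string, which is why a new URI scheme or file extension can be supported by registering one more function.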
Can I extend this to my own types?
Absolutely. Import and extend resource as shown in the “How it works” section. The rest of Blaze will pick up your change automatically.