"""Quick and Dirty, a high level file access wrapper.
The QnD interface looks like this::
f = format_specific_open(filename, mode) # e.g. openh5
var = f.varname # read var
var = f['varname']
var = f.get('varname', default)
f.var = var_value # declare and write var
f['var'] = something
f.var = dtype, shape # declare var without writing
f.update({...vars...}, another_var=value, ...)
f.grpname = {...vars...} # declare a subgroup and some members
if name in f: do_something
varnames = list(f)
for name in f: do_something
for name, var in f.items(): do_something
g = f.grpname
f = g.root() # Use the root method to get the top-level QGroup.
f.close() # important if you have written to f
Generally, a QnD QGroup like `f` in the example behaves like a dict.
However, you may also reference variables or subgroups as if they were
attributes. Use attribute access when you know the variable name in
advance; use square brackets when the variable name is the value of
an expression. (QnD will remove a single trailing
underscore from any attribute reference, so you can use ``f.yield_``
for ``f['yield']`` or ``f.items_`` for ``f['items']``.) The adict
module has an `ADict` class and a `redict` function to produce ordinary
in-memory dict objects with their items accessible as attributes with
the same rules. You can read a whole file (or a whole subgroup) like
this::
ff = f(2)
The optional `2` argument is the auto-read mode flag. By default, the
auto-read mode flag is set to 1, which causes ``f.varname`` to read an
array variable and return its value, but to simply return a QGroup
object (like `f`) if the name refers to a subgroup. When the `auto`
flag equals `2`, any subgroups are read recursively, and their values
become ADict instances. (QnD also supports QList variables, and
``auto=2`` mode returns those as python list instances.)
The ``items()`` method also accepts an optional `auto` argument to
temporarily change the auto-read mode used for the iteration.
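As a rough illustration of the attribute rule described above (a single
trailing underscore is stripped before lookup), here is a minimal
in-memory sketch. The class name `AttrView` is hypothetical and is not
the `ADict` class from the adict module; it only mimics the naming rule.

```python
class AttrView(dict):
    # Hypothetical dict whose items are also readable as attributes.

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so real dict
        # methods such as items() keep working.
        if name.endswith("_"):
            name = name[:-1]  # strip exactly one trailing underscore
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


d = AttrView({"yield": 3.5, "items": [1, 2], "x": 0})
yield_value = d.yield_      # same as d["yield"]
items_value = d.items_      # same as d["items"], not the dict.items method
x_value = d.x               # plain names need no underscore
```

The underscore form is only needed when the plain name is a python
keyword or collides with a method name.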
You can turn auto-read mode off by setting the `auto` flag to `0`. In
this mode, referencing a variable returns a QLeaf instance without
reading it. This enables you to query a variable without reading it.
You can also do that by retrieving the attributes object::
with f.push(): # Use f as a context manager to temporarily change modes.
f.auto(0) # Turn off auto-read mode.
v = f.varname
value = v() # Read a QLeaf by calling it...
value = v[:] # ...or by indexing it.
v(value) # Write a QLeaf by calling it with an argument...
v[:] = value # ...or by setting a slice.
v.dtype, v.shape, v.size, v.ndim # properties of the QLeaf v
# An alternate method which pays no attention to auto mode:
va = f.attrs.varname # Get attributes of varname.
va.dtype, va.shape, va.size, va.ndim # Built-in pseudo-attributes.
# You can use va to get or set real attributes of varname as well:
units = va.units # retrieve units attribute
va.centering = 1 # set centering attribute
When you call a QGroup like `f` as a function, you may also pass it a
list of variable names to read only that subset of variables. With
auto-read mode turned off, this results in a sort of "casual subgroup"::
g = f(0, 'vname1', 'vname2', ...)
h = f(1, 'vname1', 'vname2', ...)
ff = f(2, 'vname1', 'vname2', ...)
Here, g is an ADict containing QLeaf and QGroup objects, with nothing at
all read from the file. h is an ADict containing ndarray and QGroup
objects, and ff is an ADict containing ndarray and ADict objects, with
no references at all to `f`.
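The three auto levels can be modeled with plain python objects. In the
sketch below, `Lazy` and `fetch` are hypothetical names standing in for
an unread leaf and for calling a group; nothing here touches a file or
uses the QnD classes.

```python
class Lazy(object):
    # Hypothetical stand-in for an unread leaf: holds a value it has
    # not "read" yet.
    def __init__(self, value):
        self._value = value

    def __call__(self):
        return self._value   # calling it performs the read


def fetch(group, auto, *names):
    # group: dict mapping names to Lazy leaves or nested dicts (subgroups).
    out = {}
    for name in names or list(group):
        item = group[name]
        if isinstance(item, dict):
            # Subgroups are only expanded at auto level 2.
            out[name] = fetch(item, 2) if auto == 2 else item
        else:
            out[name] = item() if auto else item
    return out


tree = {"a": Lazy(1), "sub": {"b": Lazy(2)}}
g = fetch(tree, 0, "a", "sub")    # nothing read
h = fetch(tree, 1, "a", "sub")    # leaves read, subgroup left as is
ff = fetch(tree, 2)               # everything read recursively
```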
If you want to use `f` as a context manager in the manner of other
python file handles, so that the file is closed when you exit the with
statement, just do it::
with openh5(filename, "a") as f:
do_something(f)
# f has been properly flushed and closed on exit from the with.
------
QnD also supports old netCDF style UNLIMITED dimensions, and their
equivalents in HDF5. Unlike the netCDF or HDF5 interface, in QnD the
first (slowest varying) dimension of these arrays maps to a python
list, so we regard the entire collected variable as a list of
ndarrays. The netCDF record number is the index into the list, while
any faster varying dimensions are real ndarray dimensions. This
subtle difference in approach is more consistent with the way these
variables are stored, and also generalizes to the fairly common case
that the array dimensions -- often mesh dimensions -- change from one
record to the next.
To write records using QnD, turn on "recording mode"::
f.recording(1) # 0 for off, 2 for generalized records
f.time = 0.
f.x = x = arange(10)
f.time = 0.5
f.x = x**2
Ordinarily, when you set the value of ``f.time`` or ``f.x``, any
previous value will be overwritten. But in recording mode, each time
you write a variable, you create a new record, saving the new value
without overwriting the previous value. If you want all record
variables to have the same number of records, you need to be sure
you write them each the same number of times. One way to do that is
to use the update function rather than setting them one at a time::
record = ADict()
record.time, record.x = 0., arange(10)
f.recording(1)
f.update(record)
record.time, record.x = 0.5, record.x**2
f.update(record)
You cannot change a variable from not having records to having records
(or from recording mode 1 to recording mode 2); the recording mode in
force when a variable was first declared determines if and how all
future write operations behave.
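The difference between overwriting and recording can be pictured with a
small in-memory stand-in. The `Recorder` class below is hypothetical and
keeps everything in a plain dict; it is only meant to show why each
write in recording mode adds one record, and why the mode at declaration
time sticks.

```python
class Recorder(object):
    # Hypothetical stand-in for a group: no file, just a dict.

    def __init__(self):
        self.store = {}
        self.is_record = {}   # fixed when a name is first declared
        self.mode = 0

    def recording(self, flag):
        self.mode = int(flag)

    def write(self, name, value):
        if name not in self.store:
            # The mode in force at declaration time decides the behavior.
            self.is_record[name] = bool(self.mode)
            self.store[name] = [] if self.mode else None
        if self.is_record[name]:
            self.store[name].append(value)   # new record, nothing lost
        else:
            self.store[name] = value         # plain overwrite

    def update(self, **items):
        # Writing all variables together keeps record counts equal.
        for name, value in items.items():
            self.write(name, value)


r = Recorder()
r.write("title", "run A")
r.recording(1)
r.update(time=0.0, x=[0, 1, 2])
r.update(time=0.5, x=[0, 1, 4])
r.write("title", "run B")   # declared before recording: still overwrites
```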
Reading back record variables introduces "goto mode". Initially, goto
mode is off or None, so that reading a record variable gets the whole
collection of values as a QList, or as an ordinary python list if
the auto flag is 2::
f.goto(None) # explicitly turn off goto mode
f.auto(2)
times = f.time # python list of f.time values
xs = f.x # python list of f.x arrays
f.auto(0)
time = f.time # QList for the collection of time values
nrecords = len(time)
On the other hand, with goto mode turned on, the fact that `time` and `x`
are record variables disappears, so that your view of ``f.time`` and
``f.x`` matches what it was when you recorded them. You use the goto
function to set the record::
f.goto(0) # first record is 0, like any python list
t = f.time # == 0.
f.goto(1) # set to second record
t = f.time # == 0.5
x = f.x # == arange(10)**2
f.goto(-1) # final record, negative index works like any python list
# You can also pass a keyword to goto, which can be the name of any
# scalar record variable, to go to the record nearest that value.
f.goto(time=0.1) # will select record 0 here
current_record = f.goto() # goto() returns current record number
for r in f.gotoit(): do_something # f.goto(r) is set automatically
Note the ``gotoit()`` method returns an iterator over all records,
yielding the record number for each pass, and setting the goto record
for each pass automatically. You can use ``f.push()`` in a with
statement to temporarily move to a different record.
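The keyword form of goto picks the record whose value is nearest the
one requested. A minimal pure-python sketch of that selection follows;
`nearest_record` is a hypothetical helper, not part of QnD, and it
assumes the values are already strictly increasing.

```python
def nearest_record(values, target):
    # values: strictly increasing scalars, one per record (assumed).
    # Returns the index of the entry closest to target.
    best = 0
    for i, v in enumerate(values):
        if abs(v - target) < abs(values[best] - target):
            best = i
    return best


times = [0.0, 0.5]
rec_a = nearest_record(times, 0.1)    # closest to 0.0
rec_b = nearest_record(times, 0.4)    # closest to 0.5
rec_c = nearest_record(times, 9.0)    # beyond the end clamps to the last
```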
If you set the recording mode to `2`, the record variables need not
have the same shape or same type from one record to the next (indeed,
they can be a subgroup on one record and an array on another). This
cannot be represented as an UNLIMITED array dimension in an HDF5 or
netCDF file, so the QList variable in QnD will become an HDF5 group in
this case, where variable names in the group are _0, _1, _2, and so on
for QList element 0, 1, 2, and so on (plus a hidden element _ which
identifies this group as a list when it is empty). You can create a
QList of this general type without using recording or goto mode at
all::
f.recording(0) # Turn off recording and goto modes.
f.goto(None)
f.varname = list # Make an empty QList
ql = f.varname
ql.append(value0)
ql.extend([value1, value2, ...])
var = ql[1] # retrieves value1
nelements = len(ql) # current number of elements (also works for QGroup)
ql.auto(0) # a QList has auto mode just like a QGroup
for var in ql: do_something # var depends on ql auto mode setting
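The naming convention for a generalized QList (members _0, _1, and so
on, plus the marker _) can be sketched as follows. The helper names are
hypothetical; only the naming pattern comes from the description above.

```python
import re

# Matches the marker "_" as well as "_0", "_1", ...; [0-9] is used
# instead of a backslash class so the snippet is safe inside a docstring.
LIST_MEMBER = re.compile("^_[0-9]*$")


def list_member_names(nelements):
    # Names a group would hold for a list with nelements elements.
    return ["_"] + ["_" + str(i) for i in range(nelements)]


def looks_like_list(names):
    # True if every member name follows the list convention.
    return bool(names) and all(LIST_MEMBER.match(n) for n in names)


names = list_member_names(3)
is_list = looks_like_list(names)
is_not_list = looks_like_list(["_0", "time"])
```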
------
"""
from __future__ import absolute_import
# The three kinds of objects we support here are:
# 1. QGroup --> dict with str keys
# 2. QList --> list
# 3. QLeaf --> ndarray with dtype.kind in buifcS, with U encoded as S
# and V handled as recarray.
# We attempt to store arbitrary python objects as a QGroup with member
# __class__ = 'modulename.classname' (or just classname for builtin)
# remainder of QGroup is the instance __dict__, unless __class__ has a
# __setstate__, in which case argument to that method stored in the
# __setstate__ variable.
# If __class__ has a __getnewargs__, result is written as a sub-QGroup with
# _0, _1, ..., which will be passed to the class constructor -- otherwise
# the class starts empty and neither __new__ nor __init__ is called.
# List or tuple objects not distinguished, becoming QList items.
# Dict objects with non-text keys stored with __class__ = 'dict' and
# members _0, _1, _2, etc., where even item is key and following odd item
# is corresponding value.
# We ask the backend to support the variable value None as a QLeaf in
# addition to the arrays, if possible. Arrays with zero length dimensions
# should also be supported if possible.
#
# This qnd module also provides a low level QnDList implementation of the
# QList in terms of the backend QGroup (recording=2) and QLeaf (recording=1)
# implementations, for backends which do not support a native list type.
# The convention is that a generic QList is a QGroup with a blank member _
# and members _0, _1, _2, etc. If the backend supports QLeaf arrays with an
# UNLIMITED leading dimension, these can also be presented as QList
# variables by the QnD API.
# Backend object methods assumed here:
# qnd_group methods: close(), flush(), root()
# isgroup() -> 1, islist() -> 0, isleaf() -> 0
# __len__, __iter__ returns names
# lookup(name) -> None if not found
# declare(name, dtype, shape, unlim) dtype can be dict, list, or None
# attget(vname) --> variable attributes, vname='' for group attributes
# attset(vname, aname, dtype, shape, value) --> variable attributes
# qnd_list methods: root()
# isgroup() -> 0, isleaf() -> 0
# islist() -> 1 if this is UNLIMITED dimension, -> 2 if anonymous group
# __len__, __iter__ returns unread elements
# index(i) -> None if i out of range
# declare(dtype, shape) dtype can be dict, list, or None
# qnd_leaf methods: root()
# isgroup() -> 0, islist() -> 0, isleaf() -> 1
# query() -> dtype, shape, sshape (None, (), () for None)
# read(args)
# write(value, args)
import sys
from weakref import proxy, ProxyTypes
from importlib import import_module
import re
# Major change in array(x) function semantics when x is a list of ragged
# arrays: This now generates a DeprecationWarning as of 1.19, and
# presumably an exception for some future numpy (beyond 1.21). See the
# _categorize function below for the workaround.
from warnings import catch_warnings, simplefilter
from numpy import VisibleDeprecationWarning
from numpy import (dtype, asfarray, asanyarray, arange, interp, where, prod,
ndarray)
from numpy.core.defchararray import encode as npencode, decode as npdecode
from .adict import ItemsAreAttrs, ADict
PY2 = sys.version_info < (3,)
if PY2:
range = xrange
else:
basestring = str
_NOT_PRESENT_ = object()
_us_digits = re.compile(r"^_\d*$")
class QGroup(ItemsAreAttrs):
"""Group of subgroups, lists, and ndarrays.
You reference QGroup items by name, either as ``qg['name']`` like a
dict item, or equivalently as ``qg.name`` like an object attribute.
Use ``[]`` when the item name is an expression or the contents of
a variable; use ``.`` when you know the name of the item. You can
use ``[]`` or ``.`` to both get and set items in the QGroup. To
read the entire group into an ADict, call it like a function, ``qg()``;
you may supply a list of names to read only a subset of items.
A QGroup acts like a dict in many ways::
if 'name' in qg: do_something
for name in qg: do_something
item_names = list(qg) # qg.keys() exists but is never necessary
for name, item in qg.items(): do_something
qg.update({name0: val0, ...}, [(name1, val1), ...], name2=val2, ...)
value = qg.get('name', default)
A QGroup has several possible states or modes:
1. Recording mode, turned on by ``qg.recording(1)`` and off by
``qg.recording(0)``, affects what happens when you set group items.
With recording mode off, setting an item to an array creates the
item as an array if its name has not been used, or otherwise writes
its new value, requiring it be compatible with the dtype and shape
of the previous declaration. With recording mode on, setting an
item for the first time creates a QList and sets its first element
to the given value, and subsequently setting that item appends the
given value to the existing QList. There is also a recording mode
``qg.recording(2)`` in which subsequent values need not match the
dtype or shape of the first item. You may not switch recording
modes for a given item; the mode in effect when an item is first
created governs the behavior of that item.
2. Goto mode, in which you set a current record with ``qg.goto(rec)``.
Any item you retrieve or query which is a QList retrieves or queries
the element with 0-origin index ``rec`` instead of the whole QList.
You turn off goto mode with ``qg.goto(None)``. There is also a
``qg.gotoit()`` function which returns an iterator over all the
records (generally the longest QList in ``qg``).
3. Auto mode, turned on by ``qg.auto(1)`` and off by ``qg.auto(0)``,
in which getting any item reads and returns its value, rather than
a QLeaf object. There is also a ``qg.auto(2)`` mode in which
the auto-read feature applies to any QGroup or QList (if goto mode
is off) items recursively.
A QGroup has `push` and `drop` methods which can be used to save and
restore all its modes. The `drop` method is called implicitly upon
exit from a with statement, so you can use the QGroup as a context
manager::
with openh5('myfile.h5', 'a') as qg:
do_something(qg)
with qg.push():
qg.goto(rec)
do_something_else(qg)
# qg restored to goto mode state before with.
do_even_more(qg)
# qg flushed and closed upon exit from with clause that has
# no corresponding push
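The save and restore behavior can be pictured as a small stack of mode
settings. The `ModeStack` class below is a hypothetical sketch; QnD's
own state class is not reproduced here, and the return value of `drop`
is an assumption chosen to mirror the "close when nothing is left to
drop" idea.

```python
class ModeStack(object):
    # Hypothetical mode holder: [recording, goto, auto].

    def __init__(self, recording=0, goto=None, auto=1):
        self.current = [recording, goto, auto]
        self.saved = []

    def push(self):
        self.saved.append(list(self.current))   # save a copy

    def drop(self):
        # Returns True when there was nothing to restore.
        if not self.saved:
            return True
        self.current = self.saved.pop()
        return False


modes = ModeStack()
modes.push()
modes.current[1] = 5          # like qg.goto(5) inside the with block
inside = list(modes.current)
nothing_left_first = modes.drop()    # restores goto to None
nothing_left_second = modes.drop()   # no saved state remains
```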
Attributes
----------
islist
isleaf
Always 0.
isgroup
Always 1.
dtype
Always ``dict``, the builtin python type.
shape
ndim
size
sshape
Always None.
"""
__slots__ = "_qnd_group", "_qnd_state", "_qnd_cache", "__weakref__"
isgroup = 1
islist = isleaf = 0
dtype, shape, ndim, size, sshape = dict, None, None, None, None
def __init__(self, item=None, state=None, auto=None, recording=None,
goto=None):
object.__setattr__(self, "_qnd_group", item)
object.__setattr__(self, "_qnd_state",
QState() if state is None else QState(state))
object.__setattr__(self, "_qnd_cache", None)
state = self._qnd_state
if auto is not None:
state.auto = int(auto)
if recording is not None:
state.recording = int(recording)
if goto is not None:
state.goto = int(goto)
def recording(self, flag):
"""Change recording mode for this QGroup.
With recording mode off, writing to a variable overwrites that
variable. With recording mode on, new variables are declared as
a QList and subsequent write operations append a new element to
this QList instead of overwriting any previously stored values.
In netCDF parlance, variables declared in recording mode are
record variables. Writing to a variable declared when recording
mode was off will always overwrite it; once declared, you cannot
convert a variable to a QList simply by turning on recording mode.
See goto mode for handling record variable read operations.
A `flag` value of 0 turns off recording mode. A `flag` of 1 turns
on recording mode, utilizing a leading UNLIMITED array dimension
in netCDF or HDF5 parlance, which promises that all values written
will have the same dtype and shape. A `flag` of 2 places no
restrictions on the dtype or shape of the QList elements; such
an unrestricted QList resembles an anonymous QGroup.
"""
self._qnd_state.recording = int(flag)
def goto(self, record=_NOT_PRESENT_, **kwargs):
"""Set the current record for this QGroup, or turn off goto mode.
Pass `record` of None to turn off goto mode, so that QList
variables appear as the whole QList. Setting an integer `record`
makes any QList variable appear to be the specified single
element. A `record` value may be negative, with the usual python
interpretation for a negative sequence index. If different QList
variables have different lengths, the current `record` may be
out of range for some variables but not for others. (Hence using
goto mode may be confusing in such situations.)
Note that you can temporarily set goto mode using a with clause.
This `goto` method also accepts a keyword argument instead of a
`record` number. The keyword name must match the name of a
QList variable in this QGroup, whose values are scalars. This
will set `record` to the record where that variable is nearest
the keyword value. Thus, ``goto(time=t)`` selects the record
nearest `time` t.
As a special case, you can get the current record number by calling
`goto` with neither a `record` nor a keyword::
current_record = qg.goto()
"""
if kwargs:
if record is not _NOT_PRESENT_:
raise TypeError("either use keyword or record index")
if len(kwargs) != 1:
raise TypeError("only one keyword argument accepted")
name, val = list(kwargs.items())[0]
records, values = self._qnd_goto_recs(name)
val = float(val)
n = values.size
if n > 1:
# result of interp is float scalar in older numpy versions
# rather than numpy.float64, cannot use astype
i = int(interp(val, values, arange(n) + 0.5))
else:
i = 0
record = records[min(i, n-1)]
elif record is _NOT_PRESENT_:
return self._qnd_state.goto
elif record is not None:
record = int(record)
self._qnd_state.goto = record
def _qnd_goto_recs(self, name):
cache = self._qnd_cache
values = cache.get(name) if cache else None
if values is None:
item = self._qnd_group.lookup(name)
if item is not None and item.islist():
with self.push():
self.goto(None)
self.auto(2)
values = self[name]
values = asfarray(values)
if values.ndim != 1 or values.size < 1:
values = None
if values is None:
raise TypeError("{} is not scalar record variable"
"".format(name))
values = _monotonize(values)
if not cache:
cache = {}
object.__setattr__(self, "_qnd_cache", cache)
cache[name] = values
return values # returned by _monotonize
def gotoit(self, name=None):
"""Iterate over goto records, yielding current record.
Optional `name` argument is the name of a `goto` method keyword,
which may implicitly remove records corresponding to non-monotonic
changes of that variable. If `name` is a decreasing variable,
the record order will be reversed.
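A rough picture of how non-monotonic records are dropped: scanning from
the last record backwards, a record is kept only if its value is
strictly below every value kept after it. The function below is a
simplified, list-based sketch of that idea for increasing variables
only; `increasing_records` is a hypothetical name.

```python
def increasing_records(values):
    # Scan backwards, keeping only records whose value is strictly
    # below the most recently kept (later) value.
    keep = [len(values) - 1]
    vnext = values[-1]
    for i in range(len(values) - 2, -1, -1):
        if values[i] < vnext:
            keep.append(i)
            vnext = values[i]
    keep.reverse()
    return keep, [values[i] for i in keep]


# A restart at record 3 rewinds time, so records 1 and 2 are superseded.
records, times = increasing_records([0.0, 1.0, 2.0, 0.5, 1.5])
```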
As a side effect, the current record of this QGroup will be set
during each pass. If the loop completes, the original goto state
will be restored, but breaking out of the loop will leave the
goto record set.
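The restore-only-on-completion behavior follows from how python
generators work: code after the loop inside the generator runs only if
the caller exhausts it. A minimal sketch, using a hypothetical `Cursor`
class in place of a QGroup:

```python
class Cursor(object):
    # Hypothetical holder for a current record number.
    def __init__(self):
        self.record = None

    def gotoit(self, nrecords):
        saved = self.record
        for r in range(nrecords):
            self.record = r
            yield r
        self.record = saved   # only reached if the loop runs to the end


c = Cursor()
for r in c.gotoit(3):
    pass
after_full_loop = c.record    # restored to None

for r in c.gotoit(3):
    if r == 1:
        break
after_break = c.record        # left at 1
```

Wrapping the loop in a with statement that pushes the state first is one
way to get the original record back even after a break.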
"""
if name is not None:
records, _ = self._qnd_goto_recs(name)
else:
# scan through all variables to find largest record count
nrecords = 0
for name in self._qnd_group:
item = self._qnd_group.lookup(name)
if item.islist():
n = len(item)
if n > nrecords:
nrecords = n
records = arange(nrecords)
r0 = self._qnd_state.goto
for r in records:
self._qnd_state.goto = r
yield r
self._qnd_state.goto = r0
def auto(self, recurse):
"""Set the auto-read mode for this QGroup.
In auto-read mode, getting an item returns its value, rather than a
QLeaf. If the item is a QGroup or QList, the object itself is
returned when the `recurse` value is 1, whereas if `recurse` is 2,
the QGroup or QList is read recursively. Setting `recurse` to
0 turns off auto-read mode entirely.
Note that you can temporarily set auto mode using a with clause.
"""
self._qnd_state.auto = int(recurse)
def push(self):
"""Push current recording, goto, and auto mode onto state stack."""
self._qnd_state.push()
return self
def drop(self, nlevels=None, close=False):
"""Restore previous recording, goto, and auto mode settings.
Default ``drop()`` drops one pushed state, ``drop(n)`` drops n,
``drop('all')`` drops all pushed states. By default, `drop` is
a no-op if there are no pushed states to drop. ``drop(close=1)``
also closes the file when there are no pushed states left to drop;
this form is called implicitly on exit from a with suite.
"""
if nlevels is None:
nlevels = 1
elif nlevels == "all":
nlevels = len(self._qnd_state) - 3
while nlevels >= 0:
if self._qnd_state.drop() and close:
self.close()
nlevels -= 1
def close(self):
"""Close associated file."""
this = self._qnd_group
if this is not None:
for nm in ["_qnd_group", "_qnd_state", "_qnd_cache"]:
object.__setattr__(self, nm, None)
this.close()
def flush(self):
"""Flush associated file."""
this = self._qnd_group
if this is not None:
this.flush()
def root(self):
"""Return root QGroup for this item."""
qgroup = self._qnd_group
root = qgroup.root()
if root is qgroup:
return self
state = QState(self._qnd_state) # copy
return QGroup(root, state)
def attrs(self):
"""Return attribute tree for variables in this group."""
return QAttributes(self._qnd_group)
def get(self, key, default=None):
"""like dict.get method"""
try:
return self[key]
except KeyError:
return default
def items(self, auto=None):
"""like dict.items method (iteritems in python2)"""
if auto == self._qnd_state.auto:
auto = None
for name in self._qnd_group:
if auto is None:
value = self[name]
else:
with self.push():
self.auto(auto)
value = self[name]
yield name, value
def __repr__(self):
this = self._qnd_group
if this is not None:
return "<QGroup with {} items>".format(len(this))
else:
return "<closed QGroup>"
def __len__(self):
return len(self._qnd_group)
def __contains__(self, name):
return self._qnd_group.lookup(name) is not None
def __iter__(self):
return iter(self._qnd_group)
keys = __iter__
__enter__ = push
def __exit__(self, etype, evalue, etrace):
self.drop(close=1)
def __call__(self, auto=None, *args):
# Make qg() shorthand for qg[()], returning whole group.
if auto == self._qnd_state.auto:
auto = None
if auto is None:
value = self[args]
else:
with self.push():
self.auto(auto)
value = self[args]
return value
def __getitem__(self, key):
if not isinstance(key, tuple):
key = (key,)
if not key: # qg[()] retrieves entire group
key = (list(self._qnd_group),)
name, args = key[0], key[1:]
if isinstance(name, basestring):
if "/" in name:
if name.startswith("/"):
return self.root()[(name[1:],) + args]
name = name.split("/")
name, args = name[0], tuple(name[1:]) + args
else:
# qg[["name1", "name2", ...], slice0, ...]
# returns [qg.name1[slice0, ...], qg.name2[slice0, ...], ...]
items = []
for key in name:
if not isinstance(key, basestring):
# Prevent recursive name lists inside name lists.
raise KeyError("expecting item name or list of item names")
items.append((key, self[(key,)+args]))
return ADict(items)
item = self._qnd_group.lookup(name)
if item is None:
raise KeyError("no such item in QGroup as {}".format(name))
state = self._qnd_state
auto, recording = state.auto, state.recording
record = state.goto
if item.islist():
item = QList(item, auto)
if record is None:
if not args and auto <= 1:
return item
else:
args = (record,) + args
return item[args] if args else item[:]
if item.isleaf():
return _reader(item, args) if args or auto else QLeaf(item)
# Item must be a group, set up inherited part of state.
# Note that goto record was not used, so subgroup inherits it.
cls = item.lookup("__class__")
if cls is not None and auto:
return _load_object(item, cls)
item = QGroup(item, auto=auto, recording=recording, goto=record)
return item() if auto > 1 else item
def __setitem__(self, key, value):
name, args = (key[0], key[1:]) if isinstance(key, tuple) else (key, ())
if not isinstance(name, basestring):
name = "/".join(name)
if "/" in name:
if name.startswith("/"):
item = self.root()
name = name[1:]
else:
path, name = name.rsplit("/", 1)
with self.push():
self.auto(0)
item = self[path]
item[(name,) + args] = value
return
dtype, shape, value = _categorize(value)
state = self._qnd_state
recording, record = state.recording, state.goto
this = self._qnd_group
item = this.lookup(name)
if item is None:
# Declare item now.
if args:
raise KeyError("partial write during declaration of {}"
"".format(name))
if recording:
# numpy (1.16.4) misfeature dtype('f8') tests == None
# (other dtypes are != None as expected), so cannot
# ask if (list, dict, object, None) contains dtype
if recording != 1 or (dtype is None or
dtype in (list, dict, object)):
# Declare an anonymous-group-style list.
item = this.declare(name, list, None)
else: # Declare item with UNLIMITED dimension.
item = this.declare(name, dtype, shape, 1)
# item now an empty list
elif dtype == dict:
item = this.declare(name, dict, None)
if value:
QGroup(item).update(value)
return
elif dtype == list:
item = this.declare(name, list, None)
if value:
QList(item).extend(value)
return
elif dtype == object:
item = this.declare(name, dict, None)
_dump_object(item, value)
return
else:
item = this.declare(name, dtype, shape)
if value is None:
return
while item.islist():
if recording:
if args:
raise KeyError("partial write while recording {}"
"".format(name))
record = len(item) # index of next record
item = item.declare(dict if dtype == object else dtype,
shape) # declare the next item
state.goto = record
if dtype is None:
return
if dtype == list:
if value:
QList(item).extend(value)
return
if dtype == dict:
if value:
QGroup(item).update(value)
return
if dtype == object:
_dump_object(item, value)
return
break
if record is None:
if args:
record, args = args[0], args[1:]
elif dtype == list and not value:
# qg.lst = list is no-op for existing QList
return
else:
raise ValueError("cannot set existing QList {}, use "
"append, goto, or recording".format(name))
item = item.index(record)
if item is None:
raise KeyError("no such item in QList as {}".format(record))
record = None # makes no sense to use record recursively
recording = 0
if item.isgroup():
if args:
QGroup(item)[args] = value
return
if dtype == dict and not value:
# qg.grp = {} is no-op for existing QGroup
return
raise ValueError("cannot set existing QGroup {}, use update"
"".format(name))
# item is a leaf (neither a list nor a group)
if dtype in (dict, list, object):
raise TypeError("type mismatch in QLeaf {}".format(name))
elif item.query()[0] is None:
# None QLeaf objects need not support write() method.
if dtype is None and not args:
return
raise TypeError("QLeaf {} declared as None".format(name))
item.write(value, args)
def _monotonize(values):
# This function ensures values are monotonically increasing,
# searching backwards for decreasing sequences.
decreasing = values[-1] < values[0]
if decreasing:
values = -values
mask = values == values
vnext = values[-1]
for i in range(-2, -values.size-1, -1):
v = values[i]
if v >= vnext:
mask[i] = False
else:
vnext = v
records, values = where(mask)[0], values[mask]
if decreasing:
# Reverse both records and values so latter is strictly increasing.
# In this way, values can always be used as x in the interp function.
records = records[::-1]
values = -values[::-1]
return records, values
def _categorize(value, attrib=False):
# This function defines the various sorts of values QnD recognizes:
# 1. None dtype = shape = value = None
# 2. list [, seq] dtype = list, shape = None, value = [] or seq
# 3. {...} dtype = dict, shape = None, value = {...}
# 4. type|dtype [, shape] dtype = dtype(type), shape, value = None
# 5. array_like value.dtype, value.shape, value = asanyarray(...)
# 6. object or dtype('O') dtype = object, shape = None, value
if value is None:
dtype = shape = None
elif isinstance(value, (type, _dtype)):
if value == list:
dtype, shape, value = list, None, []
else:
dtype, shape, value = _dtype(value), (), None
elif isinstance(value, dict):
if all(isinstance(key, basestring) for key in value):
dtype = dict
else:
dtype = object
shape = None
elif (isinstance(value, tuple) and len(value) == 2+bool(attrib) and
isinstance(value[0], (type, _dtype))):
dtype = value[0]
if dtype is not None and dtype not in (list, dict, object):
dtype = _dtype(dtype) # no-op if already a dtype
if not attrib:
if dtype == list:
value, shape = value[1], None
else:
value, shape = None, tuple(value[1])
else:
shape, value = value[1:]
else:
# The array(a) constructor used to accept essentially any argument a.
# At numpy 1.19 it began issuing a VisibleDeprecationWarning when a
# was a list whose items were of differing lengths (or shapes).
# Prior to that, it simply produced an ndarray of dtype object whose
# items were the python entities in the original list. This is the
# behavior we want in QnD, so we do not want to print a warning.
# Moreover, when the feature is eventually removed, this case will
# throw a (currently unknown) exception, which we need to avoid.
# Passing the dtype=object keyword to the array() constructor
# produces the pre-1.19 behavior (as far as I can tell), but of
# course we cannot do that here.
# The following code must work in three cases: (1) pre-1.19 numpy,
# (2) numpy 1.19-1.21 (at least) which print unwanted warnings without
# special treatment, and (3) future numpy which throws an error
# without the dtype=object keyword. Since QnD must always run in
# all three cases, there is no way to remove the protection against
# the deprecation warning, even when numpy moves past it.
with catch_warnings():
# Make case 2 (numpy 1.19) behave like case 3 (future numpy)
simplefilter("error", VisibleDeprecationWarning)
try:
v = asanyarray(value)
except Exception:
# As far as I can tell, the original numpy array() constructor
# would accept any argument whatsoever, returning either a
# scalar or 1D array of type object if its argument could not
# be interpreted. Therefore I believe only a ragged array
# argument reaches this point, and we can return to the
# original behavior by specifying dtype explicitly.
# Nevertheless, we protect against a possible exception.
simplefilter("ignore", VisibleDeprecationWarning)
try:
v = asanyarray(value, dtype=object)
except Exception:
return object, None, value
dtype, shape = v.dtype, v.shape
if dtype.kind == "O":
if not shape:
dtype, shape = object, None
else:
# Note that this does not work as expected when the contents
# of the list were themselves lists (not ndarrays) of numbers
# of varying lengths, since the asanyarray function will not
# convert those inner lists to ndarrays. Hence v.tolist() is
# really the same as the original value here.
# A QnD user must ensure that the inner lists are ndarrays if
# that is what they intended.
dtype, shape, value = list, None, v.tolist()
else:
value = v
if isinstance(dtype, _dtype):
kind = dtype.kind
if kind == "U":
if value is not None:
value = npencode(value, "utf8") # convert to 'S'
dtype = value.dtype
elif kind == "O":
raise ValueError("numpy dtype.kind 'O' not supported")
return dtype, shape, value
def _reader(item, args):
value = item.read(args)
dtyp = getattr(value, "dtype", None)
if dtyp is not None:
kind = dtyp.kind
if kind == "V":
if dtyp.names:
# The recarray has some significant misfeatures. The worst
# is that it will not print (repr or str) if it is aligned,
# or simply if the itemsize does not match what it expects.
# value = value.view(recarray)
pass
elif kind in "SU":
if not PY2:
if dtyp.kind == "S":
try:
value = npdecode(value, "utf8")
except UnicodeDecodeError:
value = npdecode(value, "latin1")
if isinstance(value, ndarray) and not value.shape:
value = value[()]
return value
_dtype = dtype # to allow access in methods using local name dtype
_builtin_module = str.__class__.__module__
def _dump_object(item, value):
# item.isgroup() is true, as yet empty, value is an object
item = QGroup(item)
if isinstance(value, dict):
# special case for dict with non-text keys
item["__class__"] = "dict"
items = value.iteritems if PY2 else value.items
for i, (k, v) in enumerate(items()):
item["_" + str(2*i)] = k
item["_" + str(2*i+1)] = v
else:
cls = value.__class__
cname, module = cls.__name__, cls.__module__
if module is not None and module != _builtin_module:
cname = ".".join((module, cname))
item["__class__"] = cname
# Note that __getnewargs_ex__ is python3 only, so we skip
# it here. The recommendation in the python3 docs is to use
# the _ex version only if __new__ requires keyword arguments.
# Similarly, we do not support the python2-only __getinitargs__.
mydict = getattr(value, "__dict__", None)
getnew = getattr(value, "__getnewargs__", None)
setter = hasattr(value, "__setstate__")
getter = getattr(value, "__getstate__", None)
if getnew:
args = getnew()
elif not getter and mydict is None:
# We cannot handle the intricacies of the full
# pickle/copyreg protocol, but by handling one simple
# case of __reduce__ we can pick up both slice() and set()
# objects, which is worthwhile.
# Virtually all objects have a __reduce__ method, which
# will often raise a TypeError. Go ahead and blow up here.
getnew = value.__reduce__()
if getnew[0] != cls or any(v is not None for v in getnew[2:]):
raise TypeError("QnD cannot dump class {}".format(cname))
args = getnew[1]
if getnew:
item["__getnewargs__"] = {}
subdir = item["__getnewargs__"]
for i, arg in enumerate(args):
subdir["_" + str(i)] = arg
value = getter() if getter else mydict
if setter:
# __setstate__ is only called if __getstate__ returned a true value.
# Never convert lists or tuples to ndarrays here. (??)
if value:
if isinstance(value, (list, tuple)):
value = list, value
item["__setstate__"] = value
elif value:
item.update(value)
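# Illustrative sketch, not part of QnD: the __reduce__ fallback in
# _dump_object above accepts only the simplest reduction, in which the first
# item is the class itself and anything after the argument tuple is None.
# That is enough to rebuild slice and set objects as cls(*args). The helper
# name _example_reduce_args is hypothetical.
def _example_reduce_args(value):
    """Return constructor arguments taken from a simple __reduce__ result."""
    reduced = value.__reduce__()
    cls = type(value)
    if reduced[0] is not cls or any(v is not None for v in reduced[2:]):
        raise TypeError("cannot rebuild {}".format(cls.__name__))
    return reduced[1]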
def _load_object(qgroup, cls):
# If you fail here, you can still read the group with ADict(qgroup)
# which avoids this special treatment.
cls = cls.read() # assume QLeaf yields a text string
if not isinstance(cls, basestring):
raise TypeError("Expecting __class__ member of QGroup to be text.")
qgroup = QGroup(qgroup, auto=2)
if cls == "dict":
obj = {}
names = list(name for name in qgroup if name != "__class__")
if len(names) & 1:
names[0] = "" # die in first pass
key = None
# Sort by length first so that "_10" follows "_9", not "_1".
for i, n in enumerate(sorted(names, key=lambda s: (len(s), s))):
if "_{}".format(i) != n:
raise TypeError("QGroup with __class__ dict error")
value = qgroup[n]
if i & 1:
obj[key] = value
else:
key = value
else:
cls = cls.rsplit(".", 1)
try:
module = (import_module(cls[0]) if len(cls) > 1 else
sys.modules[_builtin_module])
cls = getattr(module, cls[-1])
except (ImportError, AttributeError):
# If the named module does not exist or does not have
# the specified class, just return an ADict.
return ADict(qgroup)
args = qgroup.get("__getnewargs__")
if args is not None:
args = [args["_" + str(i)] for i in range(len(args))]
obj = cls(*args)
else:
obj = object.__new__(cls)
args = qgroup.get("__setstate__")
if args is not None:
obj.__setstate__(args)
else:
names = list(name for name in qgroup
if name not in ["__class__", "__getnewargs__"])
if names:
obj.__dict__.update(qgroup(2, names))
return obj
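# Illustrative sketch, not part of QnD: _dump_object and _load_object above
# store a dict with arbitrary keys as members _0, _1, _2, ... where even
# members hold keys and odd members hold the matching values. The two
# hypothetical helpers below mimic that layout, with an ordinary dict
# standing in for the QGroup.
def _example_flatten_dict(mapping):
    """Return {"_0": key0, "_1": value0, "_2": key1, ...} for mapping."""
    flat = {}
    for i, (key, value) in enumerate(mapping.items()):
        flat["_" + str(2 * i)] = key
        flat["_" + str(2 * i + 1)] = value
    return flat


def _example_rebuild_dict(flat):
    """Invert _example_flatten_dict, checking that no member is missing."""
    if len(flat) % 2:
        raise TypeError("odd number of members, expected key/value pairs")
    result = {}
    for i in range(0, len(flat), 2):
        try:
            key, value = flat["_" + str(i)], flat["_" + str(i + 1)]
        except KeyError:
            raise TypeError("members must be named _0, _1, _2, ...")
        result[key] = value
    return result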
class QState(list):
"""State information for a QGroup."""
__slots__ = ()
def __init__(self, recording=0, goto=None, auto=0):
if hasattr(recording, "__iter__"):
seq = tuple(recording)[:3]
else:
if goto is not None:
goto = int(goto)
recording, auto = int(recording), int(auto)
seq = recording, goto, auto
super(QState, self).__init__(seq)
@property
def recording(self):
return self[0]
@recording.setter
def recording(self, value):
self[0] = int(value)
@property
def goto(self):
return self[1]
@goto.setter
def goto(self, value):
self[1] = None if value is None else int(value)
@property
def auto(self):
return self[2]
@auto.setter
def auto(self, value):
self[2] = int(value)
def push(self):
state = self[:3]
self.append(state)
def drop(self):
if len(self) < 4:
return 1
self[:3] = super(QState, self).pop()
return 0
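# Illustrative sketch, not part of QnD: QState above keeps the live
# (recording, goto, auto) triple in slots 0-2 and stacks saved copies after
# it; push appends a copy and drop restores the most recent one. The
# hypothetical _ExampleState class repeats just that mechanism.
class _ExampleState(list):
    """Three live values followed by a stack of saved triples."""

    def push(self):
        self.append(self[:3])  # save a copy of the live triple

    def drop(self):
        if len(self) < 4:
            return 1  # nothing was saved
        self[:3] = self.pop()  # restore the last saved triple
        return 0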
class QList(object):
"""List of subgroups, lists, and ndarrays.
You reference QList elements by index or slice, like ordinary list
elements, including the python convention for negative index values.
To read the entire list, call it like a function, ``ql()``, which is
equivalent to ``ql[:]``. A QList has __iter__, append, and extend::
for element in ql: do_something
ql.append(value)
ql.extend(iterable)
In general, the elements of a QList are unrelated to one another;
it's like an anonymous QGroup. However, a common use case is to
represent a so-called UNLIMITED dimension in netCDF or HDF5. In
this case, every element will have the same dtype and shape. The
`islist` property is 1 for this special restricted case, while
it is 2 for an unrestricted QList. Whether this makes any
difference depends on the underlying file format. The QGroup
`recording` and `goto` methods allow you to access QList items in
the group transparently, as if they were individual elements at
a current record or index.
Attributes
----------
isgroup
isleaf
Always 0.
islist
This is 1 if this QList is a record array declared in recording
mode 1, and 2 if it was declared in any other way (including as a
record array in recording mode 2).
dtype
Always ``list``, the builtin python type.
shape
ndim
size
sshape
Always None.
"""
__slots__ = "_qnd_list", "_qnd_auto"
isgroup = isleaf = 0
dtype, shape, ndim, size, sshape = list, None, None, None, None
def __init__(self, item=None, auto=0):
object.__setattr__(self, "_qnd_list", item)
self.auto(auto)
def auto(self, recurse):
"""Set auto read mode, analogous to QGroup.auto method."""
object.__setattr__(self, "_qnd_auto", int(recurse))
def root(self):
"""Return root QGroup for this item."""
return QGroup(self._qnd_list.root(), QState(auto=self._qnd_auto))
@property
def islist(self):
return self._qnd_list.islist()
def extend(self, iterable):
"""Append multiple new elements to this QList."""
for value in iterable:
self.append(value)
def append(self, value):
"""Append a new element to this QList."""
dtype, shape, value = _categorize(value)
item = self._qnd_list.declare(dtype, shape)
if dtype is None:
return
if dtype == list:
if value:
QList(item).extend(value)
return
if dtype == dict:
if value:
QGroup(item).update(value)
return
if dtype == object:
_dump_object(item, value)
return
if value is not None:
item.write(value, ())
# Being unable to do partial write on declaration is consistent
# with behavior of QGroup __setitem__. The way to get it is to
# make a declaration with value = (type, shape) instead of an
# actual value in both cases.
def __repr__(self):
return "<QList with {} items>".format(len(self))
def __len__(self):
return len(self._qnd_list)
def __iter__(self):
auto = self._qnd_auto
recurse = auto > 1
for item in self._qnd_list:
if item.isgroup():
cls = item.lookup("__class__") if auto else None
if cls is None:
item, readit = QGroup(item), recurse
else:
item, readit = _load_object(item, cls), 0
elif item.islist():
item, readit = QList(item), recurse
else:
item, readit = QLeaf(item), auto
yield item() if readit else item
def __call__(self):
return self[:]
def __getitem__(self, key):
if not isinstance(key, tuple):
key = (key,)
index, args = key[0], key[1:]
this = self._qnd_list
if isinstance(index, slice):
index = range(*index.indices(len(this)))
if hasattr(index, "__iter__"):
return [self[(i,) + args] for i in index]
item = this.index(index)
if item is None:
raise IndexError("QList index {} out of range".format(index))
auto = self._qnd_auto
if item.islist():
item = QList(item, auto)
if args:
return item[args]
return item[:] if auto > 1 else item
if item.isleaf():
return _reader(item, args) if args or auto else QLeaf(item)
# Item must be a group, set up inherited part of state.
# Note that goto record was not used, so subgroup inherits it.
cls = item.lookup("__class__")
if cls is not None and auto:
return _load_object(item, cls)
item = QGroup(item, auto=auto)
return item() if auto > 1 else item
def __setitem__(self, key, value):
if not isinstance(key, tuple):
key = (key,)
index, args = key[0], key[1:]
if isinstance(index, slice) or hasattr(index, "__iter__"):
raise TypeError("QList does not support multi-element setitem")
dtype, shape, value = _categorize(value)
item = self._qnd_list.index(index)
if item is None:
raise IndexError("QList index {} out of range".format(index))
if item.islist() or item.isgroup():
idtype = list if item.islist() else dict
if idtype == dtype and not value:
return
raise TypeError("cannot set existing QGroup or QList")
# item is a QLeaf
if item.query()[0] is None:
if dtype is None and not args:
return
raise TypeError("QLeaf {} declared as None".format(index))
# Work around numpy (1.16.4) misfeature dtype('f8') tests == None:
if dtype is None or dtype in (list, dict, object):
raise TypeError("type mismatch setting QLeaf {}".format(index))
item.write(value, args)
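# Illustrative sketch, not part of QnD: QList.__getitem__ above splits its key
# into a list index plus trailing arguments for a partial read of the element,
# and expands a slice into explicit indices with slice.indices. The helper
# name _example_split_key is hypothetical.
def _example_split_key(key, length):
    """Return (indices, args) for key applied to a list of given length."""
    if not isinstance(key, tuple):
        key = (key,)
    index, args = key[0], key[1:]
    if isinstance(index, slice):
        # slice.indices clips start, stop, and step to the list length.
        return list(range(*index.indices(length))), args
    return [index], args
# For a list of five elements, _example_split_key(slice(-2, None), 5) gives
# ([3, 4], ()), while _example_split_key((1, 0), 5) gives ([1], (0,)).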
class QLeaf(object):
"""An ndarray or None stored in a file.
You can read the data by calling the leaf instance ``ql()``, or by
indexing it ``ql[:]``, which also provides a means for partial reads.
A QLeaf has `dtype`, `shape`, `ndim`, and `size` properties with the
same meanings as for an ndarray (except that for a None leaf all of these
properties are None). Additionally, the `sshape` property may return a symbolic
shape with optional strings in the tuple representing dimension names.
You can write data by calling ``ql(value)``, or by setting a slice,
which provides a means for partial writes.
Attributes
----------
isgroup
islist
Always 0.
isleaf
Always 1.
dtype
The numpy dtype of this ndarray, or None if this leaf is None.
This is the dtype in memory, not necessarily as stored.
shape
ndim
size
The numpy ndarray properties, or None if this leaf is None.
sshape
A symbolic shape tuple, like shape except dimension lengths may be
type str instead of int.
"""
__slots__ = "_qnd_leaf",
isgroup = islist = 0
isleaf = 1
def __init__(self, item):
object.__setattr__(self, "_qnd_leaf", item)
def root(self):
"""Return root QGroup for this item."""
return QGroup(self._qnd_leaf.root())
def __call__(self, value=_NOT_PRESENT_):
if value is _NOT_PRESENT_:
return self[()]
else:
self[()] = value
def __getitem__(self, key):
return _reader(self._qnd_leaf,
key if isinstance(key, tuple) else (key,))
def __setitem__(self, key, value):
self._qnd_leaf.write(value, key if isinstance(key, tuple) else (key,))
@property
def dtype(self):
return self._qnd_leaf.query()[0]
@property
def shape(self):
return self._qnd_leaf.query()[1]
@property
def ndim(self):
shape = self._qnd_leaf.query()[1]
return None if shape is None else len(shape)
@property
def size(self):
shape = self._qnd_leaf.query()[1]
if shape is None:
return None  # this leaf is None, as documented in the class docstring
return prod(shape) if shape else 1
@property
def sshape(self):
_, s, ss = self._qnd_leaf.query()
return ss if ss else s
class QAttributes(ItemsAreAttrs):
"""Attributes for a QGroup and its members.
Usage::
qa = qgroup.attrs()
qa0 = qa.vname # for variables in this group, or qa['vname']
qa1 = qa._ # or qa[''] for attributes of this group
value = qa0.aname # or qa0['aname'], None if no such attribute
qa0.aname = value # or qa0['aname'] = value
qa0.aname = dtype, shape, value
if 'aname' in qa0: do_something
for aname in qa0: do_something
for aname, value in qa0.items(): do_something
"""
__slots__ = "_qnd_parent", "_qnd_vname", "__weakref__"
def __init__(self, parent, vname=None):
if not isinstance(parent, ProxyTypes):
parent = proxy(parent)
object.__setattr__(self, "_qnd_parent", parent)
object.__setattr__(self, "_qnd_vname", vname)
def __repr__(self):
vname = self._qnd_vname
if vname is None:
return "<QAttributes accessor for QGroup items>"
elif not vname:
return "<QAttributes for whole QGroup>"
return "<QAttributes for item {}>".format(vname)
def get(self, key, default=None):
parent, vname = self._qnd_parent, self._qnd_vname
if vname is None:
# Get group attribute, even though that is inconsistent...
# Should we implement matching set() or just let it go?
vname = ""
else:
parent = parent._qnd_parent
return parent.attget(vname).get(key, default)
def keys(self):
group, vname = self._qnd_group_vname()
return iter(group.attget(vname))
def items(self):
group, vname = self._qnd_group_vname()
return group.attget(vname).items()
def _qnd_group_vname(self):
parent, vname = self._qnd_parent, self._qnd_vname
if vname is None:
raise TypeError("need to specify QGroup item name")
return parent._qnd_parent, vname
def __getattr__(self, name):
vname = self._qnd_vname
if vname is None or name not in self._qnd_builtins_:
return super(QAttributes, self).__getattr__(name)
# Handle builtin pseudo-attributes here; they do not show up
# in the actual attribute dict referenced by [key].
# Can use dtype_, shape_, etc. attributes if real attributes
# have these names.
item = self._qnd_parent._qnd_parent.lookup(vname)
if item.isgroup():
return dict if name == "dtype" else None
if item.islist():
return list if name == "dtype" else None
dsss = item.query()
if dsss[0] is None:
return None
if name == "ndim":
return len(dsss[1])
if name == "size":
return prod(dsss[1])
return dsss[self._qnd_builtins_.index(name)]
_qnd_builtins_ = ["dtype", "shape", "sshape", "size", "ndim"]
def __getitem__(self, key):
parent, vname = self._qnd_parent, self._qnd_vname
if vname is None:
# key is vname
item = parent.lookup(key) if key else True
if item is None:
raise KeyError("no such item in QGroup as {}".format(key))
return QAttributes(self, key)
return parent._qnd_parent.attget(vname).get(key)
def __setitem__(self, key, value):
group, vname = self._qnd_group_vname()
# Note that value can be (dtype, shape, value) to be explicit.
dtype, shape, value = _categorize(value, 1)
if dtype in (list, dict, object):
raise TypeError("an attribute cannot be a dict or list")
group.attset(vname, key, dtype, shape, value)
def __iter__(self):
group, vname = self._qnd_group_vname()
return iter(group.attget(vname))
def __contains__(self, name):
group, vname = self._qnd_group_vname()
return name in group.attget(vname)
def __len__(self):
group, vname = self._qnd_group_vname()
return len(group.attget(vname))
class QnDList(object):
"""Implementation of a low level QList type using QGroup.
A backend which has no direct support for QList objects can use
this to produce a pseudo-list, which is a group with member names
_ (None or a single signed or unsigned byte, value never read) and
names _0, _1, _2, etc.
This implementation will handle both UNLIMITED index-style lists
made with recording = 1 (that is group.declare with unlim flag)
and general lists. If UNLIMITED dimensions are supported, pass the
QnDLeaf to this constructor::
item = QnDList(QnDLeaf) # if at least one record exists
item = QnDList(QnDLeaf, 1) # if no records yet exist
Use the fromgroup constructor to check if a QnDGroup is a pseudo-list::
item = QnDList.fromgroup(QnDGroup)
"""
__slots__ = "_qnd_parent", "_qnd_current",
def __init__(self, parent, empty=None):
self._qnd_parent = parent
current = empty
if empty is not None:
if parent.isgroup():
parent.declare("_", None, ())
elif not isinstance(parent, QnDList):
current = -1
self._qnd_current = current
@staticmethod
def fromgroup(parent):
item = parent.lookup("_")
if item is not None:
if all(_us_digits.match(name) for name in parent):
return QnDList(parent) # parent is a pseudo-list
return parent
def parent(self):
parent = self._qnd_parent
return parent._qnd_parent if isinstance(parent, QnDList) else parent
@staticmethod
def isgroup():
return 0
def isleaf(self):
return int(isinstance(self._qnd_parent, QnDList))
def islist(self):
if self._qnd_parent.isgroup():
return 2
return int(not isinstance(self._qnd_parent, QnDList))
def root(self):
return self._qnd_parent.root()
# len, iter, index, declare are list methods, assume isleaf() false
def __len__(self):
if self._qnd_parent.isgroup():
return len(self._qnd_parent) - 1 # subtract _ item
if self._qnd_current is not None and self._qnd_current < 0:
return 0 # leaf.query() probably returns 1
return self._qnd_parent.query()[1][0]
def __iter__(self):
parent = self._qnd_parent
if parent.isgroup():
for i in range(len(self)):
yield parent.lookup("_" + str(i))
else:
for i in range(len(self)):
yield QnDList(self, i)
def index(self, ndx):
nrecs = max(len(self), 1)
if ndx < 0:
ndx = ndx + nrecs
if ndx < 0 or ndx >= nrecs:
return None # out of range, let caller raise any exception
parent = self._qnd_parent
if parent.isgroup():
return parent.lookup("_" + str(ndx))
return QnDList(self, ndx)
def declare(self, dtype, shape):
parent = self._qnd_parent
nrecs = len(self)
if parent.isgroup():
return parent.declare("_" + str(nrecs), dtype, shape)
return QnDList(self, nrecs)
# query, read, write are leaf methods, assume isleaf() true
def query(self):
qndlist = self._qnd_parent
dtype, shape, sshape = qndlist._qnd_parent.query()
shape = shape[1:]
if sshape:
sshape = sshape[1:]
return dtype, shape, sshape
def read(self, args=()):
current = self._qnd_current
qndlist = self._qnd_parent
check = qndlist._qnd_current
if check is not None and check < 0:
raise TypeError("attempt to read from empty UNLIMITED array")
return qndlist._qnd_parent.read((current,) + args)
def write(self, value, args=()):
qndlist = self._qnd_parent
qndlist._qnd_parent.write(value, (self._qnd_current,) + args)
# Turn off special empty list state (if on):
qndlist._qnd_current = None
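# Illustrative sketch, not part of QnD: the pseudo-list handled by QnDList
# above is a group holding a marker member "_" plus elements named _0, _1,
# _2, ..., so its length is the member count minus one. A plain dict stands in
# for the backend group here, and both helper names are hypothetical.
def _example_new_pseudo_list():
    """Return an empty pseudo-list: a group containing only the marker."""
    return {"_": None}


def _example_pseudo_append(group, value):
    """Store value under the next free _N name and return that name."""
    name = "_" + str(len(group) - 1)  # subtract the "_" marker
    group[name] = value
    return name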