This update spans two major releases since 0.12.0. Changes include API
changes, new features, enhancements, and performance improvements along
with a large number of bug fixes.
For the detailed list of changes see
http://pandas.pydata.org/pandas-docs/stable/whatsnew.html
This is a major release from 0.11.0 and includes several new features
and enhancements along with a large number of bug fixes.
Highlights include a consistent I/O API naming scheme, routines to read
html, write multi-indexes to csv files, read & write STATA data files,
read & write JSON format files, Python 3 support for HDFStore, filtering
of groupby expressions via filter, and a revamped replace routine that
accepts regular expressions.
For detailed changes see:
http://pandas.pydata.org/pandas-docs/stable/whatsnew.html
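
Two of the 0.12 highlights above, group filtering via filter and regex-aware
replace, can be sketched as follows. This is a minimal illustration written
against a current pandas; the frame, its column names, and the threshold are
made up for the example and are not taken from the release notes.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "val": [1, 2, 3, 4, 5],
    "label": ["x-1", "x-2", "y-3", "y-4", "y-5"],
})

# Keep only the groups whose summed "val" exceeds 5 (here: group "b").
big_groups = df.groupby("key").filter(lambda g: g["val"].sum() > 5)

# Regex-based replace: strip the leading "<letter>-" prefix from each label.
cleaned = df["label"].replace(to_replace=r"^[a-z]-", value="", regex=True)
```

filter returns the original rows of the surviving groups rather than one
aggregated row per group.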
Summary of changes since 0.10.1:
This is a major release from 0.10.1 and includes many new features and
enhancements along with a large number of bug fixes. The methods of
Selecting Data have had quite a number of additions, and Dtype support
is now full-fledged. There are also a number of important API changes
that long-time pandas users should pay close attention to.
* New precision indexing fields loc, iloc, at, and iat, to reduce
occasional ambiguity in the hitherto catch-all ix method.
* Expanded support for NumPy data types in DataFrame.
* NumExpr integration to accelerate various operator evaluation.
* Improved DataFrame to CSV exporting performance.
For a full list refer to the "what's new" page.
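
The four indexers named above differ in whether they take labels or integer
positions. A small sketch, using a made-up frame and assuming a current
pandas:

```python
import pandas as pd

# Hypothetical example data with string row labels.
df = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "price"]      # label-based lookup
by_position = df.iloc[1, 0]          # integer-position lookup
scalar_label = df.at["c", "qty"]     # scalar access by label
scalar_position = df.iat[2, 1]       # scalar access by position

rows = df.loc["a":"b"]               # label slices include both endpoints
first_two = df.iloc[0:2]             # position slices exclude the stop
```

Note that the label slice and the position slice select the same two rows
here even though their stop values differ.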
Also fixes PLIST errors introduced in last update.
Release date: 2013-01-22
New features:
Add data interface to World Bank WDI pandas.io.wb (GH2592)
API Changes:
Restored inplace=True behavior returning self (same object) with
deprecation warning until 0.11 (GH1893)
HDFStore
refactored HDFStore to deal with non-table stores as objects, will
allow future enhancements
removed keyword compression from put (replaced by keyword complib
to be consistent across library)
warn PerformanceWarning if you are attempting to store types that
will be pickled by PyTables
Improvements to existing features:
HDFStore
enables storing of multi-index dataframes (closes GH1277)
support data column indexing and selection, via data_columns
keyword in append
support write chunking to reduce memory footprint, via chunksize
keyword to append
support automagic indexing via index keyword to append
support expectedrows keyword in append to inform PyTables about
the expected tablesize
support start and stop keywords in select to limit the row
selection space
added get_store context manager to automatically import with pandas
added column filtering via columns keyword in select
added methods append_to_multiple/select_as_multiple/
select_as_coordinates to do multiple-table append/selection
added support for datetime64 in columns
added method unique to select the unique values in an indexable
or data column
added method copy to copy an existing store (and possibly upgrade)
show the shape of the data on disk for non-table stores when
printing the store
added ability to read PyTables flavor tables (allows compatibility
with other HDF5 systems)
Add logx option to DataFrame/Series.plot (GH2327, GH2565)
Support reading gzipped data from file-like object
pivot_table aggfunc can be anything used in GroupBy.aggregate (GH2643)
Implement DataFrame merges in case where set cardinalities might
overflow 64-bit integer (GH2690)
Raise exception in C file parser if integer dtype specified and have
NA values. (GH2631)
Attempt to parse ISO8601 format dates when parse_dates=True in read_csv
for major performance boost in such cases (GH2698)
Add methods neg and inv to Series
Implement kind option in ExcelFile to indicate whether it's an XLS
or XLSX file (GH2613)
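
As an illustration of the pivot_table item above (GH2643), the sketch below
passes a custom callable and a list of function names as aggfunc. The data is
invented, and the keyword names (values, index) are those of a current pandas;
they are an assumption here and may differ from the 0.10.1 signature.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["x", "y", "x", "y"],
    "sales": [10, 20, 30, 40],
})

# A custom callable, of the kind GroupBy.aggregate accepts.
spread = pd.pivot_table(
    df, values="sales", index="region",
    aggfunc=lambda s: s.max() - s.min(),
)

# A list of function names yields one result column per function.
summary = pd.pivot_table(
    df, values="sales", index="region", aggfunc=["sum", "mean"],
)
```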
Bug fixes:
Fix read_csv/read_table multithreading issues (GH2608)
HDFStore
correctly handle nan elements in string columns; serialize via the
nan_rep keyword to append
raise correctly on non-implemented column types (unicode/date)
correctly handle types passed to Term (e.g. index<1000, when index is
Int64), (closes GH512)
handle Timestamp correctly in data_columns (closes GH2637)
contains correctly matches on non-natural names
correctly store float32 dtypes in tables (if no other float types
are in the same table)
Fix DataFrame.info bug with UTF8-encoded columns. (GH2576)
Fix DatetimeIndex handling of FixedOffset tz (GH2604)
More robust detection of being in IPython session for wide DataFrame
console formatting (GH2585)
Fix platform issues with file:/// in unit test (GH2564)
Fix bug and possible segfault when grouping by hierarchical level that
contains NA values (GH2616)
Ensure that MultiIndex tuples can be constructed with NAs (GH2616)
Fix int64 overflow issue when unstacking MultiIndex with many levels
(GH2616)
Exclude non-numeric data from DataFrame.quantile by default (GH2625)
Fix a Cython C int64 boxing issue causing read_csv to return incorrect
results (GH2599)
Fix groupby summing performance issue on boolean data (GH2692)
Don't bork Series containing datetime64 values with to_datetime (GH2699)
Fix DataFrame.from_records corner case when passed columns, index
column, but empty record list (GH2633)
Fix C parser-tokenizer bug with trailing fields. (GH2668)
Don't exclude non-numeric data from GroupBy.max/min (GH2700)
Don't lose time zone when calling DatetimeIndex.drop (GH2621)
Fix setitem on a Series with a boolean key and a non-scalar as value
(GH2686)
Box datetime64 values in Series.apply/map (GH2627, GH2689)
Upconvert datetime + datetime64 values when concatenating frames
(GH2624)
Raise a more helpful error message in merge operations when one
DataFrame has duplicate columns (GH2649)
Fix partial date parsing issue occurring only when code is run at EOM
(GH2618)
Prevent MemoryError when using counting sort in sortlevel with
high-cardinality MultiIndex objects (GH2684)
Fix Period resampling bug when all values fall into a single bin
(GH2070)
Fix buggy interaction with usecols argument in read_csv when there is
an implicit first index column (GH2654)
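
For the datetime64 boxing fix listed above (GH2627, GH2689), the following
sketch shows what "boxing" means in practice: functions passed to
Series.apply or Series.map receive pandas Timestamp objects rather than raw
datetime64 values. The dates are made up, and the behavior shown is that of a
current pandas.

```python
import pandas as pd

# Hypothetical example data stored as datetime64 values.
s = pd.Series(pd.to_datetime(["2013-01-21", "2013-01-22"]))

# Each element reaches the function as a pandas Timestamp.
all_boxed = s.map(lambda x: isinstance(x, pd.Timestamp)).all()

# Timestamp attributes are therefore available inside apply.
days = s.apply(lambda x: x.day)
```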
pkgsrc change: depend on math/py-pytables.
Changes since 0.9.1:
* Delimited file parsing engine rewritten to use a fraction of memory while
being 40%+ faster.
- Much-improved Unicode handling via the encoding option.
- Column filtering (usecols)
- Dtype specification (dtype argument)
- Ability to specify strings to be recognized as True/False
- Ability to yield NumPy record arrays (as_recarray)
- High performance delim_whitespace option
- Decimal format (e.g. European format) specification
- Easier CSV dialect options: escapechar, lineterminator, quotechar, etc.
- More robust handling of many exceptional kinds of files observed in the wild
* API changes
- Deprecated DataFrame BINOP TimeSeries special case behavior
- Altered resample default behavior
- Infinity and negative infinity are no longer treated as NA by isnull and
notnull.
- Methods with the inplace option now all return None instead of the calling
object.
- pandas.merge no longer sorts the group keys (sort=False) by default.
- The default column names for a file with no header have been changed.
- Values like 'Yes' and 'No' are not interpreted as boolean by default.
- The file parsers will not recognize non-string values arising from a
converter function as NA.
- Calling fillna on Series or DataFrame with no arguments is no longer valid
code.
- Series.apply will now operate on a returned value from the applied function.
- New API functions for working with pandas options.
* New features
- Wide DataFrame Printing.
- Updated PyTables Support.
* Enhancements
- added ability to use hierarchical keys.
- added mixed-dtype support!
- performance improvements on table writing.
- support for arbitrarily indexed dimensions.
- SparseSeries now has a density property.
* Bug fixes
- added Term method of specifying where conditions.
- del store['df'] now calls store.remove('df') for store deletion.
- deleting of consecutive rows is much faster than before.
- in_itemsize parameter can be specified in table creation to force a minimum
size for indexing columns.
- indexing support via create_table_index (requires PyTables >= 2.3)
- appending on a store would fail if the table was not first created via put.
- fixed issue with missing attributes after loading a pickled dataframe.
- minor change to select and remove: require a table ONLY if where is also
provided.
* Compatibility
- 0.10 of HDFStore is backwards compatible for reading tables created
in a prior version of pandas; however, query terms using the prior
(undocumented) methodology are unsupported.
* N Dimensional Panels (Experimental)
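
Several of the parser options listed under the rewritten delimited file
parsing engine above (usecols, dtype, decimal format, and custom True/False
strings) can be combined in one read_csv call. The sketch below uses invented
inline data and the keyword names of a current pandas, which are assumed to
match the options the notes refer to.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited input with European decimal commas.
raw = io.StringIO(
    "id;name;price;active\n"
    "1;widget;1,50;Yes\n"
    "2;gadget;2,25;No\n"
)

df = pd.read_csv(
    raw,
    sep=";",
    usecols=["id", "price", "active"],  # column filtering
    dtype={"id": "int64"},              # dtype specification
    decimal=",",                        # European decimal format
    true_values=["Yes"],                # strings recognized as True
    false_values=["No"],                # strings recognized as False
)
```

Because 'Yes' and 'No' are no longer treated as booleans by default, the
true_values and false_values arguments are what turn the last column into a
boolean column here.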
pandas is an open source, BSD-licensed library providing
high-performance, easy-to-use data structures and data analysis tools
for the Python programming language.