# $NetBSD: Makefile,v 1.5 2014/05/12 21:59:10 jihbed Exp $

DISTNAME=	lsqfit-4.8
PKGNAME=	${PYPKGPREFIX}-${DISTNAME}
CATEGORIES=	math python
MASTER_SITES=	${MASTER_SITE_PYPI:=l/lsqfit/}

MAINTAINER=	jihbed.research@gmail.com
HOMEPAGE=	https://github.com/gplepage/lsqfit
COMMENT=	Utilities for nonlinear least-squares fits
LICENSE=	gnu-gpl-v3

USE_LANGUAGES=	c c++
PYDISTUTILSPKG=	yes

.include "../../devel/py-cython/buildlink3.mk"
.include "../../lang/python/extension.mk"
.include "../../math/gsl/buildlink3.mk"
.include "../../math/py-numpy/buildlink3.mk"
.include "../../mk/bsd.pkg.mk"