This 'Applied Demography Toolbox' is a collection of applied demography programs, scripts, spreadsheets, databases and texts.
If you would like to use, share or reproduce information or ideas from the linked files, be sure to cite the respective source.
If you have questions, recommendations or
additions, or if you find the site useful, please send a message to me
(Eddie Hunsinger) at
edyhsgr@gmail.com.
Notes for folks who would like to add to this site
1. For demography tools that are not posted on another site: Just email what you'd like to add to me at
edyhsgr@gmail.com.
Include (1) The file that does the job (the "tool"), with citation and any notes within if possible, (2) Any necessary supporting files
for an example to be run, and (3) Any information that you'd like to give me before I do anything.
I will try to use the tool with the example data you provide, and then email you back with (1) Any questions I have, and
(2) A proposed name and little blurb to describe the tool, for you to review.
Then after we agree to posting, I'll save it on server space at Cal, back it up with
Git,
and add it with its name and blurb to the
list of tools. You can email me to change or remove it at any time, and I will do so as soon as possible.
2. For demography tools that are posted on another site: Just email the link to me at
edyhsgr@gmail.com, and I'll review it and add
it if we agree it's a good fit.
3. Or, very simply, for any tool: Just email the tool to
edyhsgr@gmail.com
with any info you like, and I'll lead in steps for getting it posted.
I'll be careful to not post anything that you send without your review and consent.
^
Acknowledgments
In addition to the folks who have offered tools for the site, a few people have provided technical support, including:
Webb Sprague, a demographer for the State of Washington, initiated the plan for version control with
Git
for the "Tools Maintained Here,"
instructed me on how to use Git, and provides terrific feedback and ideas for the site.
Carl Mason, Director of Berkeley's Demography Lab, installed Git for the site, and has been very supportive of the work.
Thanks to Tom Wilson and the Australian Population Association for featuring the site in the
June 2011 issue of Demoz,
the newsletter of the
Australian Population Association.
Thanks to Kim Dunstan and the Population Association of New Zealand for sharing the site in the
November 2011 PANZ Newsletter.
Thanks to Kelvin Pollard and the
Population Association of America's
Committee on Applied Demography for sharing the site through the
September 2012 issue of Applied Demography, the
Committee on Applied Demography's biannual newsletter.
Thanks to Tim Riffe for
sharing
the site on his personal
Demog Blog.
Here
is a neat article by Nick Barnes that gives this site some inspiration.
^
From the front page: "The National Historical Geographic Information System (NHGIS) provides, free of charge, aggregate census data and
GIS-compatible boundary files for the United States between 1790 and 2011." Awesome.
--April 2013
^
From the front page: "This webpage and accompanying materials document the process to create master index files capable of linking various U.S.
Census Bureau (Census) data products. Systematically linking various datasets together will allow Census data users to assemble information for
statistical and geographic analysis."
--March 2013
^
From the 'About the ARDA' page:
"The Association of Religion Data Archives (ARDA) strives to democratize access to the best data on religion. Founded as
the American Religion Data Archive in 1997 and going online in 1998, the initial archive was targeted at researchers interested
in American religion. The targeted audience and the data collection have both greatly expanded since 1998, now including American
and international collections and developing features for educators, journalists, religious congregations, and researchers.
Data included in the ARDA are submitted by the foremost religion scholars and research centers in the world."
--March 2013
^
From the related paper:
"I develop and explain a new method for interpolating detailed fertility schedules from age-group data. The method allows estimation of
fertility rates over a fine grid of ages, from either standard or non-standard age groups. Users can calculate detailed schedules directly
from the input data, using only elementary arithmetic."
--March 2013
^
From the site: "DataFerrett is a unique data analysis and extraction tool -- with recoding capabilities -- to customize federal, state, and local data to
suit your requirements. (FERRETT stands for Federated Electronic Research, Review, Extraction, and Tabulation Tool.) Using DataFerrett,
you can develop an unlimited array of customized spreadsheets that are as versatile and complex as your usage demands." This extraction tool can be used
to make quick tabulations from PUMS microdata as well-- terrific.
--March 2013
^
From the site: "The data present migration patterns by state or by county for the entire United States and are available for inflows -- the number of new
residents who moved to a county or state and where they migrated from, and outflows -- the number of residents leaving a county or state and where they went."
--March 2013
^
Life table estimation is a fundamental part of demographic research, so this
SAS
macro by
Klára Hulíková
to provide life table smoothing options (including Kannisto, Gompertz-Makeham, modified Gompertz-Makeham, Thatcher, and Coale-Kisker functions)
should be of great use. More information is available in
Chapter 5 of the related doctoral thesis, and a
64-bit version is available as well.
Terrific!
--July 2012
^
From the cover page: "Census of Population and Housing data present here ranges from our most recent census to the historical decennial census conducted throughout the decades.
Some of the data were scanned as an effort to make historical census information available to the public. The display of data will continue as historical census records become available."
--June 2012
^
From a Census Bureau website: "The American Community Survey (ACS) is an ongoing statistical survey that samples a small percentage of the population every year --
giving communities the current information they need to plan investments and services." This site provides the questionnaires for the survey, an essential tool for understanding
the data.
--June 2012
^
Age standardization of death rates is an important procedure that allows better comparisons of death rates for different populations. This Excel spreadsheet by Michail
Agorastakis and Zacharoula Michou of the University of Thessaly provides examples of both direct and indirect age standardization which can be used for review and instruction,
or as templates for other data. A very useful tool.
--March 2012
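For readers who want to see the arithmetic behind the two approaches, here is a minimal Python sketch. It is not taken from the linked spreadsheet; the function names and any figures passed to them are hypothetical. Direct standardization weights a population's age-specific rates by a standard age distribution. Indirect standardization applies standard rates to the population's own age structure to get expected deaths, then scales the standard crude rate by the ratio of observed to expected deaths.

```python
def direct_standardized_rate(age_specific_rates, standard_population):
    """Weight each age-specific rate by the standard population's age shares."""
    if len(age_specific_rates) != len(standard_population):
        raise ValueError("rates and standard population must cover the same age groups")
    total = sum(standard_population)
    weighted = sum(r * p for r, p in zip(age_specific_rates, standard_population))
    return weighted / total


def indirect_standardized_rate(observed_deaths, population, standard_rates,
                               standard_crude_rate):
    """Scale the standard crude rate by observed / expected deaths."""
    if len(population) != len(standard_rates):
        raise ValueError("population and standard rates must cover the same age groups")
    # Deaths expected if the study population experienced the standard rates.
    expected_deaths = sum(r * p for r, p in zip(standard_rates, population))
    mortality_ratio = observed_deaths / expected_deaths
    return mortality_ratio * standard_crude_rate
```

The direct method needs age-specific rates for the study population, while the indirect method needs only its total deaths and age structure, which is why the latter is often used for small areas.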
^
"Age heaping" is systematic misreporting of age in a survey, with preference for certain ages, often due to rounding. This spreadsheet by
Michail Agorastakis and Zacharoula Michou of the
University of Thessaly can be used to calculate Myers' and Whipple's indices (as well as an extension of Whipple's index) which each measure age heaping.
A great resource.
--March 2012
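As a rough illustration of how heaping on ages ending in 0 and 5 can be measured, here is a minimal Python sketch of the standard form of Whipple's index. It is not the linked spreadsheet's code, and the function name is hypothetical. It assumes single-year population counts indexed by age and uses the conventional age range of 23 to 62.

```python
def whipple_index(pop_by_single_age):
    """Whipple's index from single-year counts (index = age).

    Compares the population reported at ages ending in 0 or 5 within
    ages 23-62 to one fifth of the total population aged 23-62.
    A value near 100 suggests no heaping; 500 means every reported
    age in the range ends in 0 or 5.
    """
    ages = range(23, 63)
    total = sum(pop_by_single_age[a] for a in ages)
    if total <= 0:
        raise ValueError("population aged 23-62 must be positive")
    heaped = sum(pop_by_single_age[a] for a in ages if a % 5 == 0)
    return 100.0 * 5.0 * heaped / total
```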
^
Population pyramids are graphs that show population by age and sex, and neatly summarize many aspects of a population. This Excel workbook by Michail Agorastakis and Zacharoula
Michou of the University of Thessaly can be used to make a nice population pyramid simply and quickly. And provided along with the pyramid: mean age, median age, sex ratio,
dependency ratio, an aging index, and proportions of total population for selected age groups. Terrific!
--March 2012
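Several of the summary measures listed above are simple ratios of broad age groups. The Python sketch below shows commonly used definitions; the linked workbook may define them differently, and the function name and dictionary keys are hypothetical. It assumes the conventional groups 0-14, 15-64 and 65+.

```python
def summary_measures(males, females):
    """Ratios from counts by broad age group.

    males / females: dicts with keys 'young' (0-14), 'working' (15-64)
    and 'old' (65+). All ratios are expressed per 100.
    """
    young = males["young"] + females["young"]
    working = males["working"] + females["working"]
    old = males["old"] + females["old"]
    return {
        # Males per 100 females.
        "sex_ratio": 100.0 * sum(males.values()) / sum(females.values()),
        # Young and old dependents per 100 people of working age.
        "dependency_ratio": 100.0 * (young + old) / working,
        # People aged 65+ per 100 people aged 0-14.
        "aging_index": 100.0 * old / young,
    }
```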
^
From the linked workbook: "Even though recoding a variable into a new one, is a simple task when using statistical analysis software in Excel is not a straight foreword procedure.
We present here a simplified example of creating population size bands for Greek Municipalities (1034, in 2001)."
--March 2012
^
From the Modgen website: "Modgen (Model generator) is a generic microsimulation programming language supporting the creation, maintenance and documentation of
dynamic microsimulation models. Several types of models can be accommodated, be they continuous or discrete time, with interacting or non-interacting populations."
--February 2012
^
From the Census Bureau website: "The example below [on the linked website] of migration expectancy is calculated using the population and number of nonmovers
in the previous year by age estimated from the 2007 American Community Survey (ACS). These numbers are used to calculate an average mobility rate for each age
group for the period selected (step 1). The number of movers appears in column L and the mobility rate (Rx) is automatically calculated in column B (step 2).
A standard life table provides the expected population at the beginning of the age interval per 100,000 births (lx). The stationary population (Lx) is the total
number of people still living in the age interval per 100,000 births as of the date of the NCHS - Life Tables, in this case, 2004 (step 3). The average mobility
rate (Rx) is multiplied by the stationary population (Lx) to obtain the expected number of moves for each age interval (column E). These expected movers are cumulated
across each age group from oldest to youngest (TMx), and then divided by the population still living per 100,000 born (lx), to obtain the average expected number of
moves remaining for people in that age group (step 5)."
--February 2012
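The steps quoted above can be sketched in a few lines of Python. This is only an illustration of the described arithmetic, not the Census Bureau's spreadsheet; the function and argument names are hypothetical. It assumes the three inputs are lists ordered from youngest to oldest age group.

```python
def migration_expectancy(mobility_rates, stationary_pop, survivors):
    """Expected number of moves remaining at the start of each age group.

    mobility_rates: Rx, movers divided by population, per age group
    stationary_pop: Lx from a life table, per age group
    survivors:      lx, survivors to the start of each age group
    """
    if not (len(mobility_rates) == len(stationary_pop) == len(survivors)):
        raise ValueError("all inputs must cover the same age groups")
    # Expected moves within each age interval: Rx * Lx.
    expected_moves = [r * L for r, L in zip(mobility_rates, stationary_pop)]
    # Cumulate from the oldest age group back to the youngest (TMx).
    remaining = []
    running_total = 0.0
    for moves in reversed(expected_moves):
        running_total += moves
        remaining.append(running_total)
    remaining.reverse()
    # Divide by lx to get average remaining moves per survivor.
    return [tm / l for tm, l in zip(remaining, survivors)]
```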
^
High quality web-based data visualization: From the Google Fusion Tables tour:
"Google Fusion Tables is a modern data management and publishing web application that
makes it easy to host, manage, collaborate on, visualize, and publish data tables online."
--February 2012
^
Shapefiles are used in geographic information systems (GIS) software to store map features together with a linked table of attributes. If you have a shapefile, but no way to check it out,
ShpToFusion will allow you to import it to Google Fusion Tables to visualize and modify.
From the front page: "This website lets you import a shapefile to
Google Fusion Tables.
This
blog post has some details on how it was built."
--February 2012
^
From the
User Guide:
"This tool allows users to load tables from the American Community Survey Summary File into an Excel spreadsheet, then sort or manipulate
it as needed. This tool works best for users who need data for a few tables. Users who need data for more than 20 tables are encouraged to access the ACS Summary
File directly on the FTP site at
http://www2.census.gov/."
--February 2012
^
From the front page: "This site represents the major output arising from a joint IUSSP and UNFPA project to produce a
single volume containing updated tools for demographic estimation from limited, deficient and defective data. The material
here follows in a direct line of descent from Manual X and subsequent works (for example, the 2002 UN Manual of Adult Mortality
Estimation). The principal aspect of this website is a series of (mostly) static webpages describing and documenting the tools
and methods of demographic estimation from limited, deficient and defective data. The material is organised thematically first,
and then by the kinds of data that may be available. Where appropriate, downloadable spreadsheets are provided that allow users
to apply the methods to their own data. Forums are available to discuss and debate methods and results, and FAQs describe how to
use the site in more detail. Links to both, as well as forms to contact us, are at the top right hand side of each page."
--February 2012
^
From the
BAZI Manual:
"BAZI is a freely available Java-program implementing various apportionment methods for proportional
representation systems. It offers divisor methods as well as quota methods. The extensive database permits to investigate the
merits of the different apportionment methods on the basis of empirical data. A particular feature of BAZI is that it offers
three options for multiple electoral districts. The user may choose between (1) separate evaluations for each district, (2)
biproportional apportionments using divisor methods, and (3) a variant of the latter that is specifically tailored to the needs
of the new Zurich electoral law of 2003. For these matrix apportionment methods thirteen algorithms and hybrid combinations of
them are available. The BAZI program also offers to include minimum and maximum restrictions for vector and matrix apportionments.
We present several empirical examples where these restrictions are essential."
--February 2012
^
Percentile calculation is an often-necessary task for a research office, and depending on the type of data (grouped, complete, big, small, etc.),
it's not always straightforward. Luckily, Kim Dunstan (Senior Demographer at
Statistics New Zealand)
has developed and shared the above-linked Excel spreadsheet macro to make it very easy for everyone. It also provides mean and standard deviation. A terrific resource.
--October 2011
^
Given continuous data that is aggregated into groups (such as age groups or income groups), the median is a little tricky to calculate.
Linked above is an Excel spreadsheet that
was developed by Steve Doig, Bob Hoyer and Meghan Hoyer to do it simply.
There are a couple of equivalent calculators on this page, but because different folks like different formats, I'm very glad it was sent my way to share.
--October 2011
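One common way to get a median from grouped data is linear interpolation within the group that contains the midpoint of the distribution. The Python sketch below shows that approach; it may not match the linked spreadsheet's method, and the function name is hypothetical. It assumes values are spread evenly within each group and that every group has a known width.

```python
def grouped_median(lower_bounds, widths, counts):
    """Median by linear interpolation within the median group.

    lower_bounds: lower limit of each group, in ascending order
    widths:       width of each group
    counts:       number of observations in each group
    """
    half = sum(counts) / 2.0
    cumulative = 0.0
    for lower, width, count in zip(lower_bounds, widths, counts):
        if count > 0 and cumulative + count >= half:
            # Share of this group needed to reach the halfway point.
            share = (half - cumulative) / count
            return lower + share * width
        cumulative += count
    raise ValueError("counts must contain at least one positive value")
```

The even-spread assumption is weakest in wide or open-ended groups, so results for those groups deserve extra care.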
^
Measures of estimate and forecast accuracy are essential tools for applied demography, because they provide context for what level of accuracy should be expected,
and benchmarks for improvement in accuracy. The
Mean Absolute Percent Error
(MAPE) is one of the most often used measures of cross-sectional demographic estimate and forecast accuracy, but it can have values that are influenced too heavily
by outliers. In response to this shortcoming, several folks (including
David Swanson,
Jeff Tayman, Charles Barr, Chuck Coleman and Tom Bryan) have worked to develop and review a measure called MAPE-R, which preserves information from outliers,
but normalizes that information through a Box-Cox transformation. Above is a link to an
Excel
spreadsheet macro by Tom Bryan, which can be used to calculate MAPE-R for a given collection of Absolute Percent Errors (APEs).
Here
is a link to an early (1999) paper describing MAPE-R, and
here
is a link to a recent (2011) paper describing it.
--September 2011
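To make the building blocks concrete, here is a minimal Python sketch of MAPE and of a Box-Cox transformation. It does not implement MAPE-R itself: the published procedure includes further steps (such as choosing the transformation parameter and re-expressing the result in the original scale) that are not reproduced here. The function names are hypothetical, and the sketch assumes all actual values are nonzero and all values passed to the transformation are positive.

```python
import math


def absolute_percent_errors(estimates, actuals):
    """APE for each area: |estimate - actual| / actual * 100."""
    return [abs(e - a) / a * 100.0 for e, a in zip(estimates, actuals)]


def mape(estimates, actuals):
    """Mean Absolute Percent Error across all areas."""
    apes = absolute_percent_errors(estimates, actuals)
    return sum(apes) / len(apes)


def box_cox(values, lam):
    """Box-Cox transformation with a caller-supplied parameter lam."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]
```

A single large APE can pull the plain mean well above the typical error, which is the sensitivity the transformation is meant to dampen.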
^
Iterative proportional fitting (IPF) (also called raking, sample balancing, rim weighting, iterative proportional scaling, etc.) is a simple and handy technique
that is used for adjusting a table of data cells such that they add up to selected totals for both the columns and rows (in the two-dimensional case) of the table.
Among other things, it is commonly used in survey research and analysis. Linked above is a great
SAS
macro for two-dimensional iterative proportional fitting, developed by Webb Sprague, a demographer with the State of Washington's
Office of Financial Management.
Please note this code was developed recently and has only been tested with some sample data.
A related website with links to informal explanations of IPF, some more code and some published papers,
is available
here.
--September 2011
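For a sense of how the two-dimensional case works, here is a minimal Python sketch. It is not a translation of the linked SAS macro; the function name, arguments and defaults are hypothetical. It alternately rescales rows and columns until the row sums match their targets, and it assumes the row targets and column targets add up to the same grand total.

```python
def ipf(table, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Two-dimensional iterative proportional fitting.

    table:       list of rows holding the seed cell values
    row_targets: desired sum for each row
    col_targets: desired sum for each column
    """
    fitted = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(fitted[i])
            if row_sum > 0:
                fitted[i] = [v * target / row_sum for v in fitted[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in fitted)
            if col_sum > 0:
                for row in fitted:
                    row[j] *= target / col_sum
        # Columns now match exactly, so stop once rows also match.
        gap = max(abs(sum(row) - t) for row, t in zip(fitted, row_targets))
        if gap < tol:
            break
    return fitted
```

Cells that start at zero stay at zero, so the seed table determines which cells can receive any of the fitted totals.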
^
Carl Schmertmann is a demographer who (among other things) is leading work in the development and application of new methods.
The abstract from Schmertmann's (2003)
Demographic Research
paper describing this neat work: "I propose and examine a new family of models for age-specific fertility schedules,
in which three index ages determine the schedule's shape. The new system is based on constrained quadratic splines. It has easily interpretable parameters,
is flexible enough to fit a variety of "noiseless" schedules well, and is inflexible enough to avoid implausible estimates from noisy data. Across a set of over
two hundred contemporary ASFR schedules, the new model fits a majority better, and in some cases much better, than the Coale-Trussell model. When fit to a recent
Swedish time series, model parameters exhibit simple, regular changes over time, suggesting utility in forecasting applications. In simulated small-sample data the
new model produces plausible ASFR estimates, with errors similar to Coale-Trussell."
Here
is a link to Demographic Research's home for the paper, which also hosts a great
Excel
version of Carl's model.
--August 2011
^
This ongoing work by
Joseph Potter,
Carl Schmertmann,
Renato Assunção
and Suzana Cavenaghi to analyze the "timing, pace and scale" of Brazil's fertility transition includes
R
and
WinBUGS
code with data for mapping and Bayesian modeling. Links to terrific mapping output are included.
--August 2011
^
This work by
Joseph Potter,
Carl Schmertmann and
Renato Assunção
provides
R code for
"adapting space-time epidemiological statistics to demographic studies." Some cool mapping output is included.
Here
is a link to a paper that was published on the work in the renowned journal
Demography.
--August 2011
^
From the front page of the site: "The Human Fertility Database (HFD) is a joint project of the Max Planck Institute for Demographic Research (MPIDR) in Rostock,
Germany and the Vienna Institute of Demography (VID) in Vienna, Austria, based at MPIDR. We seek to provide free and user-friendly access to detailed,
well-documented and high-quality data on period and cohort fertility and thus to facilitate research on changes and inter-country differences in fertility in the
past and in the modern era." Similar to the
Human Mortality Database. Beautiful!
--August 2011
^
This is an invaluable web address with directory trees for complete
US Census Bureau
datasets (decennials back to 1980, American Community Surveys, economic censuses, etc.).
Folks will sometimes refer to it simply as "the ftp site." Everything important in one place.
--August 2011
^
From the main page: "This PDE Population Projections software
can be used as Population Module in multi-sector Population-Development-Environment (PDE) analysis. This tool can also be used independently of PDE analysis
for simple and multi-state population projections in case of several states that interact with each other. The states can all be defined by the user. They can be regions, educational categories, ethnic or language groups, or other user-defined dimensions. The computer's memory is the only limit to be number of states that
can be dealt by the software. It can easily handle 8-10 distinct states (for 5-year age groups) depending on the length of the projection period and the number of
age groups."
--August 2011
^
From the main page: "Rural and Urban Projection (RUP) is a computer program for performing cohort component projections on one or two areas. The cohort component
method projects each age and sex cohort over time based on the components of growth. Annual births create new cohorts, while existing cohorts are decreased by
mortality and either increased or decreased by migration. The RUP program has features that allow a considerable amount of flexibility for specifying fertility, mortality, and migration. As a result, RUP can be used to produce estimates for years where data on the components are available followed by projections into the future."
--August 2011
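The cohort component logic described above can be sketched for a single projection step in Python. This is a heavily simplified, single-sex illustration and has nothing to do with RUP's actual code; the function and argument names are hypothetical. It assumes age groups as wide as the projection step, an open-ended last age group, births computed from the starting population, and net migration added at the end of the step.

```python
def project_one_step(population, survival, fertility, net_migration,
                     infant_survival=1.0):
    """Advance a population by one step of the cohort component method.

    population:      counts by age group, youngest first
    survival:        share of each age group surviving the step
    fertility:       births per person in each age group over the step
    net_migration:   net migrants to add to each age group after the step
    infant_survival: share of births surviving to the end of the step
    """
    n = len(population)
    projected = [0.0] * n
    # Existing cohorts age into the next group, reduced by mortality.
    for age in range(n - 1):
        projected[age + 1] = population[age] * survival[age]
    # Survivors of the open-ended last group remain in it.
    projected[-1] += population[-1] * survival[-1]
    # Births during the step create the new youngest cohort.
    births = sum(p * f for p, f in zip(population, fertility))
    projected[0] = births * infant_survival
    # Migration can raise or lower each cohort.
    return [p + m for p, m in zip(projected, net_migration)]
```

Repeating the step with the output as the next input gives a multi-period projection; a fuller model would also split births by sex and expose births to mid-period populations.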
^
From the main page: "The Census and Survey Processing System (CSPro) is a free software package used by hundreds of organizations and tens of thousands of
individuals for entering, editing, tabulating, and disseminating census and survey data. CSPro is designed to be as user-friendly as possible, yet powerful
enough to handle the most complex applications. It can be used by a wide range of people, from non-technical staff assistants to senior demographers and
programmers. CSPro is used primarily for data entry, editing, tabulation, and dissemination. While some organizations use CSPro in conjunction with other
statistical packages, CSPro can also be used as the sole program for processing census or survey data. For example, an organization can collect data using
tablet computers with CSPro software or use the data entry tool to key results from paper questionnaires. After data collection, an organization can edit and
impute data in CSPro before preparing appropriate analytical tables with the tabulation tool. Finally, an organization can use CSPro to automatically generate a
website for disseminating results."
--August 2011
^
From the CDC WONDER
FAQ page: "What is CDC WONDER?
Wide-ranging OnLine Data for Epidemiologic Research (WONDER) -- is an easy-to-use
internet system that makes the information resources of the Centers for Disease Control and Prevention (CDC)
available to public health professionals and the public at large. It provides access to a wide array of public
health information. CDC WONDER furthers CDC's mission of health promotion and disease prevention by speeding and
simplifying access to public health information for state and local health departments, the Public Health Service,
and the academic public health community. CDC WONDER is valuable in public health research, decision making, priority
setting, program evaluation, and resource allocation." The site includes detailed and easy-to-extract population,
mortality, morbidity and fertility data.
--July 2011
^
Tim Riffe, a Doctoral research fellow at the
Center for Demographic Studies (CED)
in Barcelona, has been putting together the linked
collection of demography tools for
R, along with unique and terrific
blog entries
that deal with pieces of his work. His site is catnip for me,
and probably anyone interested in demography and/or R. I was very glad for the kind notice that Tim gave the "Toolbox" site I have here.
Quoting from his site, here is a list of R packages he has built and added as of this posting:
" LexisSurface is a package containing a function to plot demographic surfaces consisting in Lexis triangles.
There are also 4 different functions available for splitting mortality Mx data from Lexis squares into Lexis triangles
for purposes of plotting. These methods are untested; some will be improved, others dropped in the future. Will plot
using any color ramp. Flexible legend sizing and positioning. The function plots both logged and not-logged data well.
Includes example fertility and mortality data and several plotting examples. [still working on getting good legend labels-
they are accurate but may overlap at times]
HMDget is a function for reading Human Mortality Database data into R in different formats. You can merge countries
and years, and output is available in a few different predefined formats. Database access either local or via the web (based
on Carl Boe's HMD2R function).
DecompHoriuchi offers the function DecompContinuous(), a generic decomposition program for a wide variety of functions, and no limits
to the number of covariates they may have.
Lotka, a small set of formal demography functions, including estimates of r (3 strategies), calculations of R0, T (mean generation time),
age-survival-fertility decompositions of differences in r and R0, and a Kitagawa-ish decomposition of differences in R0. Examples are included,
which use a dataset from the Spain 1975 and 1998. See examples in help files. (added 7 March, 2011) I'll be making several changes to this package soon.
RateSketch() is for hand-sketching demographic (or any) age-specific rate patterns. Just define the x and y limits and click the function
from left to right. The function returns a list with the points you clicked, plus values interpolated to your desired x-values (argument = xnew)
using both loess and spline methods. See examples. I've used this tons for generating fake data to practice other demographic functions on. Documentation
to be improved, as well as bugs involving logged y scale.
EZLex is a package with two functions for drawing Lexis diagrams (think presentations and teaching materials). Includes Lexis() and
Thighlight(). Try the examples, they're easy.
LifeTable contains a main function LT(), which does the whole basic lifetable spiel, taking either Nx and Dx or else Mx as it's basic
arguments. This package also contains example data for Ukraine males, 1965, coming from the HMD, as well as 4 different a(x) estimation methods
("keyfitz", "schoen", "preston" and "midpoint"), all of which were modified by me in some minor way (apologies to the namesakes). See the
documentation and examples for further details. It accepts data up to any age, in single-ages or five-year abridged data. Also optional
smoothing (using Giancarlo's MortalitySmooth package). Returns all sorts of demographic age-functions and a few different measures.
May contain bugs, especially with 5-year abridged data, which I haven't used much. I'm thinking of including summary and plot methods in the
next release version (no hurry though)
Pyramid provides a simple wrapper with several defaults to quickly plot a population pyramid, and with simple
detection and plotting of multistate pyramids. The function also gives optional absolute or percent scales, with
flexible age-group widths, and optional generation labels on the right axis. Two example datasets from the Human
Mortality Database are given to demonstrate plotting with single-age versus grouped ages.
"
Cannot beat that, my friends.
--July 2011
^
I recently came across MAPLES
(R code) due to the
intriguing title and write-up
by
Roberto Impicciatore
and
Francesco Billari,
published in
Demographic Research.
From the MAPLES package site: "MAPLES is a general method for the estimation of age profiles that uses standard micro-level demographic survey data.
The aim is to estimate smoothed age profiles and relative risks for time-fixed and time-varying covariates." I look forward to applying it myself.
--July 2011
^
The Human Mortality Database
is extensively useful, neatly organized and well maintained, and now there is an
R
function developed by Carl Boe (the terrific research demographer with
HMD and
CEDA at
my favorite department)
that allows users to easily pull its data directly from the web into R.
Note it uses the "RCurl" library, which is perhaps most simply installed with "biocLite" (notes within).
--June 2011
^
The
cohort component method
is a standard and essential procedure for demographic research, and
Tim Chapin
of Florida State University has developed this neat and easy-to-use
Excel
workbook to carry it out. Many folks are looking for a good cohort component projection model spreadsheet, and I think this is an outstanding one--
very glad that Prof. Chapin put this resource together for his students, and that he's provided it to share here.
--May 2011
^
Spectrum is a suite of great and easy-to-use models for policy research that includes DemProj, which carries out cohort-component population
projections. From the Spectrum website: "DemProj projects the population for an entire country or region by age and sex, based on assumptions about
fertility, mortality, and migration. A full set of demographic indicators can be displayed for up to 50 years into the future. Urban and rural projections
can also be prepared. A companion model, EasyProj, supplies the data needed to make a population projection from the estimates produced by the Population
Division of the United Nations." While data is provided for many nations, users can also easily put in their own detailed data.
--April 2011
^
German Rodriguez is Senior Research Demographer at the renowned
Office of Population Research
at Princeton University.
From
his home page:
"In the Spring of 2006 I taught Research Methods in Demography. The demography section of the website has handouts that use
Stata to do demographic calculations organized under 12 different topics ranging from rates and standardization to stable
populations. There are also four problem sets with solutions." Demography through Stata-- terrific.
--April 2011
^
From the
Oklahoma Department of Commerce: Demographics and Population Data
website: "The Census Bureau's American Community Survey (ACS) is a great source for detailed population and household characteristics,
giving estimates and margins of error for many aspects of the American population. However, sometimes users must blend ACS
results in order to arrive at the exact measure needed - hard to do correctly if users are not familiar with statistics.
This Excel based file contains several 'calculators' that let users enter official ACS results from the Census Bureau and come
away with approximations of the true measure needed."
Calculators like this terrific one developed by Steve Barker are essential tools for folks who are regularly
combining American Community Survey (ACS) estimates.
Many thanks to the
Oklahoma Department of Commerce
for sharing it with the world.
--March 2011
^
Usually I don't include tools that (may) cost money in this list, but this one is connected to some very neat, broadly useful and publicly available
works (see below), so: Population data is often best viewed on a map, and Social Explorer maps population data.
Among other things, this software was used well by the
New York Times
to make
this
terrific application for mapping the
US Census Bureau's
American Community Survey data,
this one
for mapping 2010 Census data, and
this one
for mapping Census Bureau data on US immigration since 1880-- these works are amazing to see.
--March 2011 (updated August 2011)
^
Human migration often follows a somewhat clear and interpretable pattern by age, and very good multi-parameter models for the pattern have been developed
and described over the past few decades
(Rogers and Castro, 1981).
Recently,
Tom Wilson
of the University of Queensland authored a terrific
paper
on incorporating a student peak into a model migration schedule, and linked above is a very handy Excel workbook from that paper. I've found
that this workbook is very useful for understanding and applying both his (often essential) student peak, and model migration schedules in general.
--March 2011
^
Measures of estimate and forecast accuracy are essential tools for applied demography, because they provide context for what level of accuracy should be expected,
and benchmarks for improvement in accuracy. The
Mean Absolute Percent Error
(MAPE) is one of the most often used measures of cross-sectional demographic estimate and forecast accuracy, but it can have values that are influenced too heavily
by outliers. In response to this shortcoming, several folks (including
David Swanson,
Jeff Tayman, Charles Barr, Chuck Coleman and Tom Bryan) have worked to develop and review a measure called MAPE-R, which preserves information from outliers,
but normalizes that information through a Box-Cox transformation. Above is a link to a
SAS
macro adapted from
this
paper by Chuck Coleman, which can be used to calculate MAPE-R for a given collection of Absolute Percent Errors (APEs).
Here
is a link to an early (1999) paper describing MAPE-R, and
here
is a link to a recent (2011) paper describing it.
--March 2011
^
From the
Netherlands Interdisciplinary Demographic Institute's
LIPRO site:
"Changes in household structure may have profound consequences for a wide range of areas in demography and social policy. Household projection models developed in demography over the past few decades are primarily of the headship rate type, in which the dynamic processes of household formation and dissolution which underlie changes in household structure, essentially are treated as a black box. Back in 1988, NIDI started a pioneering study in order to develop a dynamic household projection model which explicitly focusses on the flows underlying household changes. The model, called LIPRO ('LIfestyle PROjections'), is based on the methodology of multistate demography, but includes several extensions to solve the particular problems of household modelling.
Although originally developed for household projections, the LIPRO computer program can in fact be used for a wide range of calculations in multistate demography. And indeed, LIPRO has been extensively used for various applications, in the Netherlands as well as in many other countries." I have not
used the above-linked LIPRO software myself, but am confident that it is very valuable due to the many projects that it has been used for
(listed on the LIPRO site). I don't know of any other publicly shared/free software that offers direct steps for multistate population projection.
--March 2011
^
Population forecasts, and all forecasts, boil down to a collection of assumptions based on some sort of knowledge. For a good understanding of
a forecast, it's valuable to review and understand the forecast assumptions, and the effect of differing assumptions.
To allow this for European countries, the
Netherlands Interdisciplinary Demographic Institute
developed the above-linked PopTrain, a web-based demographic
simulation program that (1) provides population forecasts for 31 European countries, and (2) allows users to easily change assumptions on these forecasts.
An outstanding resource for planning and instruction.
--March 2011
^
Given population data that is aggregated by age groups, the median age is a little tricky to calculate. Linked above is an Excel spreadsheet that
was developed to show students how to do it in simple steps. The spreadsheet was made by
Tom Wilson,
a demographer at the University of Queensland who has contributed a great deal in recent years to advanced work in applied demography.
--March 2011
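For readers who prefer code to a spreadsheet, the usual grouped-data approach can be sketched in Python as below. This is a generic linear-interpolation calculation, not a transcription of the linked spreadsheet. It assumes people are spread evenly within each age group, and the function name and data are made up for the example.

```python
def grouped_median_age(lower_bounds, widths, counts):
    """Median age from counts by age group, using linear interpolation.

    Assumes an even spread of people within each group, so the result is
    only meaningful if the median does not fall in an open-ended group.
    """
    half = sum(counts) / 2.0
    cumulative = 0.0
    for lower, width, count in zip(lower_bounds, widths, counts):
        if count > 0 and cumulative + count >= half:
            # Fraction of the way through this group where the halfway person sits.
            return lower + (half - cumulative) / count * width
        cumulative += count
    raise ValueError("counts must include at least one positive value")

# Hypothetical population in ten-year age groups: 0-9, 10-19, 20-29, 30-39.
lower_bounds = [0, 10, 20, 30]
widths = [10, 10, 10, 10]
counts = [100, 200, 300, 400]
print(grouped_median_age(lower_bounds, widths, counts))
```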
^
Interpolation and curve-fitting are very useful tools for demographic modeling (particularly for the modeling of age specific fertility, mortality and migration),
and XlXtrFun provides these in the form of simple Excel functions.
From the XlXtrFun site: "XlXtrFun.xll is a collection of functions which extends the capabilities of Microsoft Excel; developed primarily to facilitate,
interpolation of 2-dimensional and 3-dimensional data, and simplify 2-variable curve fitting. XlXtrFun has been used for years by engineering and research
and development personnel on every continent who need to interpolate, extrapolate, and curve fit data rapidly, reliably, and with a virtually non-existent
learning curve.
If you work with real-life data and want to interpolate, extrapolate, or curve fit, then you will find these functions very useful."
--March 2011
^
Population pyramids display population by age and sex, and thus are a favorite tool for demographers.
Several years back I took a class in which we used instructions from the
Population Reference Bureau (PRB)
to create population pyramids in Excel. I have since used the template I made many times, and linked above is a .zip file with versions of it
(one for Excel 2003, and one for Excel 2007). No citation is necessary for use of these pyramid spreadsheets. (Note there is also the
R function developed by Carl Mason for population pyramids
in the "Stochastic Population Forecast Code" listed on this page.)
--February 2011
^
Population projection with the
"cohort-component"
method is a key part of demographic research.
Linked above is
R
code that I developed for a basic cohort-component projection/forecast (applied to Alaska).
To see the code work, and a selection of output-graphics, you should be able to simply select-all of the code, copy, and paste into an R command line
(input data (csv format) is linked through the Internet, and can be simply downloaded for review). This code is based on the
Stochastic Population Forecast Code, also listed on this page.
A description of the work for the Stochastic Population Forecast Code is available
here. If you would like to apply this
basic projection code to another area (probably not hard at all), and have any questions, don't hesitate to shoot me an email at
edyhsgr@gmail.com.
If you have access to JSTOR,
here
is a neat (and historically significant) article from 1895 on population projection by age (and check out the graphic on page 509).
--February 2011
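To give a feel for the mechanics, here is a minimal Python sketch of one cohort-component step for a single-sex population. It is far simpler than the linked R code (one sex, three age groups, fertility expressed directly as surviving births per person), and all names and numbers are hypothetical.

```python
def project_one_step(population, survival, fertility, net_migrants=None):
    """Project a population forward by one age-group width.

    population[i]: persons in age group i at the start of the step
    survival[i]:   proportion of group i surviving into group i + 1
                   (the last entry is survival within the open-ended last group)
    fertility[i]:  surviving births per person in group i over the step
    net_migrants:  optional net migrants by age group, added at the end
    """
    n = len(population)
    if net_migrants is None:
        net_migrants = [0.0] * n
    projected = [0.0] * n
    # Births fill the youngest age group.
    projected[0] = sum(p * f for p, f in zip(population, fertility))
    # Every other group is the survivors of the next-younger group.
    for i in range(1, n):
        projected[i] = population[i - 1] * survival[i - 1]
    # Survivors of the open-ended last group stay in it.
    projected[-1] += population[-1] * survival[-1]
    return [p + m for p, m in zip(projected, net_migrants)]

# Hypothetical inputs for three broad age groups.
population = [100.0, 80.0, 60.0]
survival = [0.9, 0.8, 0.5]
fertility = [0.0, 0.5, 0.1]
print(project_one_step(population, survival, fertility))
```

Repeating the step with the output as the next input gives a multi-period projection.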
^
The "Lee-Carter Model"
is the most useful and well-known innovation in mortality modeling and forecasting in recent decades.
It is noted for (among other things) finding an index of mortality data (called k(t)) that usually has a (roughly-)linear time-trend.
A drawback to it is that steps to estimate and forecast its
parameters are not simple. Fortunately, the great folks at the
Center on the Economics and Demography of Aging (CEDA),
under the supervision of
Ron Lee,
have developed a web-based program (with Python) to estimate, forecast and graph Lee-Carter parameters and life expectancy
from mortality data that users paste in (just need mortality rates by age). Webb Sprague, a demographer who is now working with the
State of Washington, deserves a lot of credit for this work. If you would like to use LCFIT (not hard or time consuming to make it work, and they
provide example data), and/or have any questions, don't hesitate to send the LCFIT developers an email at
lcfit@demog.berkeley.edu.
--February 2011
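The structure of the model, ln m(x,t) = a(x) + b(x) k(t), can be sketched in a few lines of Python. The sketch below uses a simple approximation (k(t) taken as the sum over ages of the centered log rates, then b(x) by least squares) rather than the singular value decomposition of the original Lee-Carter paper. It is not a description of how LCFIT works, and the function name and synthetic data are assumptions for the example.

```python
import math

def fit_lee_carter_simple(rates):
    """Approximate Lee-Carter fit; rates[x][t] is the death rate at age x in year t.

    Uses the usual normalization: b sums to 1 over ages, k sums to 0 over years.
    Returns (a, b, k).
    """
    n_ages, n_years = len(rates), len(rates[0])
    log_rates = [[math.log(m) for m in row] for row in rates]
    # a(x): average log rate at each age.
    a = [sum(row) / n_years for row in log_rates]
    centered = [[log_rates[x][t] - a[x] for t in range(n_years)]
                for x in range(n_ages)]
    # k(t): because b sums to 1, summing centered log rates over age gives k(t).
    k = [sum(centered[x][t] for x in range(n_ages)) for t in range(n_years)]
    # b(x): least-squares slope of each age's centered log rates on k(t).
    k_ss = sum(v * v for v in k)
    b = [sum(centered[x][t] * k[t] for t in range(n_years)) / k_ss
         for x in range(n_ages)]
    return a, b, k

# Synthetic rates built from known parameters, so the fit can be checked.
a_true, b_true, k_true = [-6.0, -4.0, -2.0], [0.5, 0.3, 0.2], [2.0, 1.0, 0.0, -1.0, -2.0]
rates = [[math.exp(a_true[x] + b_true[x] * kt) for kt in k_true] for x in range(3)]
a, b, k = fit_lee_carter_simple(rates)
print("k(t):", k)
```

With real data k(t) will not be exactly linear, but forecasting it as a roughly linear trend is what makes the model convenient.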
^
Rob J Hyndman
is a current leader in demographic forecasting who
(with contributions from
Heather Booth,
Leonie Tickle and
John Maindonald)
put together the above-linked demography package for
R.
From the linked page:
"The demography package for R contains functions for various demographic analyses.
It provides facilities for demographic statistics, modelling and forecasting.
In particular, it implements lifetable calculations; Lee-Carter modelling and variants;
functional data analysis of mortality rates, fertility rates, net migration numbers; and stochastic
population forecasting."
--February 2011
^
I think this is the best single source for socio-economic survey microdata, which are necessary for countless projects. From the front page:
"IPUMS-International:
Harmonized data for 1960 forward, covering 326 million people in 159 censuses from around the world."
"IPUMS-USA:
Harmonized data on people in the U.S. census and American Community Survey, from 1850 to the present."
"IPUMS-CPS:
Harmonized data on people in the Current Population Survey, every March from 1962 to the present."
Fantastic.
--February 2011
^
Residential construction data can be very useful for local total population and housing unit estimates, which are useful for innumerable things.
Above is a link to a database of monthly and annual building permits by US place (municipality) and county, from the
US Census Bureau's
"Survey of Residential Construction."
--February 2011
^
If you get a request for demographic data in the US, or if you have your own request for demographic data in the US, there is a good chance that you can
make use of the
US Census Bureau's
American FactFinder. American FactFinder has detailed data from decennial censuses of the United States, the Census Bureau's
Population Estimates Program (the most reliable
federal source for annual population estimate data) and the
American Community Survey.
When you think of all the data they provide, you realize the Census Bureau is a pretty amazing organization.
--February 2011
^
Geocoding of addresses (such as those for housing units) is useful/necessary for a lot of applied demography projects, and
Steve Morse's
program makes the chore much easier. If you paste a table of addresses into the "Addresses" pane, you get back a table with the lat/long coordinates in the
"Latitude, Longitude" pane. Easy-peasy.
--August 2011
^
Standard errors are an important component of estimates from the American Community Survey, but using, combining and interpreting them is often not
intuitive. Fortunately, the folks at the
Cornell Program on Applied Demographics
have created an ACS calculator that can be used to test for significant difference between two values, and to combine two values. I'm sure that
Jan Vink,
a Research Support Specialist at PAD, deserves much credit for these great tools.
--February 2011
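The arithmetic behind this kind of calculator can be sketched in Python as follows. These are the standard approximations for ACS estimates (margins of error published at the 90 percent level, covariance between the two estimates ignored), not a transcription of the Cornell PAD tool. Function names and example values are hypothetical.

```python
import math

Z_90 = 1.645  # ACS margins of error are published at the 90 percent confidence level

def moe_to_se(moe):
    """Convert a published ACS margin of error to a standard error."""
    return moe / Z_90

def se_of_sum(se_a, se_b):
    """Approximate SE of a sum or difference of two estimates (covariance ignored)."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

def significantly_different(est_a, se_a, est_b, se_b, z_critical=Z_90):
    """Return (is_significant, z) for the difference between two estimates."""
    z = (est_a - est_b) / se_of_sum(se_a, se_b)
    return abs(z) > z_critical, z

# Hypothetical estimates and margins of error for two areas.
se_a = moe_to_se(329.0)
se_b = moe_to_se(246.75)
print(significantly_different(5000.0, se_a, 4400.0, se_b))
print("SE of the combined total:", se_of_sum(se_a, se_b))
```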
^
Mapping of points (lat/long points, such as those that a county assessor might have for housing units) is useful for a lot of applied demography
projects, and the
Cornell Program on Applied Demographics
Mapper makes it very easy. If you paste a table of lat/long coordinates into the Mapper window, you get back a map of those points.
The Cornell PAD Mapper was useful to me when working on the 2010 Census "Group Quarters Count Review" program. As with the Cornell PAD's ACS Calculator
(described just above this text block), I'm sure that
Jan Vink deserves much credit for this great tool.
--February 2011
^
You can't beat this.
From the front page of the site: "The Human Mortality Database (HMD) was created to provide detailed mortality and population data to researchers,
students, journalists, policy analysts, and others interested in the history of human longevity. The project began as an outgrowth of earlier projects
in the Department of Demography at the University of California, Berkeley, USA, and at the Max Planck Institute for Demographic Research in Rostock,
Germany. It is the work of two teams of researchers in the USA and Germany,
with the help of financial backers and scientific collaborators from around the world."
--February 2011
^
Life tables are the foundation of demography, and above is a link to a reproduced period life table function/script
(R) that was used at the 2006
Stanford Formal Demography Workshop,
and that I've used as a template for constructing my own life tables in R.
To make it work, you should be able to simply select-all of the code, copy, and paste into an R command line (example input data (csv format) is linked
through the Internet, and can be simply downloaded for review). If you would like to plug in data for another life table (simple), and have any questions, feel
free to send me an email at
edyhsgr@gmail.com.
Here
is a link to the original posting of the function (I reproduced it to allow an immediate link to input data, so that potential users can quickly review it).
--February 2011
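For a sense of what a period life table function does, here is a minimal Python sketch built on the standard textbook formulas. It is not the workshop function linked above. It assumes deaths fall, on average, halfway through each age interval unless other values are supplied, and closes the table with L = l / m in the open-ended last group. Names and rates are made up for the example.

```python
def period_life_table(mx, widths, ax=None, radix=100000.0):
    """Period life table from age-specific death rates.

    mx:     death rates by age group (the last group is open-ended)
    widths: width of each age group in years (the last width is not used)
    ax:     average years lived in the interval by those who die in it;
            defaults to half the interval width (an assumption)
    """
    n = len(mx)
    if ax is None:
        ax = [w / 2.0 for w in widths]
    # Probability of dying in each interval; everyone dies in the last one.
    qx = [w * m / (1.0 + (w - a) * m) for m, w, a in zip(mx, widths, ax)]
    qx[-1] = 1.0
    # Survivors to the start of each interval.
    lx = [radix]
    for q in qx[:-1]:
        lx.append(lx[-1] * (1.0 - q))
    dx = [l * q for l, q in zip(lx, qx)]
    # Person-years lived in each interval.
    Lx = [w * (l - d) + a * d for l, d, w, a in zip(lx, dx, widths, ax)]
    Lx[-1] = lx[-1] / mx[-1]
    # Person-years remaining, accumulated from the oldest age down.
    Tx = [0.0] * n
    running = 0.0
    for i in range(n - 1, -1, -1):
        running += Lx[i]
        Tx[i] = running
    ex = [t / l for t, l in zip(Tx, lx)]
    return {"qx": qx, "lx": lx, "dx": dx, "Lx": Lx, "Tx": Tx, "ex": ex}

# Hypothetical rates for ages 0-19, 20-39, 40-59, 60-79 and 80+.
table = period_life_table([0.001, 0.002, 0.008, 0.03, 0.12], [20, 20, 20, 20, 20])
print("Life expectancy at birth:", table["ex"][0])
```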
^
Stochastic population forecasts typically use a large collection of randomized, realistic future scenarios for births, deaths and migration to estimate a
probability distribution for future population. Time series models are usually used to create the random scenarios, but, for many places, data to empirically
develop reliable time series models are unavailable. Linked above is
R
code that I developed for an expert-based stochastic population forecast for Alaska, using time series models from reasoned ranges of random coefficients.
To see the code/forecast work, and a selection of output-graphics, you should be able to simply select-all of the code, copy, and paste into an R command line
(input data (csv format) is linked through the Internet, and can be simply downloaded for review). The related paper with full description of the work is available
here. If you would like to apply this
forecast code to another area (probably not hard), and have any questions, don't hesitate to shoot me an email at
edyhsgr@gmail.com.
--January 2011
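The general idea of drawing time series coefficients from reasoned ranges can be sketched in Python as below. This is a toy model of total population only, with an AR(1) growth rate whose long-run mean and persistence are drawn at random for each scenario. The linked R code is age-structured and considerably more detailed, and every name, range and value here is an assumption for illustration.

```python
import random

def simulate_paths(start_pop, years, n_sims, mean_range, phi_range, sd, seed=1):
    """Simulate end-of-horizon populations under random AR(1) growth rates.

    For each scenario, the long-run mean growth rate and the AR(1)
    coefficient are drawn uniformly from the supplied (low, high) ranges.
    Returns the final populations, sorted from lowest to highest.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(n_sims):
        mean_rate = rng.uniform(*mean_range)
        phi = rng.uniform(*phi_range)
        rate = mean_rate
        pop = start_pop
        for _ in range(years):
            # AR(1): the rate is pulled back toward its mean, plus a random shock.
            rate = mean_rate + phi * (rate - mean_rate) + rng.gauss(0.0, sd)
            pop *= 1.0 + rate
        finals.append(pop)
    return sorted(finals)

def percentile(sorted_values, p):
    """Nearest-rank style percentile of an already sorted list."""
    index = round(p / 100.0 * (len(sorted_values) - 1))
    return sorted_values[index]

# Hypothetical ranges: mean growth of 0 to 2 percent a year, persistence 0.3 to 0.9.
finals = simulate_paths(700000.0, 25, 1000, (0.0, 0.02), (0.3, 0.9), 0.005)
print("10th, 50th, 90th percentiles:",
      [round(percentile(finals, p)) for p in (10, 50, 90)])
```

The spread between the low and high percentiles is the forecast interval. Widening the coefficient ranges widens it.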
^
Iterative proportional fitting (IPF) (also called raking, sample balancing, rim weighting, iterative proportional scaling, etc.) is a simple and handy technique
that is used for adjusting a table of data cells such that they add up to selected totals for both the columns and rows (in the two-dimensional case) of the table.
Among other things, it is commonly used in survey research and analysis. Linked above is a .zip file with
R
code, supporting files and instruction/documentation to perform two-, three- and four-dimensional IPF.
This code has been used in a number of projects, including
this
research on computer and Internet use at U.S. public libraries,
this
work on employment flows in the U.S., and
this
guide for creating small-area cross tabulations.
Though I did pieces of the development for the IPF functions, the heavy lifting (the bulk of the 2D and 3D functions) was done by
Nels Tomlinson,
my predecessor at the
Alaska Department of Labor and Workforce Development.
A related website with links to informal explanations of IPF, some more code and some published papers, is available
here.
--February 2011 (updated September 2011)
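For the two-dimensional case, the technique can be sketched in a few lines of Python. This is a generic illustration of IPF, not a port of the R functions in the .zip file. The function name, tolerance and example table are assumptions, and the row and column targets must sum to the same grand total.

```python
def ipf_2d(table, row_targets, col_targets, max_iter=100, tol=1e-10):
    """Scale a two-dimensional table until its margins match the targets."""
    cells = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(cells[i])
            if row_sum > 0:
                cells[i] = [v * target / row_sum for v in cells[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            if col_sum > 0:
                for row in cells:
                    row[j] *= target / col_sum
        # Columns now match; stop once the rows also match.
        if all(abs(sum(cells[i]) - t) <= tol for i, t in enumerate(row_targets)):
            break
    return cells

# Hypothetical seed table with new row and column totals (both sum to 100).
seed = [[1, 2], [3, 4]]
fitted = ipf_2d(seed, row_targets=[40, 60], col_targets=[30, 70])
for row in fitted:
    print([round(v, 3) for v in row])
```

The fitted cells keep the interaction pattern of the seed table while taking on the new margins, which is why the technique is handy for survey weighting and small-area cross tabulations.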
^
Detailed and reliable age by sex population estimates are a starting point for demographic research and analysis, but, due to the limited availability of up-to-date survey data,
in some cases it's difficult to make such estimates. For this reason, simple, residual-based methods (a well-known example being the
Hamilton-Perry Method)
have been developed over the years for estimation and projection of population by age and sex. Linked above is simple
R code that I developed recently to make annual inter-censal and
post-censal population estimates by sex and single year of age, given only the counts by age and sex from the last two censuses, a generic gross migration profile,
a generic life table, and annual counts for births and net migration. I've recently learned that what the code does is the same as, similar to, or equivalent to a
"plus-minus"
technique. To run the code for Douglas County, Colorado, and see a selected output-graphic, you should be able to just select-all, copy and paste into an R
command line (input data (csv format) is linked through the Internet, and can be simply downloaded for review).
Here
is a link to code to run all Colorado counties at once. This code might be particularly useful for very small areas, perhaps controlled up to counties-- I believe
Jack Baker, a Senior Research Scientist at
"UNM-BBER",
uses a very similar technique to estimate Census Tract population by age and sex.
I hope to make a write-up with some review of accuracy for this work in the near future. If you have questions about how to plug in data for another area (simple),
or have any ideas/tips for the work, don't hesitate to send me an email at
edyhsgr@gmail.com. ***Please note that I've found problems with this code (see Step 11), and it
is a work in progress.***
--February 2011 (updated September 2011)
^
There are
three often-used methods
for annually estimating the total population of local areas in the United States: (1) Administrative Records (including Component Method II),
(2) Housing Unit, and (3) Ratio-Correlation (a regression-based method). Linked above is simple
R
code to create Ratio-Correlation population estimates for Colorado counties. To run the code for Colorado, and see selected output-graphics,
you should be able to simply select-all, copy and paste into an R command line (the input data (csv format) is linked through the Internet, and can be
simply downloaded for review).
If you would like to plug in data for another area (simple), and have any questions, don't hesitate to shoot me an email at
edyhsgr@gmail.com.
The related paper with full description of the Ratio-Correlation Method and its application to Colorado counties is available
here.
Here
is a link to a draft chapter on Ratio-Correlation for the forthcoming book Subnational Population Estimates by Swanson and Tayman-- can't beat that.
Also, if you have access to JSTOR,
here
is an amazing paper from 1911, by a researcher named E.C. Snow, that uses multiple correlation (probably for the first time) to estimate population, and
here
is a terrific paper by Schmitt and Crosetti (1954), which gives one of the first descriptions
of the actual Ratio-Correlation method for total population estimation.
--January 2011 (updated March 2011)
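The core of the Ratio-Correlation idea can be sketched in Python with a single symptomatic indicator. Real applications, including the linked R code, typically use several indicators and multiple regression, so treat this as a simplified illustration. All function names and data are hypothetical.

```python
def shares(values):
    """Each area's share of the total."""
    total = sum(values)
    return [v / total for v in values]

def share_ratios(earlier, later):
    """Ratio of each area's later share to its earlier share."""
    return [l / e for e, l in zip(shares(earlier), shares(later))]

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope

def ratio_correlation_estimate(pop_c1, pop_c2, ind_c1, ind_c2, ind_now, total_now):
    """Estimate current area populations, controlled to an independent total."""
    # Fit the relationship between the two censuses.
    intercept, slope = fit_line(share_ratios(ind_c1, ind_c2),
                                share_ratios(pop_c1, pop_c2))
    # Apply it to the indicator change since the last census.
    predicted = [intercept + slope * r for r in share_ratios(ind_c2, ind_now)]
    raw_shares = [r * s for r, s in zip(predicted, shares(pop_c2))]
    scale = total_now / sum(raw_shares)
    return [s * scale for s in raw_shares]

# Hypothetical counties: census populations and one indicator (say, school enrollment).
pop_c1, pop_c2 = [1000, 2000, 3000], [1500, 2100, 2900]
ind_c1, ind_c2, ind_now = [200, 400, 600], [300, 420, 580], [360, 440, 560]
print(ratio_correlation_estimate(pop_c1, pop_c2, ind_c1, ind_c2, ind_now, 6800))
```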
^
The median age of a population is an important, often-used, and tricky-to-calculate figure. One day I searched the Internet for typical steps used to
calculate the median age for a given age profile, and came across
this
terrific site with instruction. I then made the above-linked
R
function/script to carry the steps out automatically for fixed age-group-sizes. Looking back at it, I'm not sure if it's as efficient as it should be, but
it was fun to make and seems to do the job (empirically tested). To see the function/script work, you should be able to simply select-all
of the code, copy, and paste into an R command line (example input data (csv format) is linked through the Internet, and can be simply downloaded for review).
If it's somewhat confusing, that's my fault, and you
can certainly send me an email at edyhsgr@gmail.com if you have any questions.
--January 2011
^
In addition to the Brass Relational Logit Model for mortality (which has some description just below this text-block),
William Brass
made many lasting contributions to demography. One of these was the
Brass Relational Gompertz Model
for fertility. The Brass Relational Gompertz Model is used to adjust the earliness and width of a proportional (sum to 1) age specific fertility rates profile,
based on the earliness and width of another age specific fertility rates profile. I believe this ability for adjustment might be useful for many things, including (1)
fertility profile estimation with limited data, and (2) fertility profile forecasting. It should be noted that (1) there is "[no] behavioral interpretation"
(from Demography: Measuring and Modeling Population Processes by Preston, Heuveline and Guillot) for the Brass Relational Gompertz model,
and (2) a third parameter for a fertility profile would simply be the TFR, which would be multiplied by a proportional
fertility profile to change the overall level of fertility.
Here
is a link to an early paper on the Brass Relational Gompertz model, by
Heather Booth,
a professor at the Australian Demographic and Social Research Institute. Linked above is an
R
function/script to apply the Brass Relational Gompertz model, and make selected output graphics. No care was taken in the selection of the standard profile, and
the code has not been tested (please let me know any problems you find, of course). To make it work, you should be able to simply select-all
of the code, copy, and paste into an R command line (example input data (csv format) is linked through the Internet, and can be simply downloaded for review).
If you would like to plug in data for another proportional fertility profile (simple), and have any
questions, you can send an email to me at
edyhsgr@gmail.com.
--February 2011
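The transformation at the heart of the model can be sketched in Python as follows: the cumulative proportion of fertility reached by each age is put on a "gompit" scale, shifted and stretched linearly, and transformed back. This is a generic illustration, not a port of the linked R script. The standard profile and the function names are made up for the example.

```python
import math

def gompit(p):
    """Gompertz transform of a proportion strictly between 0 and 1."""
    return -math.log(-math.log(p))

def inverse_gompit(y):
    """Back-transform from the gompit scale to a proportion."""
    return math.exp(-math.exp(-y))

def relational_gompertz(standard_cumulative, alpha, beta):
    """New cumulative fertility proportions: gompit(new) = alpha + beta * gompit(standard).

    With this sign convention a positive alpha moves childbearing earlier,
    and beta changes the spread of the profile.
    """
    return [inverse_gompit(alpha + beta * gompit(p)) for p in standard_cumulative]

def cumulative_to_profile(cumulative):
    """Turn cumulative proportions into a by-age profile that sums to 1."""
    full = list(cumulative) + [1.0]
    return [full[0]] + [b - a for a, b in zip(full, full[1:])]

# Hypothetical standard: cumulative proportion of fertility by the end of
# ages 15-19, 20-24, ..., 40-44 (the final group, 45-49, reaches 1).
standard = [0.05, 0.30, 0.60, 0.82, 0.95, 0.99]
earlier = relational_gompertz(standard, alpha=0.3, beta=1.0)
print([round(v, 3) for v in cumulative_to_profile(earlier)])
```

Multiplying the resulting profile by a TFR gives age-specific fertility rates at the desired overall level.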
^
I think William Brass'
most famous contribution to demography (he had many, and is surely among the most productive and revered demographers ever) is the
Brass Relational Logit Model
for mortality.
The Brass Relational Logit Model is used to simply and reasonably adjust the level and shape of a life table's lx (survivorship) curve, based on the level and
shape of another
life table's lx curve. This ability for a reasonable adjustment by two parameters is useful for many things, including (1) life table estimation with limited data,
where a life table with very large margins of error and an unreliable pattern across age can be modeled with a life table that has very small margins of error,
and (2) life table forecasting, in which the level and shape terms (alpha and beta, respectively) can be projected over time. Brass alpha in particular has
been a
subject of interest for mortality forecasting,
as it is believed to change linearly over time in many cases (Brass beta is believed to often stay trendless-- I think particularly in
developed regions).
Linked above is an
R
function/script to apply the Brass Relational Logit model, and make selected output graphics. To make it work, you should be able to simply select-all
of the code, copy, and paste into an R command line (example input data (csv format) is linked through the Internet, and can be simply downloaded for review).
If you would like to plug in data for another life table (simple), or to consider Brass' mortality model for a particular project, and have any questions, don't
hesitate to send me an email at
edyhsgr@gmail.com.
--February 2011
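The two-parameter adjustment can be sketched in Python as below, using Brass's logit of the survivorship curve. This is a generic illustration rather than a port of the linked R script. The standard lx values and the function names are hypothetical, and lx is expressed as a proportion strictly between 0 and 1 (so age 0, where lx = 1, is left out).

```python
import math

def brass_logit(lx):
    """Brass's logit of survivorship: 0.5 * ln((1 - lx) / lx)."""
    return 0.5 * math.log((1.0 - lx) / lx)

def inverse_brass_logit(y):
    """Back-transform from the logit scale to survivorship."""
    return 1.0 / (1.0 + math.exp(2.0 * y))

def relational_logit(standard_lx, alpha, beta):
    """New lx curve: logit(new) = alpha + beta * logit(standard).

    A higher alpha lowers survivorship at every age (higher mortality);
    beta tilts the balance between younger-age and older-age mortality.
    """
    return [inverse_brass_logit(alpha + beta * brass_logit(l)) for l in standard_lx]

# Hypothetical standard survivorship at a handful of ages.
standard_lx = [0.98, 0.95, 0.85, 0.60, 0.20]
print([round(v, 4) for v in relational_logit(standard_lx, alpha=0.2, beta=1.0)])
```

In practice alpha and beta are usually estimated by regressing the logits of an observed lx curve on the logits of the standard.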
^
From the PASEX website: "The U.S. Census Bureau
developed the Population Analysis System (PASEX) to enhance the process of analyzing available population data. PASEX is a set of spreadsheets containing frequently used procedures and
methods in basic demographic analysis. PASEX spreadsheets include demographic methods for the following main topics: age structure, mortality, fertility, migration, distribution of population,
urbanization, and population projections. PASEX consists of spreadsheets formatted for use in Excel, so users will need to have access to Excel to use PASEX. The spreadsheets and the special
program RUP (a population projection program for rural and urban projections) are distributed together with the manuals describing the basic demographic methods and procedures they perform.
The United Nations' MORTPAK is a set of programs for mortality analysis that can be used with PASEX and RUP for analyzing population data. The user should obtain this package through the
United Nations. These demographic analysis tools allow users to produce national and subnational population estimates and projections along with estimates and rates of fertility, mortality,
and migration." Just a few of the (45!) tools are: an open ended age group distribution estimator, an indirect infant mortality estimator, a relational-Gompertz fertility modeler, and a stable-population
calculator. You can't beat that.
--January 2011 (updated August 2011)
^
Instruction and description from the classic text by Shryock and Siegel. (Note there is also the great and updated later edition by Siegel and Swanson.)
--February 2011 (updated August 2011)
^
A great annotated population estimation bibliography by
David Swanson,
a demographer who has been a leader and primary contributor for applied demography over the past 30 years.
--February 2011
^
Michael Hartmann led
Statistics Sweden's
demographic research, and provided a wealth of high quality products. I really enjoy this general demography text
that he created, for at least a couple of reasons: (1) I think the instruction is both detailed and clear, and (2) it leads with some history, and context and
perspective for the field, which I think are essential for users of demographic methods (so that we understand the breadth (and limits) and goals of our work).
--February 2011
^