27 commits
f665508
WIP: add standardized sse as default
Oct 28, 2022
cf8b574
Test for minimize standardized sse
Oct 28, 2022
fdb73a6
Add: standardized distance default, non-standard option
Oct 28, 2022
8e26e27
Rename option nostandard
Oct 28, 2022
3e98073
Update inputs of calipmatch
Oct 28, 2022
7eca8f5
Update syntax
Oct 28, 2022
fc648cc
Update help file
Oct 28, 2022
f693627
Update formatting of help file
Oct 29, 2022
6363b38
Update formatting and name of option
Oct 29, 2022
4691c9a
Fix syntax error and sse test
Oct 29, 2022
048ca8c
README: update embedded calipmatch.sthlp
OppInsights-Bot Oct 29, 2022
ae2220b
Update: use su+gen instead of egen, undo changes to _calipmatch function
Nov 14, 2022
b260505
Format: add spaces between some lines
Nov 14, 2022
7769d5f
Temporary test and required modification to ado file
Nov 15, 2022
25761d1
Fix comment on ado
Nov 15, 2022
1a487be
Merge branch 'temp-second_metric' into second_metric
Nov 15, 2022
badfaf9
Format edits to test file
Nov 15, 2022
0af2b16
Seems to work
Nov 15, 2022
02fee72
WIP: test for scale differences
Nov 15, 2022
f20af13
Add: test matches are scale invariant
Nov 15, 2022
c3b3954
Add test scale invariance of matches
Nov 15, 2022
7f644e9
Format test of scale and shift invariance
Nov 15, 2022
7a8b8a0
Tweak formatting & efficiency of standardizing code
michaelstepner Nov 16, 2022
e208361
Further formatting tweaks
michaelstepner Nov 16, 2022
28d202a
Bug fix: capitalize NOstandardize option
Nov 16, 2022
f877349
Update syntax to nostandardize
Nov 16, 2022
4ae6eaf
README: update embedded calipmatch.sthlp
OppInsights-Bot Nov 16, 2022
18 changes: 12 additions & 6 deletions README.md
@@ -26,7 +26,7 @@ The help file can be explored interactively in Stata using `help calipmatch`.
<p>
<b>calipmatch</b> [<i>if</i>] [<i>in</i>]<b>,</b> <b><u>gen</u></b><b>erate(</b><i>newvar</i><b>)</b> <b><u>case</u></b><b>var(</b><i>varname</i><b>)</b> <b><u>max</u></b><b>matches(</b><i>#</i><b>)</b>
<b><u>caliperm</u></b><b>atch(</b><i>varlist</i><b>)</b> <b><u>caliperw</u></b><b>idth(</b><i>numlist</i><b>)</b> [<b><u>exactm</u></b><b>atch(</b>
<i>varlist</i><b>)</b>]
<i>varlist</i><b>)</b><b> nostandardize</b>]
<p>
<p>
<i>options</i> Description
@@ -44,6 +44,8 @@ The help file can be explored interactively in Stata using `help calipmatch`.
<p>
Optional
<b><u>exactm</u></b><b>atch(</b><i>varlist</i><b>)</b> list of integer variables to match on exactly
<b>nostandardize</b> distance using sum of squares; default is
standardized sum of squares
-------------------------------------------------------------------------
<p>
<p>
@@ -66,11 +68,11 @@ The help file can be explored interactively in Stata using `help calipmatch`.
<p>
The cases are processed in random order. For each case, <b>calipmatch</b>
searches for matching controls. If any valid matches exist, it selects
the matching control which minimizes the sum of squared differences
across caliper matching variables. If <b>maxmatches(</b><i>#</i><b>)</b>&gt;1, then after
completing the search for a first matching control observation for each
case, the algorithm will search for a second matching control observation
for each case, etc.
the matching control which minimizes the standardized sum of squared
differences across caliper matching variables. If <b>maxmatches(</b><i>#</i><b>)</b>&gt;1, then
after completing the search for a first matching control observation for
each case, the algorithm will search for a second matching control
observation for each case, etc.
<p>
<p>
<a name="options"></a><b><u>Options</u></b>
@@ -119,6 +121,10 @@ The help file can be explored interactively in Stata using `help calipmatch`.
This enables speedy exact matching, by ensuring that all values are
stored as precise integers.
<p>
<b>nostandardize</b> calculates the distance between cases and controls using
the raw sum of squared differences, without standardizing the caliper
variables. When specified, matches are sensitive to the scale of the
caliper variables, so rescaling a variable changes its weight in the
distance.
<p>
<a name="saved_results"></a><b><u>Saved results</u></b>
<p>
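For reference, the two distance metrics described in the help text changes can be written out as below. This is a sketch inferred from the help text and from the standardization step in `calipmatch.ado` (subtract the mean, divide by the SD); the symbols $x_{ik}$, $s_k$ and $K$ are notation introduced here, not names used by the package.

```latex
% i = case, j = candidate control, k = 1..K indexes the caliper variables
% s_k = sample standard deviation of caliper variable k (notation assumed here)
d_{\text{raw}}(i,j) = \sum_{k=1}^{K} \left( x_{ik} - x_{jk} \right)^2
\qquad
d_{\text{std}}(i,j) = \sum_{k=1}^{K} \left( \frac{x_{ik} - x_{jk}}{s_k} \right)^2
```

The mean of each variable cancels in the difference, so only the division by $s_k$ distinguishes the default metric from the `nostandardize` metric.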
34 changes: 27 additions & 7 deletions calipmatch.ado
@@ -12,7 +12,7 @@ human-readable summary can be accessed at http://creativecommons.org/publicdomai

program define calipmatch, sortpreserve rclass
version 13.0
syntax [if] [in], GENerate(name) CASEvar(varname numeric) MAXmatches(numlist integer >0 max=1) CALIPERMatch(varlist numeric) CALIPERWidth(numlist >0) [EXACTmatch(varlist)]
syntax [if] [in], GENerate(name) CASEvar(varname numeric) MAXmatches(numlist integer >0 max=1) CALIPERMatch(varlist numeric) CALIPERWidth(numlist >0) [EXACTmatch(varlist) nostandardize]

* Verify there are same number of caliper vars as caliper widths
if (`: word count `calipermatch'' != `: word count `caliperwidth'') {
@@ -88,9 +88,29 @@ program define calipmatch, sortpreserve rclass
tempname case_matches

if r(no_matches)==0 {
mata: _calipmatch(boundaries,"`generate'",`maxmatches',"`calipermatch'","`caliperwidth'")

if "`standardize'"=="" {
* Create standardized caliper vars (subtract mean, divide by SD)
local i = 0
foreach var of varlist `calipermatch' {
local ++i

tempvar std_`var'
qui sum `var' in `=_N-`insample_total'+1'/`=_N'
qui gen `std_`var'' = (`var' - r(mean)) / r(sd) in `=_N-`insample_total'+1'/`=_N'

local std_calipermatch `std_calipermatch' `std_`var''
local std_caliperwidth `std_caliperwidth' `=`: word `i' of `caliperwidth'' / r(sd)'
}

mata: _calipmatch(boundaries,"`generate'",`maxmatches',"`std_calipermatch'","`std_caliperwidth'")
}
else {
mata: _calipmatch(boundaries,"`generate'",`maxmatches',"`calipermatch'","`caliperwidth'")
}

qui compress `generate'

matrix `case_matches' = r(matchsuccess)
matrix `case_matches' = (`cases_total' - `case_matches''* J(rowsof(`case_matches'),1,1)) \ `case_matches'
}
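The standardization step above divides each caliper width by the same SD used to standardize its variable. A minimal sketch of why that keeps the caliper condition unchanged, using toy values that are hypothetical and not taken from the test suite (the names `sd_income`, `std_width`, `within_raw` and `within_std` are illustrative only):

```stata
* Toy data (hypothetical values, chosen so no difference sits exactly on the caliper)
clear
input byte income_percentile
50
47
52
41
58
end

* Standardize the variable and rescale a caliper width of 5 by the same SD
quietly summarize income_percentile
scalar sd_income = r(sd)
generate double std_income = (income_percentile - r(mean)) / sd_income
scalar std_width = 5 / sd_income

* Compare every observation to observation 1 on both scales
generate byte within_raw = abs(income_percentile - income_percentile[1]) <= 5
generate byte within_std = abs(std_income - std_income[1]) <= std_width

* The caliper admits the same observations either way
assert within_raw == within_std
```

Only the ranking of valid matches changes under standardization; the set of observations inside the caliper should stay the same.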
@@ -150,19 +170,19 @@ void _calipmatch(real matrix boundaries, string scalar genvar, real scalar maxma
// Outputs:
// The values of "genvar" are filled with integers that describe each group of matched cases and controls.
// - r(matchsuccess) is a Stata return matrix tabulating the number of cases successfully matched to {1, ..., maxmatch} controls

real scalar matchgrp
matchgrp = st_varindex(genvar)

real rowvector matchvars
matchvars = st_varindex(tokens(calipvars))

real rowvector tolerance
tolerance = strtoreal(tokens(calipwidth))

real scalar curmatch
curmatch = 0

real scalar highestmatch
highestmatch = 0

@@ -239,7 +259,7 @@ void _calipmatch(real matrix boundaries, string scalar genvar, real scalar maxma

stata("return clear")
st_matrix("r(matchsuccess)",matchsuccess)

}

real matrix find_group_boundaries(string scalar grpvars, string scalar casevar, real scalar startobs, real scalar endobs) {
7 changes: 5 additions & 2 deletions calipmatch.sthlp
@@ -24,7 +24,7 @@ Create a variable indicating groups of matched cases and controls
{opt max:matches(#)}
{opth caliperm:atch(varlist)}
{opth caliperw:idth(numlist)}
[{opth exactm:atch(varlist)}]
[{opth exactm:atch(varlist)} {bf: nostandardize}]


{synoptset 23 tabbed}{...}
@@ -44,6 +44,7 @@ matching{p_end}

{syntab :Optional}
{synopt :{opth exactm:atch(varlist)}}list of integer variables to match on exactly{p_end}
{synopt :{bf: nostandardize}} distance using sum of squares; default is standardized sum of squares {p_end}
{synoptline}


@@ -67,7 +68,7 @@ variables when multiple valid matches exist.

{pstd}
The cases are processed in random order. For each case, {cmd:calipmatch} searches for matching controls. If
any valid matches exist, it selects the matching control which minimizes the sum of squared differences across
any valid matches exist, it selects the matching control which minimizes the standardized sum of squared differences across
caliper matching variables. If {opt maxmatches(#)}>1, then after completing the search for a first matching
control observation for each case, the algorithm will search for a second matching control observation for
each case, etc.
@@ -115,6 +116,8 @@ matching variables, they must also have identical values for every exact matchin
{it:int} or {it:long}. This enables speedy exact matching, by ensuring that
all values are stored as precise integers.

{phang}{bf: nostandardize} calculates the distance between cases and controls using the raw sum of squared differences, without standardizing the caliper variables.
When specified, matches are sensitive to the scale of the caliper variables, so rescaling a variable changes its weight in the distance.

{marker saved_results}{...}
{title:Saved results}
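The help text notes that `nostandardize` can be used to weight caliper variables. One way this might look in practice is sketched below, assuming `calipmatch` from this branch is installed. The simulated data mirrors the structure used in `test_calipmatch.do`; the variable `age_x2` and the factor of 2 are hypothetical choices for illustration.

```stata
* Simulated data, mirroring the structure used in test_calipmatch.do
clear
set obs 2000
set seed 4585239
generate byte case = (_n <= 200)
generate byte income_percentile = ceil(runiform() * 100)
generate byte age = 44 + ceil(runiform() * 17)

* Hypothetical weighting: doubling age makes its squared differences count 4x
generate int age_x2 = 2 * age

* The caliper width for age (3 years) is doubled to keep the same tolerance
calipmatch, generate(matchgroup) casevar(case) maxmatches(1) ///
    calipermatch(income_percentile age_x2) caliperwidth(5 6) nostandardize
```

Without `nostandardize`, the rescaling would be undone by the default standardization and the weighting would have no effect on which control is chosen.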
105 changes: 91 additions & 14 deletions test_calipmatch.do
@@ -9,7 +9,7 @@ program define test_calipmatch
if (_rc==0) {

* Assign arguments to locals using the same syntax as calipmatch
syntax [if] [in], GENerate(name) CASEvar(varname numeric) MAXmatches(numlist integer >0 max=1) CALIPERMatch(varlist numeric) CALIPERWidth(numlist >0) [EXACTmatch(varlist)]
syntax [if] [in], GENerate(name) CASEvar(varname numeric) MAXmatches(numlist integer >0 max=1) CALIPERMatch(varlist numeric) CALIPERWidth(numlist >0) [EXACTmatch(varlist) nostandardize]

* Store returned objects
local cases_total = r(cases_total)
@@ -345,30 +345,107 @@ replace income_percentile = 52 in 3
replace income_percentile = 41 in 4
replace income_percentile = 55 in 5

gen byte age = 40
replace age = 47 in 2
replace age = 55 in 4
gen int age_days = 14600
replace age_days = 17155 in 2
replace age_days = 20075 in 4

*----------------------------------------------------------------------------
* Valid inputs, test performance of matching algorithm
*----------------------------------------------------------------------------

gen float sse = (income_percentile - income_percentile[1])^2 + (age - age[1])^2
* matches minimize the standardized sum of squares
egen std_income_percentile = std(income_percentile)
egen std_age_days = std(age_days)

gen float std_sse = (std_income_percentile - std_income_percentile[1])^2 + (std_age_days - std_age_days[1])^2
list

test_calipmatch, gen(matchgroup) case(case) maxmatches(1) ///
calipermatch(income_percentile age_days) caliperwidth(100 36500)

sum std_sse if case==0, meanonly
assert cond(_n==2, std_sse==r(min), std_sse!=r(min)) // test that obs 2 is global min

assert matchgroup == 1 in 2 // test that obs 2 is matched
assert matchgroup == . in 3/5

keep case income_percentile age_days

* matches minimize sum of squares when nostandardize is specified
gen float sse = (income_percentile - income_percentile[1])^2 + (age_days - age_days[1])^2
list

test_calipmatch, gen(matchgroup) case(case) maxmatches(1) ///
calipermatch(income_percentile age_days) caliperwidth(100 36500) nostandardize

sum sse if case==0, meanonly
assert cond(_n==3, sse==r(min), sse!=r(min)) // test that obs 3 is global min

assert matchgroup == 1 in 3 // test that obs 3 is matched
assert matchgroup == . in 2
assert matchgroup == . in 4/5

keep case income_percentile age_days

*============================================================================
* New dataset: two caliper matching variables, with scaling and a shift
*============================================================================

clear
set obs 2000

gen byte case=(_n<=200)

gen byte income_percentile=ceil(runiform() * 100)
gen byte age = 44 + ceil(runiform()*17)
gen int days_over_44 = (age - 44)*365

*----------------------------------------------------------------------------
* Valid inputs, test performance of matching algorithm
*----------------------------------------------------------------------------

* matches minimize sum of squares
test_calipmatch, gen(matchgroup) case(case) maxmatches(1) ///
calipermatch(income_percentile age) caliperwidth(100 100)
* matches are scale and shift invariant
set seed 4585239
set sortseed 789045789

sum sse if case==0, meanonly
assert cond(_n==2, sse==r(min), sse!=r(min)) // test that obs 2 is global min
test_calipmatch, gen(matchgroup_1) case(case) maxmatches(1) ///
calipermatch(income_percentile age) caliperwidth(5 3)

assert matchgroup == 1 in 2 // test that obs 2 is matched
assert matchgroup == . in 3/5
drop casecount matched_case control matched_controls

set seed 4585239
set sortseed 789045789

test_calipmatch, gen(matchgroup_2) case(case) maxmatches(1) ///
calipermatch(income_percentile days_over_44) caliperwidth(5 1095)

drop casecount matched_case control matched_controls

gen match_diffs_std = abs(matchgroup_1 - matchgroup_2)
su match_diffs_std, meanonly
assert r(max) == 0

* matches are scale and shift dependent when nostandardize is specified
set seed 4585239
set sortseed 789045789

test_calipmatch, gen(matchgroup_3) case(case) maxmatches(1) ///
calipermatch(income_percentile age) caliperwidth(5 3) nostandardize

drop casecount matched_case control matched_controls

set seed 4585239
set sortseed 789045789

test_calipmatch, gen(matchgroup_4) case(case) maxmatches(1) ///
calipermatch(income_percentile days_over_44) caliperwidth(5 1095) nostandardize

gen match_diffs = abs(matchgroup_3 - matchgroup_4)
su match_diffs, meanonly
assert r(max) != 0

keep case income_percentile age
keep case income_percentile age

*----------------------------------------------------------------------------

di "Successfully completed all tests."
di "Successfully completed all tests."
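The scale and shift invariance tested above rests on the fact that a positive linear transformation of a variable leaves its standardized values unchanged. A small sketch of that property in isolation, using simulated data in the same form as the test dataset (the names `z_age` and `z_days` are illustrative, and the tolerance is an assumption about floating-point rounding):

```stata
* Simulated ages and the same ages expressed as days over 44
clear
set obs 2000
set seed 4585239
generate byte age = 44 + ceil(runiform() * 17)
generate int days_over_44 = (age - 44) * 365

* Standardize both versions in double precision
egen double z_age  = std(age)
egen double z_days = std(days_over_44)

* The standardized values agree up to floating-point rounding
assert reldif(z_age, z_days) < 1e-10
```

The caliper widths in the test behave the same way: 1095 is 3 times 365, and the SD of `days_over_44` is 365 times the SD of `age`, so both widths reduce to the same standardized value.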