What if there were an easy way to pull numbers out of recent bills and cut through the hype to make some interesting mathematical observations? This might be useful for those studying law or mathematics, or for anyone wanting to dig through the numbers in recent legislation. With some basic PHP we can build a tool that processes a bill and shows which sections are likely to be the most interesting for mathematicians.
Suppose you are a teacher going through the US Congress records (at https://www.congress.gov/bill), looking for numbers to crunch or compare. Many of these bills run to many pages, so how can we find which sections contain a lot of numbers, and so might offer something to chart, compare, and work into a math lesson?
Step 1: Separating data, finding the numbers
Step one is separating the text into its sections to find which sections have the most numbers. For starters, we can build a .php file that reads the bill line by line and separates it into two arrays, one of section titles and one of the associated text:
<?php
$bill = "[119th Congress Public Law 4]
[From the U.S. Government Publishing Office]
[[Page 139 STAT. 9]]
Public Law 119-4
119th Congress
An Act
Making further continuing appropriations and other extensions for the
fiscal year ending September 30, 2025, and for other
purposes. <<NOTE: Mar. 15, 2025 - [H.R. 1968]>>
Be it enacted by the Senate and House of Representatives of the
United States of America in Congress assembled, <<NOTE: Full-Year
Continuing Appropriations and Extensions Act, 2025.>>
SECTION 1. SHORT TITLE.
This Act may be cited as the ``Full-Year Continuing Appropriations
and Extensions Act, 2025''.
SEC. 2. TABLE OF CONTENTS.
The table of contents of this Act is as follows:
Sec. 1. Short title.
Sec. 2. Table of contents.
Sec. 3. References.
... Paste whole TXT file from Congress.gov here ...";
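// Normalize Windows line endings, then split the bill into lines.
$arrayOfLines = explode("\n", str_replace("\r\n", "\n", $bill));
$titles = [];
$texts = [];
$inSection = false;
$text = '';
foreach ($arrayOfLines as $line) {
    // Headings look like "SECTION 1." or "SEC. 2101."; the dot is escaped so it matches literally.
    if (preg_match('/(SEC\. |SECTION )\d.*/', $line)) {
        if ($inSection) {
            // Store the text of the previous section, even when it is empty,
            // so that $texts stays aligned with $titles.
            $texts[] = $text;
        }
        $titles[] = $line;
        $text = '';
        $inSection = true;
    } else if ($inSection) {
        $text .= htmlentities($line) . "<br>";
    }
}
// Store the text of the last section.
if ($inSection) {
    $texts[] = $text;
}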
var_dump($titles);
var_dump($texts);
This separates the bill by section into an array of titles and an array of associated text. Save the file as extract.php, run “php ./extract.php” in that directory, and you will get the titles and text as output:
array(27) {
[0]=>
string(23) "SECTION 1. SHORT TITLE."
[1]=>
string(26) "SEC. 2. TABLE OF CONTENTS."
[2]=>
string(43) "SEC. 3. <<NOTE: 1 USC 1 note.>> REFERENCES."
[3]=>
string(65) "SEC. 2101. <<NOTE: Time period.>> EXTENSION FOR COMMUNITY HEALTH "
[4]=>
string(64) "SEC. 2102. <<NOTE: Time period.>> EXTENSION OF SPECIAL DIABETES "
[5]=>
string(47) "SEC. 2103. NATIONAL HEALTH SECURITY EXTENSIONS."
[6]=>
string(61) "SEC. 2201. EXTENSION OF INCREASED INPATIENT HOSPITAL PAYMENT "
...
array(27) {
.... texts of each section .....
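Pasting a long bill into a string gets awkward, so here is a rough sketch of the same splitting idea wrapped in a function and fed from a saved text file. The function name splitSections and the filename bill.txt are my own placeholders, not anything from Congress.gov, and this version keeps the raw text (no HTML escaping) and assumes headings start the line.

```php
<?php
/**
 * Split bill text into parallel arrays of section titles and section bodies.
 * Sketch only: assumes headings look like "SECTION 1." or "SEC. 2101.".
 */
function splitSections(string $bill): array
{
    $titles = [];
    $texts = [];
    $current = null; // Stays null until the first heading is seen.

    foreach (preg_split('/\r\n|\r|\n/', $bill) as $line) {
        if (preg_match('/^\s*(SEC\. |SECTION )\d/', $line)) {
            if ($current !== null) {
                $texts[] = $current; // Close the previous section, even if empty.
            }
            $titles[] = trim($line);
            $current = '';
        } elseif ($current !== null) {
            $current .= $line . "\n";
        }
    }
    if ($current !== null) {
        $texts[] = $current; // Close the last section.
    }
    return [$titles, $texts];
}

// 'bill.txt' is a hypothetical file holding the TXT version of a bill.
$path = 'bill.txt';
if (is_readable($path)) {
    [$titles, $texts] = splitSections(file_get_contents($path));
    print_r($titles);
}
```

Because every heading closes the section before it, the two arrays stay the same length even when a section has no body text.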
Now that we have arrays of titles and texts, I’ll count the numbers that would be interesting for an analysis. For starters I’ll match runs of digits, including those connected with “.” or “,”. As a PCRE pattern that is ‘/[0-9][0-9,.]*/’: match one digit, then any number (“*”) of further digits, commas, or decimal points.
Unfortunately this also counts many numbers that are actually years, not quantities we are probably interested in, so I will skip anything from 1900 to 2050. I will also subtract numbers that directly follow a month name (the day in a date) or a reference word such as “section”, “page”, or “law”, using a (jan|feb|…etc…) matcher:
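To see what that pattern actually picks up, here is a small sketch run against a sentence I made up for illustration (it is not quoted from any bill). The helper name findNumbers is my own. Note that the raw pattern also grabs a trailing comma or period, so the sketch trims those off.

```php
<?php
// Return every number-like run in $text, without trailing punctuation.
function findNumbers(string $text): array
{
    // One digit, then any run of digits, commas, or periods.
    preg_match_all('/[0-9][0-9,.]*/', $text, $matches);
    // A trailing "," or "." is sentence punctuation, not part of the number.
    return array_map(fn($m) => rtrim($m, ',.'), $matches[0]);
}

// Invented sample sentence.
$sample = 'For fiscal year 2025, $1,500,000.50 is available for 12 grants.';
print_r(findNumbers($sample));
// Finds "2025", "1,500,000.50", and "12".
```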
// Normalize Windows line endings, then split the bill into lines.
$arrayOfLines = explode("\n", str_replace("\r\n", "\n", $bill));
$titles = [];
$texts = [];
$inSection = false;
$text = '';
foreach ($arrayOfLines as $line) {
    // Headings look like "SECTION 1." or "SEC. 2101."; the dot is escaped so it matches literally.
    if (preg_match('/(SEC\. |SECTION )\d.*/', $line)) {
        if ($inSection) {
            // Store the text of the previous section, even when it is empty,
            // so that $texts stays aligned with $titles.
            $texts[] = $text;
        }
        $titles[] = $line;
        $text = '';
        $inSection = true;
    } else if ($inSection) {
        $text .= htmlentities($line) . "<br>";
    }
}
// Store the text of the last section.
if ($inSection) {
    $texts[] = $text;
}
// Build one count per section, in the same order as $titles.
$numberCounts = [];
foreach ($texts as $text) {
    $count = 0;
    preg_match_all('/[0-9][0-9,.]*/', $text, $matches);
    if (is_array($matches) && is_array($matches[0])) {
        foreach ($matches[0] as $number) {
            // Do not count values that are likely just a year.
            // intval() stops at the first comma, so "1,500" is read as 1.
            $intval = intval($number);
            if ($intval > 2050 || $intval < 1900) {
                $count++;
            }
        }
    }
    // Subtract numbers that follow a month name or a reference word,
    // for example "March 15", "section 3", or "Public Law 119".
    preg_match_all('/(jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|sept|september|oct|october|nov|november|dec|december|section|sections|report|page|law|sec\.) [0-9]/i', $text, $datematches);
    if (is_array($datematches) && is_array($datematches[0])) {
        $count -= count($datematches[0]);
    }
    $numberCounts[] = $count;
}
var_dump($titles);
var_dump($numberCounts);
This outputs the section titles, followed by the count of potentially interesting numbers in each section:
array(27) {
[0]=>
string(23) "SECTION 1. SHORT TITLE."
[1]=>
string(26) "SEC. 2. TABLE OF CONTENTS."
[2]=>
string(43) "SEC. 3. <<NOTE: 1 USC 1 note.>> REFERENCES."
[3]=>
string(65) "SEC. 2101. <<NOTE: Time period.>> EXTENSION FOR COMMUNITY HEALTH "
[4]=>
string(64) "SEC. 2102. <<NOTE: Time period.>> EXTENSION OF SPECIAL DIABETES "
[5]=>
string(47) "SEC. 2103. NATIONAL HEALTH SECURITY EXTENSIONS."
[6]=>
string(61) "SEC. 2201. EXTENSION OF INCREASED INPATIENT HOSPITAL PAYMENT "
[7]=>
string(62) "SEC. 2202. EXTENSION OF THE MEDICARE-DEPENDENT HOSPITAL (MDH) "
[8]=>
string(63) "SEC. 2203. EXTENSION OF ADD-ON PAYMENTS FOR AMBULANCE SERVICES."
[9]=>
string(65) "SEC. 2204. EXTENSION OF FUNDING FOR QUALITY MEASURE ENDORSEMENT, "
[10]=>
string(64) "SEC. 2205. EXTENSION OF FUNDING OUTREACH AND ASSISTANCE FOR LOW-"
[11]=>
string(56) "SEC. 2206. EXTENSION OF THE WORK GEOGRAPHIC INDEX FLOOR."
[12]=>
string(57) "SEC. 2207. EXTENSION OF CERTAIN TELEHEALTH FLEXIBILITIES."
[13]=>
string(56) "SEC. 2208. EXTENDING ACUTE HOSPITAL CARE AT HOME WAIVER "
[14]=>
string(63) "SEC. 2209. EXTENSION OF TEMPORARY INCLUSION OF AUTHORIZED ORAL "
[15]=>
string(37) "SEC. 2210. MEDICARE IMPROVEMENT FUND."
[16]=>
string(34) "SEC. 2211. MEDICARE SEQUESTRATION."
[17]=>
string(53) "SEC. 2301. SEXUAL RISK AVOIDANCE EDUCATION EXTENSION."
[18]=>
string(55) "SEC. 2302. PERSONAL RESPONSIBILITY EDUCATION EXTENSION."
[19]=>
string(60) "SEC. 2303. EXTENSION OF FUNDING FOR FAMILY-TO-FAMILY HEALTH "
[20]=>
string(44) "SEC. 2401. DELAYING MEDICAID DSH REDUCTIONS."
[21]=>
string(62) "SEC. 3101. COMMODITY FUTURES TRADING COMMISSION WHISTLEBLOWER "
[22]=>
string(60) "SEC. 3102. PROTECTION OF CERTAIN FACILITIES AND ASSETS FROM "
[23]=>
string(41) "SEC. 3103. ADDITIONAL SPECIAL ASSESSMENT."
[24]=>
string(66) "SEC. 3104. NATIONAL CYBERSECURITY PROTECTION SYSTEM AUTHORIZATION."
[25]=>
string(61) "SEC. 3105. EXTENSION OF TEMPORARY ORDER FOR FENTANYL-RELATED "
[26]=>
string(29) "SEC. 3106. BUDGETARY EFFECTS."
}
array(27) {
[0]=>
int(2)
[1]=>
int(1)
[2]=>
int(2286)
[3]=>
int(59)
[4]=>
int(32)
[5]=>
int(63)
[6]=>
int(31)
[7]=>
int(36)
[8]=>
int(14)
[9]=>
int(20)
[10]=>
int(31)
[11]=>
int(9)
[12]=>
int(79)
[13]=>
int(9)
[14]=>
int(10)
[15]=>
int(10)
[16]=>
int(19)
[17]=>
int(16)
[18]=>
int(17)
[19]=>
int(10)
[20]=>
int(25)
[21]=>
int(13)
[22]=>
int(6)
[23]=>
int(5)
[24]=>
int(6)
[25]=>
int(5)
[26]=>
int(14)
}
Indeed, these sections contain some interesting numbers, and we now have a count of numbers for each section…
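With a count per section, the quickest way to spot the number-heavy sections is to sort them. Here is a minimal sketch using three title and count pairs copied from the output above. The helper name rankSections is my own, and it assumes section titles are unique (duplicate titles would collapse into one entry).

```php
<?php
// Return the $top sections with the highest counts, as title => count.
function rankSections(array $titles, array $counts, int $top = 3): array
{
    $ranked = array_combine($titles, $counts); // title => count
    arsort($ranked);                           // Highest count first, keys kept.
    return array_slice($ranked, 0, $top, true);
}

// Three entries taken from the var_dump output above.
$titles = [
    'SEC. 2103. NATIONAL HEALTH SECURITY EXTENSIONS.',
    'SEC. 2207. EXTENSION OF CERTAIN TELEHEALTH FLEXIBILITIES.',
    'SEC. 2210. MEDICARE IMPROVEMENT FUND.',
];
$counts = [63, 79, 10];

foreach (rankSections($titles, $counts, 2) as $title => $count) {
    echo $count, "  ", $title, "\n";
}
```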
Putting it all together
Next, I built a .php file that one can put on a server to let anyone get an overview of any bill pasted in as text:
- Get the posted form from $_POST[] array
- Process using the regex as noted above.
- Import jQuery and easily expand and collapse the element following an element using “$(this).next().toggle()”
- Secure the form against <script> or cookie injection using htmlspecialchars.
- All human coded and no odd AI junk
<?php
$bill = $_POST['bill'] ?? ''; // Not set on the first visit, before the form is posted.
if( empty($bill)) {
?><html><head>
<title>Results</title>
<script src="https://code.jquery.com/jquery-3.7.1.min.js" integrity="sha256-/JqT3SQfawRcv/BIHPThkBvs0OEvtFFmqPF/lYI/Cxo=" crossorigin="anonymous"></script>
<script>
jQuery(function($){ //On DOM ready
$('h2').click(function() { //Set click handlers
$(this).next().toggle() //Toggle visibility next element.
});
});
</script>
<style>h2 {
margin:1em 0 0 0;
width:100%;
background:#EEE;
}
h2 strong {
float:right;
margin-right:2em;
}
</style>
</head>
<body>
<h1>Bill Text</h1>
<h1>Enter bill text from <a href="https://www.congress.gov/">https://www.congress.gov/</a>:</h1>
<form method="POST">
<textarea name="bill" cols="100" rows="40"></textarea><br>
<input type="submit" value="Process...">
</form>
</body></html>
<?php
exit;
}
// Normalize Windows line endings, then split the bill into lines.
$arrayOfLines = explode("\n", str_replace("\r\n", "\n", $bill));
$titles = [];
$texts = [];
$inSection = false;
$text = '';
foreach ($arrayOfLines as $line) {
    // Headings look like "SECTION 1." or "SEC. 2101."; the dot is escaped so it matches literally.
    if (preg_match('/(SEC\. |SECTION )\d.*/', $line)) {
        if ($inSection) {
            // Store the text of the previous section, even when it is empty,
            // so that $texts stays aligned with $titles.
            $texts[] = $text;
        }
        $titles[] = $line;
        $text = '';
        $inSection = true;
    } else if ($inSection) {
        $text .= htmlentities($line) . "<br>";
    }
}
// Store the text of the last section.
if ($inSection) {
    $texts[] = $text;
}
// Build one count per section, in the same order as $titles.
$numberCounts = [];
foreach ($texts as $text) {
    $count = 0;
    preg_match_all('/[0-9][0-9,.]*/', $text, $matches);
    if (is_array($matches) && is_array($matches[0])) {
        foreach ($matches[0] as $number) {
            // Do not count values that are likely just a year.
            // intval() stops at the first comma, so "1,500" is read as 1.
            $intval = intval($number);
            if ($intval > 2050 || $intval < 1900) {
                $count++;
            }
        }
    }
    // Subtract numbers that follow a month name or a reference word,
    // for example "March 15", "section 3", or "Public Law 119".
    preg_match_all('/(jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|sept|september|oct|october|nov|november|dec|december|section|sections|report|page|law|sec\.) [0-9]/i', $text, $datematches);
    if (is_array($datematches) && is_array($datematches[0])) {
        $count -= count($datematches[0]);
    }
    $numberCounts[] = $count;
}
?><html><head>
<title>Results</title>
<script src="https://code.jquery.com/jquery-3.7.1.min.js" integrity="sha256-/JqT3SQfawRcv/BIHPThkBvs0OEvtFFmqPF/lYI/Cxo=" crossorigin="anonymous"></script>
<script>
jQuery(function($){ //On DOM ready
$('h2').click(function() { //Set click handlers
$(this).next().toggle() //Toggle visibility next element.
});
});
</script>
<style>h2 {
margin:1em 0 0 0;
width:100%;
background:#EEE;
}
h2 strong {
float:right;
margin-right:2em;
}
</style>
</head>
<body>
<h1>Results</h1>
<?php for ($i = 0; $i < count($titles); $i++) {
    echo "<h2>" . htmlspecialchars($titles[$i]) . " <strong>" . $numberCounts[$i] . "</strong></h2>";
    // $texts[$i] was already escaped with htmlentities() and joined with <br> above,
    // so it is printed as-is; escaping it again would show the entities literally.
    echo "<p style='display:none'>" . $texts[$i] . "</p>";
}
?>
</body></html>
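One caveat I should flag: the listing above counts numbers in text that has already been through htmlentities. With the ENT_QUOTES flag (part of the default flags since PHP 8.1), each apostrophe becomes &#039;, and those digits match the number pattern. The sketch below shows the effect on an invented sample string; countNumbers is a hypothetical helper name. Counting on the raw text and escaping only at output time avoids the problem.

```php
<?php
// Count number-like runs in a string, using the same pattern as above.
function countNumbers(string $text): int
{
    preg_match_all('/[0-9][0-9,.]*/', $text, $m);
    return count($m[0]);
}

// Invented sample in the style of a bill's quoting; it contains no digits.
$raw = "cited as the ``Example Act''.";
// ENT_QUOTES turns each apostrophe into &#039;
$escaped = htmlentities($raw, ENT_QUOTES);

echo countNumbers($raw), "\n";     // 0: nothing to count in the raw text
echo countNumbers($escaped), "\n"; // 2: one "039" per escaped apostrophe
```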
For the finished interactive page based on the above, please see here: https://howtotrainyourrobot.com/interactiveBill.php
Use this to find the pages and sections that have the most numbers. Maybe these would be a good basis for analyzing the priorities of Congress, or for building a story problem for your math class that is grounded in practical bills and laws that are going into effect.
Choose your favorite (or least favorite) bill, and process it using this tool. In the “H.R.1 – One Big Beautiful Bill Act” we can see that the sections on DOD spending have many numbers in them: chart or tabulate the expenditure. The Offshore Oil and Gas Leasing provisions propose new drilling projects in the “Gulf of America”. What would be the benefits and risks of starting 30 or more drilling projects off the coast of Florida? (Brainstorm or share.)
After entering the bill text, click the headings to expand each section and find the most mathematically relevant portions!
Other potential class projects could include choosing a bill and building an assignment around finding its most interesting parts:
- Compare numbers to historic data or other policies.
- Build a chart comparing the budget allocated to different functions.
- Agree or disagree with the premise: how would this affect the USA or the world?
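For the charting idea, a class would first need the dollar figures as plain numbers. Here is a rough sketch that pulls dollar amounts out of a section's raw text and converts them to floats. The helper name dollarAmounts and the sample sentence are invented for illustration, and the pattern only handles amounts written like $1,234 or $1,234.56.

```php
<?php
// Return every "$1,234.56"-style amount in $text as a float.
function dollarAmounts(string $text): array
{
    // A dollar sign, digits with optional commas, and an optional decimal part.
    preg_match_all('/\$([0-9][0-9,]*(?:\.[0-9]+)?)/', $text, $m);
    // Strip the thousands separators before converting.
    return array_map(fn($s) => (float) str_replace(',', '', $s), $m[1]);
}

// Invented sample sentence, not quoted from any bill.
$sample = 'There is appropriated $4,260,000,000 for one program and $159,736,000 for another.';
$amounts = dollarAmounts($sample);

print_r($amounts);
echo 'Total: ', number_format(array_sum($amounts)), "\n";
```

The resulting array can be pasted into a spreadsheet or handed to whatever charting tool the class already uses.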