A regular expression parser for dollars
There are two Python parsers in the project:
dollar_program.py
import sys,re
regex = r"(\$?(?:(\d+|a|half|quarter|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|\w+teen|\w+ty|hundred|thousand|\w+illion).)*((\d+|and|((and|a)?.)?half( a)?|quarter|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|\w+teen|\w+ty|hundred|thousand|\w+illion))(\s)?(dollar|cent)(s)?)|((\$(?:\d+.)*\d+)(.(\w+illion|thousand))?)"
# Read the whole input file named on the command line.
with open(sys.argv[1], 'r') as f:
    test_str = f.read()
# Write each matched dollar amount on its own line.
with open("dollar_output.txt", "w") as outFile:
    for match in re.finditer(regex, test_str, re.MULTILINE):
        outFile.write(match.group() + "\n")
telephone_regex.py
import sys,re
regex = r"[(]?\d{3}[)]?[(\s)?.-]\d{3}[\s.-]\d{4}"
# Read the whole input file named on the command line.
with open(sys.argv[1], 'r') as f:
    test_str = f.read()
# Write each matched telephone number on its own line.
with open("telephone_output.txt", "w") as outFile:
    for match in re.finditer(regex, test_str, re.MULTILINE):
        outFile.write(match.group() + "\n")
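To see what the telephone pattern accepts, it can be run on a short string instead of a file. The sketch below copies the pattern from telephone_regex.py. The helper find_phones, the sample sentence, and the PHONE_OPTIONAL_AREA variant are assumptions added for illustration and are not part of the project. As written, the original pattern requires a three-digit area code, so a seven-digit number such as 777-1000 is not matched. The variant shows one possible way to make the area code optional.

```python
import re

# Pattern copied from telephone_regex.py above.
PHONE = re.compile(r"[(]?\d{3}[)]?[(\s)?.-]\d{3}[\s.-]\d{4}")

# Hypothetical variant (not from the project): the area code is optional.
PHONE_OPTIONAL_AREA = re.compile(r"(?:[(]?\d{3}[)]?[\s.-]?)?\d{3}[\s.-]\d{4}")


def find_phones(text, pattern=PHONE):
    """Return every non-overlapping match of pattern in text, in order."""
    return [m.group() for m in pattern.finditer(text)]


sample = "Call 212-345-1234 or (212) 345-1234, or 777-1000."
print(find_phones(sample))                       # original pattern
print(find_phones(sample, PHONE_OPTIONAL_AREA))  # hypothetical variant
```

With the original pattern only the two ten-digit numbers are returned. The variant also picks up 777-1000.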
These are the program files. Each one can be called from the command line with a text file as its parameter, and writes the regexp matches to an output file in the format indicated below. For example,
python dollar_program.py target_text.txt
python telephone_regex.py target_text.txt
dollar_output.txt – contains the dollar amounts recognized by the program, one per line. Text that is not part of a dollar amount is not printed at all. Three lines of example output might look like this:
$5 million
$5.00
five hundred dollars
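The "one match per line, nothing else" format can be illustrated with a much smaller pattern. The sketch below is a simplification and not the project's regex: DIGIT_DOLLARS is a hypothetical pattern that covers only amounts written with a dollar sign and digits, so spelled-out amounts such as "five hundred dollars" are deliberately out of scope. The helper dollar_lines and the sample sentence are also assumptions made for illustration.

```python
import re

# Simplified, hypothetical pattern (not the project's regex): a "$", digits,
# an optional decimal part, and an optional scale word such as "million".
DIGIT_DOLLARS = re.compile(r"\$\d+(?:\.\d+)?(?:\s(?:\w+illion|thousand))?")


def dollar_lines(text):
    """Return the matched amounts as output-file text, one amount per line."""
    return "".join(m.group() + "\n" for m in DIGIT_DOLLARS.finditer(text))


sample = "The deal cost $5 million, but the fee was only $5.00."
print(dollar_lines(sample), end="")
```

Only the matched spans are kept. The surrounding words and punctuation of the sample sentence do not appear in the result.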
telephone_output.txt – the output file for telephone numbers, one per line, e.g.,
212-345-1234
777-1000