Monday and Tuesday (WINVEST Day)
On Monday of week six, DSPG conducted assessments of houses in Grundy Center and New Hampton by taking pictures and tracking their characteristics using the Fulcrum app. We then continued our evaluations in Independence on Tuesday, which marked the conclusion of our WINVEST site assessments.
During the assessments, we examined each house's structural features and recorded detailed notes on the condition of the driveway, foundation, gutters, paint, porch, roof, siding, walls, and window frames, as well as any other noteworthy features in poor condition.
We also accounted for the condition of the lot, noting any junk or debris on the site, and evaluated the condition of the sidewalk connecting to the house.
We paid especially close attention to the gutters, roof, siding, and landscape, rating each as good, fair, or poor. As we took and reviewed the pictures, we noted any obstructions in the photos, recording whether they were caused by overgrown vegetation (such as bushes or weeds), trees, electrical posts, cars, or anything else.
Along with evaluating the housing, we also recorded our general impressions of each block. We evaluated the neighborhood sidewalks, noting whether they were partial or only on one side, and whether they had curb cuts for accessibility at intersections.
We also rated the condition of the sidewalks to determine whether they were unsafe and in need of repair or replacement, and we assessed the overall health of the street trees, marking any that appeared to be in poor condition.
We noted the presence and condition of street lights, paying special attention to their brightness and coverage, and marked the location of any street signs for wayfinding. We also identified the type of storm drain, whether a ditch/swale, curb/gutter, or another system. Finally, to document our observations, we took pictures of both sides of each street block, capturing any notable features or areas of concern, and photographed any flaws or damage we identified in the sidewalks.
Here are the comprehensive maps of our WINVEST site assessments, as well as a photo of a typical house that we would assess.
Other Add-Ons to the Day
Sadat, Harun, and I discussed the current progress of the project and came up with a few other ideas.
A summary of the project discussion we had:
Inventory
There is some cleaned data after web-scraping.
There are also uncleaned datasets in Box/GitHub.
Each category has:
- Price, Weight, URL, Local/Not-Local, and Zip Code
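As a sketch, each scraped inventory row could be represented with the fields listed above. The class name and example values here are hypothetical, not from our actual datasets:

```python
from dataclasses import dataclass

@dataclass
class InventoryRecord:
    # Field names follow the category list above; values below are made up.
    price: float     # price in USD
    weight: float    # weight in lbs (or amount, e.g. dozens for eggs)
    url: str         # product page the row was scraped from
    local: str       # 'Local' / 'Non-local' / 'None Listed'
    zip_code: str    # zip code of the seller

row = InventoryRecord(price=5.99, weight=1.0,
                      url='https://example.com/bacon',
                      local='Local', zip_code='50010')
```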
Original Idea
Plot commodities by location to investigate price differences.
Answer whether the locality of a commodity affects its prices.
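A minimal sketch of that comparison, assuming rows carry the Price, Local/Not-Local, and Zip Code fields listed under Inventory (the sample rows are invented):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scraped rows: (zip code, locality, price) -- made-up values.
rows = [
    ('50010', 'Local', 6.49),
    ('50010', 'Non-local', 4.99),
    ('52240', 'Local', 7.25),
    ('52240', 'Non-local', 5.49),
    ('50010', 'Local', 6.99),
]

# Bucket prices by (zip, locality), then average each bucket.
buckets = defaultdict(list)
for zip_code, local, price in rows:
    buckets[(zip_code, local)].append(price)

avg_price = {key: round(mean(prices), 2) for key, prices in buckets.items()}
for (zip_code, local), avg in sorted(avg_price.items()):
    print(zip_code, local, avg)
```

Comparing the 'Local' and 'Non-local' averages within each zip code is one direct way to answer the locality question above.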
Brainstorming Session
Some New Ideas
Supply Chain Operations / Commodity arbitrage.
Data collection is the major problem.
- Building a multi-purpose spider that can pull information for the local food data collection.
Final thoughts that we all came up with:
Creating a reliable tool that works for scraping data (potentially a well-documented spider template).
Investigate a single product as a case study to scrape data for simple spatial visualization by zip code.
Create some application for demonstrating how a Supply Chain Operations tool could be utilized.
Wednesday (Client Meeting Day)
We had a meeting with Courtney Long where we discussed the next steps of the project.
Final thoughts after client meeting that we all agreed on:
A comprehensive map to showcase the prices of eggs and bacon across various counties using the collected data. This map serves as a valuable tool for identifying trends and patterns in pricing, as well as understanding customer preferences towards specific brands. Additionally, the map aids in the selection of suitable selling locations by considering crucial factors such as brand reputation, pricing, and travel distance (cost).
A number of web-scraping spiders for selected websites to facilitate the creation of a comprehensive product database. These spiders will automate the scraping process, enabling repeatable and efficient collection of data.
Showcase the capability of the spiders with a specific crop example. The spiders will be utilized to extract data for one of the following six products: tomatoes (regardless of the type), carrots, lettuce, watermelon, eggplant, or leafy greens. This demonstration will effectively highlight the functionality and effectiveness of the spiders in retrieving the desired data.
Optimization of the crop flow, from the point of supply to the point of demand, that maximizes overall profit. We will explore the factors and methodology to estimate the demand and supply.
Optimization of the crop flow, from the point of supply to the point of demand:
The red points represent the counties.
The green lines represent the flow of crops, and the blue arrow shows the direction of the flow.
Current issues that will be resolved:
The supply is greater than the demand.
The current model minimizes the traveling distance rather than maximizing the profit.
Random points were used as counties instead of the original location of each county.
The following might be included in the project:
A separate account of fresh and not fresh products.
Consideration of each individual farmer’s profit.
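The profit-maximizing crop-flow idea above can be sketched as a tiny transportation-style problem. All numbers here are made up, and brute-force enumeration over a 2x2 instance stands in for a proper LP solver; note that, as listed above, supply exceeds demand:

```python
from itertools import product

# Toy instance: two supply counties, two demand counties (numbers invented).
supply = [30, 20]        # units available at each supply county
demand = [15, 25]        # units wanted at each demand county (supply > demand)
price = [8.0, 9.0]       # sale price per unit at each demand county
cost = [[1.0, 3.0],      # transport cost per unit from supply i to demand j
        [2.0, 1.5]]

best_profit, best_plan = float('-inf'), None
# Enumerate every integer shipment plan x[i][j] for this tiny instance.
for x00, x01, x10, x11 in product(range(31), range(31), range(21), range(21)):
    x = [[x00, x01], [x10, x11]]
    # Feasibility: do not exceed supply at origins or demand at destinations.
    if x00 + x01 > supply[0] or x10 + x11 > supply[1]:
        continue
    if x00 + x10 > demand[0] or x01 + x11 > demand[1]:
        continue
    # Profit = (sale price - transport cost) summed over all shipped units.
    profit = sum((price[j] - cost[i][j]) * x[i][j]
                 for i in range(2) for j in range(2))
    if profit > best_profit:
        best_profit, best_plan = profit, x

print(best_profit, best_plan)
```

At a realistic county scale this enumeration would be replaced by a linear-programming solver, but the objective (maximize profit, not minimize distance) and the supply/demand constraints stay the same.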
At the end of the day, we discussed and listed all of the outputs that we plan to produce by the end of this project.
Thursday
We started our day with a coffee talk presented by Aaron, who spoke about web scraping, specifically spiders.
The pictures above and below show a glimpse of what was presented in the coffee talk.
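As a rough illustration of the parsing step a spider performs, here is a self-contained sketch using only the standard library. The HTML snippet and the class names in it are invented for the example, not taken from any real grocer site:

```python
from html.parser import HTMLParser

# Hypothetical product markup, standing in for a page a spider would fetch.
PAGE = """
<div class="product"><span class="name">Heirloom Tomatoes</span>
<span class="price">$3.99</span></div>
<div class="product"><span class="name">Brown Eggs</span>
<span class="price">$4.49</span></div>
"""

class ProductParser(HTMLParser):
    """Collects (name, price) pairs from spans tagged 'name' / 'price'."""
    def __init__(self):
        super().__init__()
        self.field = None      # which field the next text chunk belongs to
        self.products = []     # accumulated [name, price] pairs

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get('class')
        if tag == 'span' and cls in ('name', 'price'):
            self.field = cls

    def handle_data(self, data):
        if self.field == 'name':
            self.products.append([data.strip(), None])
        elif self.field == 'price':
            self.products[-1][1] = data.strip()
        self.field = None

parser = ProductParser()
parser.feed(PAGE)
print(parser.products)
```

A real spider (e.g. a Scrapy one) adds crawling, politeness, and pagination on top, but the extraction logic is essentially this.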
Work In Progress… 
We are almost done cleaning all the spiders' output. The data we collected from the grocers list is almost clean, and we are now moving on to the data we collected from local farms and CSAs. A snapshot of how we are cleaning the data and adding the "local" column to the final data set can be seen below:
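The tagging step can be sketched like this, using a few of the egg brand sets from our cleaner; the sample rows and the helper's name are made up for illustration:

```python
# Brand sets drawn from the cleaner's egg lists (trimmed for the example).
LOCAL_EGG_BRANDS = {'farmershenhouse', 'cedarridgefarm', 'joiafoodfarm',
                    'thatssmart', 'hyvee', 'beavercreekfarm'}
NON_LOCAL_EGG_BRANDS = {'vitalfarms', 'organicvalley', 'happyegg',
                        'egglandsbest', 'bornfree', 'stateline'}

def locality(brand):
    """Normalize a brand string and look it up in the local/non-local sets."""
    key = ''.join(c for c in brand if c.isalpha()).lower()
    if key in LOCAL_EGG_BRANDS:
        return 'Local'
    if key in NON_LOCAL_EGG_BRANDS:
        return 'Non-local'
    return 'None Listed'

# Hypothetical scraped rows getting their 'Local' column filled in.
rows = [{'Brand': 'Farmers Hen House'},
        {'Brand': 'Vital Farms'},
        {'Brand': 'Acme'}]
for row in rows:
    row['Local'] = locality(row['Brand'])
```

Stripping everything but letters before the set lookup is what lets spellings like "Farmers Hen House" and "Farmers-hen-house" map to the same key.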
The "Hyvee" spider is still being worked on and its code is still in progress. A small part of the code can be seen below:
```python
#Imports the products
from DSPG_Products import Products
from DSPG_SpiderErrors import DataCleanerError
from DSPG_SpiderErrors import BrandingError
from DSPG_SpiderErrors import StringValueExtractionError
from DSPG_SpiderErrors import DebugError
from datetime import datetime

#This is a helper class to reduce duplicate code in the DataCleaner class
class DataCleaner():
    def __init__(self):
        self.getLocalBrands = [
            {#Bacon
             'desmoinesbaconandmeatcompany', 'desmoinesbaconco', 'berkwoodfarms', 'joiagoodfarm', 'beeler', 'dmbaconco', 'prairiefresh', 'webstercity', 'hyvee', 'hickorycountry'},
            {#Eggs
             'farmershenhouse', 'cedarridgefarm', 'joiafoodfarm', 'thatssmart', 'hyvee', 'beavercreekfarm'},
            {#Heirloom Tomatoes
             'seedsavers'}
        ]
        self.getNonLocalBrands = [
            {#Bacon
             'jollyposh', 'farmland', 'countrysmokehouse', 'herbivorousbutcher', 'bigbuy', 'nimanranch', 'jimmydean', 'farmpromise', 'hormel', 'plainvillefarms', 'nueske', 'smithfield', 'applegate', 'garrettvalley', 'pedersonsnaturalfarms', 'indianakitchen', 'freshthyme', 'oscarmayer', 'jamestown', 'debruinranch', 'wright', 'boarshead'},
            {#Eggs
             'stateline', 'freshthyme', 'bornfree', 'handsomebrookfarm', 'handsomebrookfarms', 'egglandsbest', 'peteandgerryseggs', 'pennysmart', 'bestchoice', 'nellies', 'vitalfarms', 'organicvalley', 'happyegg'},
            {#Heirloom Tomatoes
             'organicvalley', 'delcabo'}
        ]
        self.getBrandNames = [
            {#Bacon
             'Des moines bacon & meat company', 'Jimmy dean', 'Oscar mayer', 'Jolly posh', 'Webster city', 'Prairie fresh', 'Des-moines-bacon-and-meat-company', "Boar's head", 'Fresh thyme', 'Country-smokehouse', 'Farm promise', 'Hormel', 'Oscar-mayer', 'Smithfield', 'Farmland', 'De bruin ranch', 'Indiana kitchen', "Pederson's natural farms", 'Des moines bacon co', 'Applegate', 'Country smokehouse', 'Niman ranch', 'Berkwood farms', 'Hyvee', 'Jimmy-dean', 'Dm bacon co', 'Herbivorous butcher', 'Hickory country', 'Hy-vee', 'Beeler', 'Joia food farm', 'Garrett valley', 'Deli', 'Jamestown', 'Plainville farms', 'Big buy', 'Nueske', 'Wright'},
            {#Eggs
             "That's smart", "Egglands best", "Pete and gerry's eggs", "Nellie's", "Eggland's best", 'Handsome brook farm', 'Egglands-best', 'Farmers hen house', 'Handsome brook farms', 'Joia food farm', 'Penny smart', 'Fresh thyme', 'Vital farms', 'Best choice', 'Nellies-eggs', 'Organic valley', 'Cedar ridge farm', 'Happy egg', 'Thats-smart', 'Pete-and-gerrys-eggs', 'Farmers-hen-house', 'Beaver creek farm', 'Born free', 'Stateline', 'Hyvee', 'Hy-vee'},
            {#Heirloom Tomatoes
             'Del cabo', 'Seed savers', 'Organic valley'}
        ]

    def LoadDataSet(self, inputIndex, url):
        self.productIndex = inputIndex
        if inputIndex == 0:
            self.Data = {'Product Type': None,
                         'Current Price': None,
                         'Orignal Price': None,
                         'Weight in lbs': None,
                         'True Weight': None,
                         'Brand': None,
                         'Local': None,
                         'Address': None,
                         'State': None,
                         'City': None,
                         'Zip Code': None,
                         'Date Collected': str(datetime(datetime.today().year, datetime.today().month, datetime.today().day))[:-9],
                         'Url': url
                         }
        elif inputIndex == 1:
            self.Data = {'Product Type': None,
                         'Current Price': None,
                         'Orignal Price': None,
                         'Amount in dz': None,
                         'True Amount': None,
                         'Brand': None,
                         'Local': None,
                         'Address': None,
                         'State': None,
                         'City': None,
                         'Zip Code': None,
                         'Date Collected': str(datetime(datetime.today().year, datetime.today().month, datetime.today().day))[:-9],
                         'Url': url
                         }
        elif inputIndex == 2:
            self.Data = {'Product Type': None,
                         'Current Price': None,
                         'Orignal Price': None,
                         'Weight in lbs': None,
                         'True Weight': None,
                         'Brand': None,
                         'Organic': None,
                         'Local': None,
                         'Address': None,
                         'State': None,
                         'City': None,
                         'Zip Code': None,
                         'Date Collected': str(datetime(datetime.today().year, datetime.today().month, datetime.today().day))[:-9],
                         'Url': url
                         }
        else:
            raise DataCleanerError(inputIndex)

    def cleanPricing(self):
        price = ''.join(c for c in self.Data['Current Price'] if c.isdigit() or c == '.')
        if len(price) == 0:
            return
        self.Data['Current Price'] = float(price)
        if self.Data['Orignal Price'] == None:
            self.Data['Orignal Price'] = self.Data['Current Price']
            return
        price = ''.join(c for c in self.Data['Orignal Price'] if c.isdigit() or c == '.')
        if len(price) == 0:
            self.Data['Orignal Price'] = self.Data['Current Price']
        else:
            self.Data['Orignal Price'] = float(price)

    def baconModifications(self):
        #Finds True Weight if not available
        if self.Data['True Weight'] == None:
            self.Data['True Weight'] = self.findWeight()
        #Sets the Weight in lbs if possible
        if self.Data['True Weight'] != None:
            self.Data['Weight in lbs'] = self.ozToLb(self.Data['True Weight'])

    def ozToLb(self, input):
        if input == None:
            return None
        weight = str(input).lower()
        if 'oz' in weight:
            return self.stringValueExtraction(weight, 'oz') / 16.0
        elif 'lbs' in weight:
            return self.stringValueExtraction(weight, 'lb')
        elif '/lb' in weight:
            return 1.0
        elif 'lb' in weight:
            return self.stringValueExtraction(weight, 'lb')
        return None

    #If no weight is given we look at other places that could have what we need
    #This determines if a string is talking about weights in ounces or pounds.
    def findWeight(self):
        #Checking these places for clues
        checkLocations = [self.Data['Current Price'], self.Data['Product Type'], self.Data['Orignal Price']]
        possible = []
        for string in checkLocations:
            if string == None:
                continue
            string = string.lower().replace(' ', '') # convert to lowercase and remove spaces
            if 'pound' in string:
                return f"{self.stringValueExtraction(string, 'pound')} lb"
            elif 'ounce' in string:
                return f"{self.stringValueExtraction(string, 'ounce')} oz"
            elif 'lbs' in string:
                return f"{self.stringValueExtraction(string, 'lbs')} lb"
            elif '/lb' in string:
                return f"{self.stringValueExtraction(string, '/lb')}/lb"
            elif 'lb' in string:
                return f"{self.stringValueExtraction(string, 'lb')} lb"
            elif 'oz' in string:
                return f"{self.stringValueExtraction(string, 'oz')} oz"
            elif '/ea' in string:
                #This is the worst outcome so we want to append it to a list for later
                possible.append(f"{self.stringValueExtraction(string, '/ea')}/ea")
        return next((item for item in possible if item is not None), None)

    #Heirloom tomatoes are tricky
    def heirloomTomatoesModifications(self, weight):
        #We can extract Organic from the name
        if self.Data['Organic'] == None:
            if 'organic' in self.Data['Product Type'].lower().replace(' ', ''): # convert to lowercase and remove spaces
                self.Data['Organic'] = 'Organic'
        #This part is for Weight
        if self.Data['True Weight'] != None:
            self.Data['Weight in lbs'] = self.ozToLb(self.Data['True Weight'])
            return
        if weight == None:
            string = self.findWeight()
            if string == None: #findWeight can come back empty
                return
            if '/lb' in string.lower().replace(' ', ''):
                self.Data['True Weight'] = string
                self.Data['Weight in lbs'] = 1.0
                return
            else:
                self.Data['True Weight'] = string
        else:
            self.Data['True Weight'] = weight
        self.Data['Weight in lbs'] = self.ozToLb(self.Data['True Weight'])

    #Helper to reduce code. Splits the string and returns the float value
    def stringValueExtraction(self, string, stringType):
        if string == None or stringType == None:
            raise StringValueExtractionError(string, stringType)
        value = ''
        stringList = []
        for string in string.split(stringType):
            for c in string:
                if c.isdigit() or c == '.':
                    value += c
                elif value:
                    stringList.append(float(value))
                    value = ''
            if value:
                stringList.append(float(value))
                value = ''
        return stringList[-1] if stringList else None

    #Eggs don't have weight so we use amount
    def eggModifications(self):
        if self.Data['True Amount'] == None:
            checkLocations = [self.Data['Product Type'], self.Data['Current Price'], self.Data['Orignal Price']]
            for string in checkLocations:
                if string == None:
                    continue
                string = string.lower().replace(' ', '') # convert to lowercase and remove spaces
                if 'dozen' in string:
                    amount = self.stringValueExtraction(string, 'dozen')
                    if amount == None:
                        self.Data['True Amount'] = f"{1} dz"
                        self.Data['Amount in dz'] = 1.0
                        return
                    self.Data['True Amount'] = f"{amount} dz"
                    self.Data['Amount in dz'] = amount
                    return
                if 'dz' in string:
                    amount = self.stringValueExtraction(string, 'dz')
                    self.Data['True Amount'] = f"{amount} dz"
                    self.Data['Amount in dz'] = amount
                    return
                if 'ct' in string:
                    amount = self.stringValueExtraction(string, 'ct')
                    self.Data['True Amount'] = f"{amount} ct"
                    self.Data['Amount in dz'] = amount / 12
                    return
                if 'ea' in string:
                    amount = self.stringValueExtraction(string, 'ea')
                    self.Data['True Amount'] = f"{amount} ea"
                    self.Data['Amount in dz'] = amount / 12
                    return
                if 'pk' in string:
                    amount = self.stringValueExtraction(string, 'pk')
                    self.Data['True Amount'] = f"{amount} pk"
                    self.Data['Amount in dz'] = amount / 12
                    return
        else:
            string = self.Data['True Amount'].lower().replace(' ', '')
            if 'dozen' in string:
                amount = self.stringValueExtraction(string, 'dozen')
                if amount == None:
                    self.Data['Amount in dz'] = 1.0
                    return
                self.Data['Amount in dz'] = amount
            elif 'dz' in string:
                self.Data['Amount in dz'] = self.stringValueExtraction(string, 'dz')
            elif 'ct' in string:
                self.Data['Amount in dz'] = self.stringValueExtraction(string, 'ct') / 12
            elif 'ea' in string:
                self.Data['Amount in dz'] = self.stringValueExtraction(string, 'ea') / 12
            elif 'pk' in string:
                self.Data['Amount in dz'] = self.stringValueExtraction(string, 'pk') / 12
            elif 'pack' in string:
                self.Data['Amount in dz'] = self.stringValueExtraction(string, 'pack') / 12

    def determineLocality(self):
        try:
            if self.Data['Brand'] == None:
                #Formats the name
                name = ' '.join(self.Data['Product Type'].split()).lower() # remove extra spaces
                name = ''.join(c for c in name if c.isalpha() or c == "'" or c == " " or c == "-" or c == "&") # keep only letters, apostrophes, spaces, hyphens, and ampersands
                name = name.capitalize()
                brand = ''
                for b in self.getBrandNames[self.productIndex]:
                    if b in name and len(b) > len(brand):
                        brand = b
                self.Data['Brand'] = brand
            else:
                self.Data['Brand'] = self.Data['Brand'].lower().capitalize()
            #Converts the brand into something we can quickly compare
            brand = ''.join(c for c in self.Data['Brand'] if c.isalpha()).lower()
            #Determines locality
            if brand in {'deli'}:
                self.Data['Local'] = "Can't be Determined"
            elif brand in self.getLocalBrands[self.productIndex]:
                self.Data['Local'] = "Local"
            elif brand in self.getNonLocalBrands[self.productIndex]:
                self.Data['Local'] = "Non-local"
            else:
                self.Data['Local'] = "None Listed"

            if self.productIndex == 2: #Special condition for Heirloom Tomatoes
                #Sometimes what we need is in the name
                name = self.Data['Product Type'].lower().replace(' ', '') # convert to lowercase and remove spaces
                if 'organic' in name:
                    self.Data['Organic'] = True
                if 'local' in name:
                    self.Data['Local'] = 'Local'
        except IndexError:
            raise BrandingError(self.productIndex)

#Two helper functions in case we need to add more brands to the lists above
#for the getBrandNames
# def CleanStringList(strings):
#     cleaned = set()
#     for s in strings:
#         s = ''.join(c for c in s if c.isalpha()).lower() # keep only letters
#         cleaned.add(s)
#     print(list(cleaned))

# #for the getLocalBrands and getNonLocalBrands
# def CleanStringNames(strings):
#     cleaned = set()
#     for s in strings:
#         name = ' '.join(s.split()).lower() # remove extra spaces
#         name = ''.join(c for c in name if c.isalpha() or c == "'" or c == " " or c == "-").capitalize() # keep only letters, apostrophes, hyphens, and spaces and capitalize the first letter
#         cleaned.add(name)
#     print(list(cleaned))
```
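Since the class above depends on project-internal modules (DSPG_Products, DSPG_SpiderErrors), here is a self-contained sketch of just the price-normalization idea it uses: keep only digits and the decimal point, then cast to float. The function name is hypothetical:

```python
def clean_price(raw):
    """Strip currency symbols and text, keeping digits and the decimal point."""
    digits = ''.join(c for c in raw if c.isdigit() or c == '.')
    return float(digits) if digits else None

print(clean_price('$5.99'))       # 5.99
print(clean_price('Sale $4.49'))  # 4.49
print(clean_price('N/A'))         # None
```

This is the same digit-filter trick `cleanPricing` applies to the 'Current Price' and 'Orignal Price' fields.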
The hunt for more data on the other resources/websites provided by our client is ongoing.
These resources/websites come from a list of organizations that are currently purchasing or selling local food products for a Local Food Purchasing Assistance and Local Food to Schools Program.
-Prudent Produce
-Wheatsfield
-Field to Family
-Farm Table Delivery
-Organic Greens
-Iowa Food Cooperative
-Grinnell Farm to table
-All Seasons Harvest
-Farmers Hen House
-Early Morning Harvest
-Flint Ridge Cooperative
-Oneota Cooperative
The crop flow optimization model is still being worked on.
Plan for the next week
Our next client meeting is on June 28, 2023, so we plan to complete the entire data cleaning process before then so that we can start the data analysis and modeling process.
- Planning on creating a teaser video for our project.
- We will start working on data analysis and modeling, which includes cleaning, exploring, partitioning, modeling, and projecting output from the final data set.
- Simultaneously, we plan to look for more data that can be used for the project.