Grocery Price Tracking Through Webscraping

What skills to learn, what tools to get
prognastat
Posts: 513
Joined: Fri May 04, 2018 8:30 pm
Location: Texas

Grocery Price Tracking Through Webscraping

Post by prognastat » Wed Dec 05, 2018 6:33 pm

I figured I'd hit two birds with one stone. I've been needing to learn python so I figured I might as well get some use out of it on the ERE front.

I've now completed a python script that webscrapes my grocery store's website once a day for the prices of my staples and keeps a running average for each. My next step is to have it check once a week which products are currently priced below their running average and email me a list of those products and the price differences, the day before my regular shopping trip, so I can factor it into both what I plan on eating the next week and things I might want to buy in bulk.

Once I have both the tracking and notifications done it'll be a matter of using it for a while and seeing what improvements can be made, but if anything it's been some good experience improving my python skills. It should also allow me to get my grocery costs down a little bit with minimal effort once completed.

It's pretty simple so far, it starts with a list of product URLs, scrapes them for the product name and price. Then stores these in .csv files along with the date for the price.
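Stripped down, the core of it is something like this (the h1/meta selectors happen to match my store's markup; other sites will differ, and the CSV layout here is simplified):

```python
import csv
import datetime

from bs4 import BeautifulSoup

def parse_product(page_html):
    # Pull the product name and price out of a product page.
    # The h1 / meta[itemprop=price] choices are site-specific.
    page = BeautifulSoup(page_html, "html.parser")
    name = page.h1.text.strip()
    price = float(page.find("meta", {"itemprop": "price"})["content"])
    return name, price

def append_price(name, price):
    # One .csv per product, one row per day.
    today = datetime.date.today().isoformat()
    with open(name.replace(" ", "_") + ".csv", "a", newline="") as f:
        csv.writer(f).writerow([today, price])
```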

Jin+Guice
Posts: 203
Joined: Sat Jun 30, 2018 8:15 am

Re: Grocery Price Tracking Through Webscraping

Post by Jin+Guice » Wed Dec 05, 2018 7:04 pm

Fuck! I've always wanted to do this but I can't because I am too dumb/ unskilled. I am both impressed and envious.

blackbird
Posts: 82
Joined: Mon Apr 08, 2013 6:36 pm

Re: Grocery Price Tracking Through Webscraping

Post by blackbird » Wed Dec 05, 2018 7:20 pm

@prognastat

Scraping with Python is a lot of fun. Last winter I built a similar project to scrape news titles / forum post titles / etc. from sites I frequented (kind of like an aggregator, similar to old FARK) and then spit out a list with hyperlinks to the actual content. It was surprisingly easy to get the basic part down, but I suspect the texting-me-a-list piece is more complicated. The only real issue I ran into is that folks use a wider variety of forum software than I expected, and some didn't play well with scraping while others proved easy to work with. I set it aside when I got focused on a different project, but just earlier today I picked up a $10 monitor at the thrift store for my Raspberry Pi, and I intend to work on it again once that little guy is set back up. I'd be interested in seeing your code sometime when you finish; I'm sure it is cleaner than mine. I remember using BeautifulSoup ( https://pypi.org/project/beautifulsoup4/ ) for most of it. Well done!
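From memory, the "grab every headline link off a page" step with BeautifulSoup was just a few lines, roughly like this sketch (which tags actually hold the titles is the fiddly per-site part):

```python
from bs4 import BeautifulSoup

def extract_links(page_html):
    # Collect (title, url) pairs for every hyperlink on a page.
    # Real sites need per-site filtering to keep only headlines,
    # which is where the forum-software quirks show up.
    page = BeautifulSoup(page_html, "html.parser")
    return [(a.text.strip(), a["href"])
            for a in page.find_all("a", href=True)]
```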

Gilberto de Piento
Posts: 1126
Joined: Tue Nov 12, 2013 10:23 pm

Re: Grocery Price Tracking Through Webscraping

Post by Gilberto de Piento » Wed Dec 05, 2018 7:23 pm

Cool project! Let us know how it works out.

Bigger idea: do this for all their products and then do the same for other stores. Make a big db of product costs over time. See if anyone will then pay for the data.
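Even a single SQLite table would go a long way for that. Just a sketch of one possible schema (names and values are made up):

```python
import sqlite3

# One row per store/product/day.
conn = sqlite3.connect(":memory:")  # or a file like "prices.db"
conn.execute("""
    CREATE TABLE IF NOT EXISTS prices (
        store      TEXT NOT NULL,
        product    TEXT NOT NULL,
        price      REAL NOT NULL,
        checked_on TEXT NOT NULL,  -- ISO date
        PRIMARY KEY (store, product, checked_on)
    )
""")
conn.execute("INSERT INTO prices VALUES (?, ?, ?, ?)",
             ("HEB", "Eggs 12ct", 1.99, "2018-12-05"))
conn.execute("INSERT INTO prices VALUES (?, ?, ?, ?)",
             ("HEB", "Eggs 12ct", 2.09, "2018-12-06"))
avg_price = conn.execute(
    "SELECT AVG(price) FROM prices WHERE store = ? AND product = ?",
    ("HEB", "Eggs 12ct")).fetchone()[0]
```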

prognastat
Posts: 513
Joined: Fri May 04, 2018 8:30 pm
Location: Texas

Re: Grocery Price Tracking Through Webscraping

Post by prognastat » Wed Dec 05, 2018 8:10 pm

@Jin+Guice

Thanks, it really isn't much so far though. It's only 95 lines of code, and most of it relies on modules such as BeautifulSoup. If you have some existing coding experience in a different language, it shouldn't be too hard to learn enough to do this over a weekend.

@blackbird

That's exactly what I did; it relies on BeautifulSoup for parsing the HTML. It's currently just running in the background on my desktop. I can definitely share it once I finish the basics I wanted in it, clean it up some, and add comments so I don't have to feel ashamed of my code. I'm sure it's currently no cleaner than yours.

@Gilberto

Would be cool, and I might eventually expand to some of the other big chain stores, though they have to have a pretty competent online store to scrape for useful data, which excludes many of them. It might also be tricky to compare products between stores without manually determining equivalent products.

The main problem with building a large national public database is that prices vary from store to store even within a chain, so storing all that data would probably require far more storage and bandwidth than I'd have available without getting serious about it, and I suspect the market for finding deals on groceries is pretty limited.

prognastat
Posts: 513
Joined: Fri May 04, 2018 8:30 pm
Location: Texas

Re: Grocery Price Tracking Through Webscraping

Post by prognastat » Tue Dec 11, 2018 2:28 pm

Well, work got a little busy so I haven't had a chance to make much progress yet, but I did manage to clean it up just a little, so I'll post what I have so far:

Code: Select all

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import csv
import datetime
import shutil
import os
import threading

#Get all URLs from URLs.txt located in the same folder as the script
with open('URLs.txt', 'r') as f:
    myURLs = [line.rstrip() for line in f]

headers = ["Date", "Product Price", "Average Price", "Price Checked"]

filenames = []

#Loop through all URLs provided, once a day
def update_prices():
    for URL in myURLs:
        uClient = uReq(URL)
        page_html = uClient.read()
        uClient.close()

        #html parsing
        page_soup = soup(page_html, "html.parser")

        #Get product name
        product_name = page_soup.h1.text

        #Get product price
        product_price = page_soup.find("meta", {"itemprop": "price"})["content"]

        #Get current date (locale short date, e.g. 12/11/18)
        date = datetime.datetime.now().strftime("%x")

        average_price = float(product_price)
        price_checked_count = 1

        newrow = [date, product_price, average_price, price_checked_count]

        filename = product_name.replace(" ", "_") + ".csv"

        if filename not in filenames:
            filenames.append(filename)

        #Add new data to the file if it exists, or create a new file if it doesn't
        try:
            with open(filename, 'r', newline='') as f, open(filename + '.temp', 'w', newline='') as fout:
                reader = csv.reader(f)
                writer = csv.writer(fout)
                lastrow = []
                checked = False

                for row in reader:
                    if date in row[0]:
                        #Already recorded a price for today
                        checked = True
                        break
                    elif lastrow == headers:
                        #Newest row sits right under the headers; fold today's
                        #price into the running average before copying the rest
                        price_checked_count = int(row[3]) + 1
                        average_price = ((float(row[2]) * int(row[3])) + float(product_price)) / price_checked_count
                        newrow = [date, product_price, average_price, price_checked_count]
                        writer.writerow(newrow)
                    writer.writerow(row)
                    lastrow = row
                if checked:
                    os.remove(filename + '.temp')
                    continue  #today's price is already recorded; move on to the next product
                else:
                    shutil.move(filename + '.temp', filename)
        except FileNotFoundError:
            with open(filename, 'w', newline='') as csvfile:
                filewriter = csv.writer(csvfile)
                filewriter.writerow(headers)
                filewriter.writerow(newrow)
    print(filenames)
    #Run again in 24 hours
    threading.Timer(86400, update_prices).start()

update_prices()
This is for the Texas grocery store chain HEB. It takes a .txt file with all the URLs separated by newlines, grabs the product name and price from each page's HTML, then writes that into .csv files.

I have written some code to activate every Friday at 8 PM, check whether the current price for each product is lower than its average, and send the list of products and prices by email, but it's not functional/done yet.
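The rough shape I'm going for is something like this (the SMTP server and addresses are placeholders, and the reader assumes the .csv layout the script above writes, with the newest data row right under the headers):

```python
import csv
import glob
import smtplib
from email.message import EmailMessage

def below_average(csv_files):
    # Collect products whose newest recorded price is below the
    # running average.  Each file's first data row is the most
    # recent: Date, Product Price, Average Price, Price Checked.
    deals = []
    for path in csv_files:
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        if len(rows) < 2:
            continue
        _, price, average, _ = rows[1]
        if float(price) < float(average):
            deals.append((path.replace(".csv", "").replace("_", " "),
                          float(price), float(average)))
    return deals

def send_report(deals):
    # Placeholder SMTP settings -- swap in a real server and login.
    msg = EmailMessage()
    msg["Subject"] = "Groceries below their average price"
    msg["From"] = "tracker@example.com"
    msg["To"] = "me@example.com"
    msg.set_content("\n".join(
        f"{name}: {price:.2f} (avg {avg:.2f})" for name, price, avg in deals))
    with smtplib.SMTP("smtp.example.com") as s:
        s.send_message(msg)

# e.g. run weekly: send_report(below_average(glob.glob("*.csv")))
```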
Last edited by prognastat on Tue Dec 11, 2018 4:04 pm, edited 2 times in total.

Curmudgeon
Posts: 7
Joined: Tue Feb 20, 2018 3:54 pm

Re: Grocery Price Tracking Through Webscraping

Post by Curmudgeon » Tue Dec 11, 2018 3:54 pm

If you find this useful, it's possible that other people will, as well (I like it - I may grab your code).
If other people find it useful, it may be possible to monetize it (actually, this seems very easy to monetize).
If other people find it useful, you may find it is not necessary to gather the data via scraping - stores may be willing to hand over their data (or pay you to take it!)

prognastat
Posts: 513
Joined: Fri May 04, 2018 8:30 pm
Location: Texas

Re: Grocery Price Tracking Through Webscraping

Post by prognastat » Tue Dec 11, 2018 4:18 pm

Thanks, let me know if it turns out to be useful. So far I only have about a week's worth of data collected, during which no prices seem to have changed, so I haven't really been able to tell how useful it'll be yet. I suspect in the long run it may save 10-20% on my grocery spending, which for me is already quite low, so maybe 15-30 dollars a month. Since I already keep an eye on the weekly sales the savings aren't as high as they might be, but it might reduce time spent and also increase savings a little on things that don't get posted in the weekly ad. Every little bit helps though.

Curmudgeon
Posts: 7
Joined: Tue Feb 20, 2018 3:54 pm

Re: Grocery Price Tracking Through Webscraping

Post by Curmudgeon » Tue Dec 11, 2018 4:42 pm

The first hitch for me is that my favorite grocery store - Winco - doesn't even publish ads, because they claim they can offer lower prices if they don't have to pay for ads. They are, by far, the lowest price grocer in my area - their normal prices are usually lower than the sale prices at the competition. So, it seems like you need to be able to see ALL prices, not just sales.
Last edited by Curmudgeon on Tue Dec 11, 2018 5:08 pm, edited 1 time in total.

2Birds1Stone
Posts: 389
Joined: Thu Nov 19, 2015 11:20 am

Re: Grocery Price Tracking Through Webscraping

Post by 2Birds1Stone » Tue Dec 11, 2018 4:45 pm

prognastat wrote:
Wed Dec 05, 2018 6:33 pm
I figured I'd hit two birds with one stone.
ouch

prognastat
Posts: 513
Joined: Fri May 04, 2018 8:30 pm
Location: Texas

Re: Grocery Price Tracking Through Webscraping

Post by prognastat » Tue Dec 11, 2018 5:01 pm

@Curmudgeon
Yeah, if they don't publish their prices anywhere, you're either going to have to keep mental track of prices or write them down manually.
