Grocery Price Tracking Through Webscraping

Posted: Wed Dec 05, 2018 6:33 pm
by prognastat
I figured I'd hit two birds with one stone. I've been needing to learn python so I figured I might as well get some use out of it on the ERE front.

I've now completed a Python script that scrapes my grocery store's website once a day for the prices of my staples and keeps a running average. My next step is to have it check once a week which products are currently priced below their running average and email me a list of those products and the price difference, the day before my regular shopping trip, so I can incorporate it into both what I plan on eating the next week and things I might want to buy in bulk.

Once I have both the tracking and notifications done, it'll be a matter of using it for a while and seeing what improvements can be made, but if nothing else it's been good experience for improving my Python skills. It should also let me get my grocery costs down a little with minimal effort once completed.

It's pretty simple so far: it starts with a list of product URLs and scrapes each one for the product name and price, then stores these in .csv files along with the date of the price check.
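Stripped down, that step looks roughly like the sketch below. It runs against a static sample page rather than a live URL, and the `h1` tag and `meta itemprop="price"` selector are assumptions about the store's markup, not guaranteed to match any particular site:

```python
from bs4 import BeautifulSoup

# Static HTML standing in for one product page; the tag layout here
# (h1 for the name, meta itemprop="price" for the price) is an assumption.
SAMPLE_HTML = """
<html><body>
  <h1>Rolled Oats 42 oz</h1>
  <meta itemprop="price" content="2.48">
</body></html>
"""

def scrape_product(page_html):
    """Pull the product name and price out of one product page."""
    page = BeautifulSoup(page_html, "html.parser")
    name = page.h1.text.strip()
    price = float(page.find("meta", {"itemprop": "price"})["content"])
    return name, price

name, price = scrape_product(SAMPLE_HTML)
# name is "Rolled Oats 42 oz", price is 2.48
```

The per-product .csv files then just accumulate one (date, price) row per day.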

Re: Grocery Price Tracking Through Webscraping

Posted: Wed Dec 05, 2018 7:04 pm
by Jin+Guice
Fuck! I've always wanted to do this but I can't because I am too dumb/ unskilled. I am both impressed and envious.

Re: Grocery Price Tracking Through Webscraping

Posted: Wed Dec 05, 2018 7:20 pm
by blackbird

Scraping with Python is a lot of fun. Last winter I built a similar project to scrape news titles, forum post titles, etc. from sites I frequented (kind of like an aggregator, similar to old FARK) and then spit out a list with hyperlinks to the actual content. It was surprisingly easy to get the basic part down, but I suspect the texting-me-a-list piece is more complicated. The only real issue I ran into is that folks use a wider variety of forum software than I expected, and some didn't play well with scraping while others proved easy to work with. I set it aside when I got focused on a different project, but just earlier today I picked up a $10 monitor at the thrift store for my Raspberry Pi, and I intend to work on it again once that little guy is set back up. I'd be interested in seeing your code sometime when you finish; I'm sure it's cleaner than mine. I remember using BeautifulSoup for most of it. Well done!

Re: Grocery Price Tracking Through Webscraping

Posted: Wed Dec 05, 2018 7:23 pm
by Gilberto de Piento
Cool project! Let us know how it works out.

Bigger idea: do this for all their products and then do the same for other stores. Make a big db of product costs over time. See if anyone will then pay for the data.

Re: Grocery Price Tracking Through Webscraping

Posted: Wed Dec 05, 2018 8:10 pm
by prognastat

Thanks, it really isn't much so far though. It's only 95 lines of code, and most of the work is done by modules like BeautifulSoup. If you have some existing coding experience in a different language, it shouldn't be too hard to learn enough to do this over a weekend.


That's exactly what I did; it relies on BeautifulSoup for parsing the HTML. It's currently just running in the background on my desktop. I can definitely share it once I finish the basics I wanted, clean it up some, and add comments so I don't have to feel ashamed of my code. I'm sure it's currently no cleaner than yours.


That would be cool, and I might eventually expand to some of the other big chain stores, though they'd need a pretty competent online store to scrape for useful data, which excludes many of them. It might also be tricky to compare products between stores without manually determining equivalent products.

The main problem with building a large national public database is that prices vary from store to store, even within chains, so storing all that data would probably require far more storage and bandwidth than I have available without getting serious about it. I also suspect the market for finding deals on groceries is pretty limited.

Re: Grocery Price Tracking Through Webscraping

Posted: Tue Dec 11, 2018 2:28 pm
by prognastat
Well, work got a little busy so I haven't had a chance to make much progress yet. I did manage to clean it up a little, so I'll post what I have so far:

Code: Select all

import csv
import datetime
import os
import shutil
import threading
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

#Get all URLs from URLs.txt located in the same folder as the script
with open('URLs.txt', 'r') as f:
    myURLs = [line.rstrip() for line in f]

headers = ["Date", "Product Price", "Average Price", "Price Checked"]

#Loop through all URLs provided
def update_prices():
    for URL in myURLs:
        uClient = uReq(URL)
        page_html = uClient.read()
        uClient.close()

        #html parsing
        page_soup = soup(page_html, "html.parser")

        #Get product name
        product_name = page_soup.h1.text

        #Get product price
        product_price = page_soup.find("meta", {"itemprop": "price"})["content"]

        #Get current date
        date = datetime.datetime.now().strftime("%x")

        #Defaults for a product we haven't seen before
        average_price = float(product_price)
        price_checked_count = 1

        newrow = [date, product_price, average_price, price_checked_count]
        filename = product_name.replace(" ", "_") + ".csv"
        #Try opening the file if it exists to add new data, create a new file if it doesn't
        try:
            with open(filename, 'r', newline='') as f, open(filename + '.temp', 'w', newline='') as fout:
                reader = csv.reader(f)
                writer = csv.writer(fout)
                checked = False
                for row in reader:
                    writer.writerow(row)
                    if date in row[0]:
                        #Already recorded a price for today
                        checked = True
                    elif row != headers:
                        #Fold today's price into the running average from the latest stored row
                        price_checked_count = int(row[3]) + 1
                        average_price = ((float(row[2])*int(row[3]))+float(product_price))/price_checked_count
                        newrow = [date, product_price, average_price, price_checked_count]
                if not checked:
                    writer.writerow(newrow)
            if checked:
                os.remove(filename + '.temp')
            else:
                shutil.move(filename + '.temp', filename)
        except FileNotFoundError:
            with open(filename, 'w', newline='') as csvfile:
                filewriter = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
                filewriter.writerow(headers)
                filewriter.writerow(newrow)

    #Run again in 24 hours
    threading.Timer(86400, update_prices).start()

update_prices()
This is for the Texas grocery store chain HEB. It takes a .txt file with all the URLs separated by newlines, grabs the product name and price from each page's HTML, then writes that into .csv files.
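The core of the script is the running-average update: each row stores the average so far and the number of checks, so folding in a new price never needs the full history. Isolated from the CSV plumbing, the update is just:

```python
def update_average(old_avg, count, new_price):
    """Fold one new price into a running average of `count` prior checks."""
    new_avg = (old_avg * count + new_price) / (count + 1)
    return new_avg, count + 1

# Three checks averaging 2.00, then a 2.40 reading: the average becomes
# (2.00*3 + 2.40) / 4 = 2.10 over 4 checks.
avg, n = update_average(2.00, 3, 2.40)
```

Storing (average, count) in every row this way means only the last row ever needs to be read to keep the average exact.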

I have also written some code to activate every Friday at 8 PM, check which products are currently priced below their running average, and send that list of products and prices by email, but it's not functional yet.
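For anyone curious, a rough sketch of how that Friday check might look (this is not the poster's unfinished code; the product dict shape, SMTP host, and addresses are all placeholders):

```python
import smtplib
from email.message import EmailMessage

def find_deals(products):
    """products: {name: (current_price, average_price)}.
    Return a line per product currently priced below its running average."""
    return [
        f"{name}: {price:.2f} (avg {avg:.2f}, save {avg - price:.2f})"
        for name, (price, avg) in products.items()
        if price < avg
    ]

def send_deal_email(lines, host="localhost"):
    """Mail the deal list; host and addresses are placeholders."""
    msg = EmailMessage()
    msg["Subject"] = "Grocery deals this week"
    msg["From"] = "tracker@example.com"
    msg["To"] = "me@example.com"
    msg.set_content("\n".join(lines) or "No deals this week.")
    with smtplib.SMTP(host) as s:
        s.send_message(msg)

# Only products below their running average make the list:
deals = find_deals({
    "Rolled_Oats": (2.10, 2.48),
    "Peanut_Butter": (3.99, 3.50),
})
```

The Friday-at-8-PM part could be another `threading.Timer` that re-arms itself, or simply a weekly cron entry that runs the check once.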

Re: Grocery Price Tracking Through Webscraping

Posted: Tue Dec 11, 2018 3:54 pm
by Curmudgeon
If you find this useful, it's possible that other people will, as well (I like it - I may grab your code).
If other people find it useful, it may be possible to monetize it (actually, this seems very easy to monetize).
If other people find it useful, you may find it is not necessary to gather the data via scraping - stores may be willing to hand over their data (or pay you to take it!)

Re: Grocery Price Tracking Through Webscraping

Posted: Tue Dec 11, 2018 4:18 pm
by prognastat
Thanks, let me know if it turns out to be useful. So far I only have about a week's worth of data, during which no prices seem to have changed, so I haven't really been able to tell how useful it'll be yet. I suspect in the long run it may save 10-20% on my grocery spending, which for me is maybe 15-30 dollars a month since my spending is already quite low. Because I already keep an eye on the weekly sales, the savings aren't as high as they could be, but it might reduce time spent and increase savings a little on things that don't get posted in the weekly ad. Every little bit helps though.

Re: Grocery Price Tracking Through Webscraping

Posted: Tue Dec 11, 2018 4:42 pm
by Curmudgeon
The first hitch for me is that my favorite grocery store - Winco - doesn't even publish ads, because they claim they can offer lower prices if they don't have to pay for ads. They are, by far, the lowest price grocer in my area - their normal prices are usually lower than the sale prices at the competition. So, it seems like you need to be able to see ALL prices, not just sales.

Re: Grocery Price Tracking Through Webscraping

Posted: Tue Dec 11, 2018 4:45 pm
by 2Birds1Stone
prognastat wrote:
Wed Dec 05, 2018 6:33 pm
I figured I'd hit two birds with one stone.

Re: Grocery Price Tracking Through Webscraping

Posted: Tue Dec 11, 2018 5:01 pm
by prognastat
Yeah, if they don't publish their prices anywhere, you're either going to have to keep mental track of prices or write them down manually.