Serious Question for the Web Savvy?
I hope you all don’t mind. A lot of you seem to be very computer literate. I was wondering if this is possible, or if it can even be done?
Can a group of web pages be screen shot and saved for posterity?
I want to be able to press a button or, even better, have it automatically do it.
Here is the reason. My company is in the construction field. It can happen that the job is running behind and the client is unhappy with progress. We schedule based on predicted weather and intuition. Sometimes the weather guys just miss it altogether, but sometimes it is right on the money. What I don’t understand is why the collective has not made an archive of the weather predictions. The actuals are good for historical value, but in our case the predictions are what we have to base the work schedule on.
So if it was possible to have a macro type program that would go to the web sites we use, and snap a screen shot of the predictions and save them to a file for later viewing it would be very important data when the questions come up.
Does anyone know if such a thing exists?
There are various ways to do that, from literal screenshots to printing the page to a file, e.g. PDF.
If you don't need it for audit purposes, there are many ways to scrape data from web pages for analysis.
Google Chrome has a print-to-PDF option; you can print the entire web page using Ctrl+P.
@communist Or if she's using a Mac, every program can do this (with Command-P or, in Yosemite, there's something like Archive to PDF in the File menus).
@editorkid Additionally, on a Mac, I highly suspect the level of automation she's looking for could be handled with Automator.
The Internet Archive does just what their name implies.
The Wayback Machine
@cercopithecoid Most weather sites won't have all locations cached on a daily basis.
@cercopithecoid true, but the snapshots can be infrequent, and may not be particularly useful for tracking something like a local weather forecast for example.
@nadroj You're absolutely right. I should have qualified my comment with "if there's something you've already missed, here's a chance"
@silverqueen Besides printing to PDF, here's a site lots of folks use to archive pages: https://archive.today/
(If you need iron-clad proof, you can print to PDF, then have it timestamped by one of the available free or paid timeservers).
Yes it's definitely possible to script this. I don't know if such a thing already exists or not, but if not you can hire a programmer to build it.
If your on linux:
wget -r http://meh.com
will download the entire site for you.
@MrGlass you're*
You can use a headless browser like PhantomJS to automate grabbing screen shots of web pages. It takes a bit of technical skill, but it's definitely not super hard.
@ejcook111 Never heard of this, but it looks interesting. It would probably do the job required.
The easiest way to do this would probably be to build a robot that will take the screenshots for you. That way, you don't have to hire a web programmer or even mess with any web technology at all.
@phatmass just robot technology
http://www.wunderground.com/history/
weather underground has an archived weather search. other sites might as well.
@vampje That appears to be the actual weather history. She's looking for a history of the forecasts.
When whoever is making the plans checks the weather, could they not just print a copy then?
Hey I really appreciate all the suggestions! I knew I would find help here. I am sorry I haven’t responded until now but I wanted to be able to do a full response to all of you.
Yes I am looking for the “Forecast” archives. I am really surprised that no one has done it yet since we all like to complain about how “the Weather Man got it wrong again”
@nadroj I think a PDF screen shot would work, as it is legible. I would prefer to have the data intact, the way an old newspaper is archived, not just the CliffsNotes (mined data), but mining it would be cool for each weatherman’s report card. Haha
@er1c I don’t know. Is the word I am looking for “cache”? Is that what I am asking for, my own cache?
@editorkid @MrGlass I am not on Linux or a Mac. The company is Windows based. I would like it to be automatic, saving maybe 4-5 website forecast pages every morning (needed or not).
@brhfl Is there an automatic thing like this for Windows?
@communist I personally can’t use the Chrome browser but… Could it be set to do it by itself?
@dashcloud This sounds interesting, but what is a 2.0 site? How would I know if the sites our guys use are capable of archiving? I did read it and I am pretty lost. Smile. Do I get to pick what I want archived or rely on the site to do it? Can I pick the time I want archived or... I read the FAQ but I am not that sophisticated. I am not at all sure how that site works.
@Vampje @cercopithecoid Thanks, but it looks like those won’t do what I am looking for.
@phatmass A robot would be cool! Can you rewire my Neato to do it? Grin Oh and I guess I will need to get some of those baby arms too.
@jqubed In an ideal world that would be the solution but… The phrase ‘best laid plans of mice and men’ comes to mind. Ideally each page would be saved as it was viewed, but it would be impractical to save each one each day and still get the required thinking and planning done. And if I can figure out a way to do it, I could make sure it was done every day and not have to rely on them to do it, one page at a time. “Oh I forgot to do it today.” “It was raining so... why should I?” The excuses are innumerable.
@katylava You think it is possible? How does a person find a programmer? I was almost thinking that would be the way I would have to go. Of course, that is assuming that it hasn’t already been done.
@ejcook111 I have never heard of a headless browser. Hmmmmm Will PhantomJS work on Windows? How smart do you have to be? Do I need to hire a programmer to set this up? This might be the thing I am looking for.
Going to check it out.
Thanks again to all of you.
@silverqueen The Windows solution is a little bit trickier, but totally doable if you have someone relatively tech-savvy in the office. PhantomJS was a great suggestion (and yes, there is a Windows version), and turning a web page into a screenshot with it is pretty simple, code-wise. The PhantomJS website provides sample code for this. From there, you'd want to automate it to run on a given schedule - Windows includes a tool, the Task Scheduler, for this.
@silverqueen You can find someone on elance.com or craigslist.org. You'll probably find someone new to programming, but this is a fairly simple task. Well, the phantomjs code is easy... not sure about how to make sure it can run on a schedule on your computer. It would probably be better running on a cheap linux server which emails you the screenshots every day, but that would take more skill to set up.
@katylava Windows Task Scheduler actually works surprisingly well, and is much simpler for a layperson than setting up a cron job… that should be the easiest part!
@silverqueen PhantomJS will work on Windows. As for whether or not you will need a programmer, it really depends on your background; it's not a hard program to write, but it requires some coding. http://phantomjs.org/ has examples you can probably largely copy and paste.
@ejcook111 @brhfl
How cool is this! Seek and you shall find a plethora of information.
http://phantomjs.org/related-projects.html
Looks like I am in the right place but… I am lost in the language. This might be over my head. Oh, hell, who am I kidding, it is way, way over my head. Smile At least I have validated that I am not the only one who ever thought this was a good idea for one reason or another. I could download it, but at that point I would be at a screeching halt. I only know a tiny bit of Excel formulas, nowhere near the logic required for this. Are any of you in the St. Louis area? Where is the Dummies book for this stuff? Grin You guys are giving me hope! Thanks again!
@silverqueen actually, looking at that page, it looks like you just need to find someone to set up one of the projects under Screenshot Utilities > web services on a cheap server, and just code a simple scheduled job to use it.
So, the first step is to download PhantomJS and stick it somewhere, just put that folder in C:\Program Files or something. The second step is creating a folder somewhere to keep your images in.
The third step is your code. Based off of the sample code provided, let's do something like this:
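// forecast.js: save a dated screenshot of a forecast page (run with phantomjs.exe)
var page = require('webpage').create();
page.open('http://nws.noaa.gov/20002', function (status) {
    if (status === 'success') {
        // File name is the current date (UTC) in ISO format, e.g. 2012-01-23.png
        page.render(new Date().toISOString().slice(0, 10) + '.png');
    }
    phantom.exit();
});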
Paste that into Notepad. The URL in there gets the forecast for DC (ZIP: 20002) from the NWS; replace that URL with whatever you need. The output filename format is the current date in ISO format, so for today that would be 2012-01-23. While that may seem awkward, it's ideal for having a folder full of dated files. Why? Because it means sorting by name sorts them properly. I'd save this file as 'forecast.js' in the folder you're keeping the images in.
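To see why the date-first naming pays off, here is a small sketch in plain JavaScript (it should run in Node or a browser console; it does not need PhantomJS). The file names are made-up examples, and `missingDays` is a hypothetical helper I'm inventing for illustration, not part of PhantomJS. It just lists the dates in a range that have no screenshot yet, assuming the files are named exactly YYYY-MM-DD.png.

```javascript
// Made-up file names in the YYYY-MM-DD.png format the script produces.
var files = ['2012-01-25.png', '2012-01-23.png', '2012-01-26.png'];

// Plain alphabetical sorting is also chronological for ISO dates.
var sorted = files.slice().sort();

// Hypothetical helper: list the days from first to last (inclusive, UTC)
// that have no matching screenshot among the given names.
function missingDays(names, first, last) {
    var missing = [];
    var day = new Date(first + 'T00:00:00Z');
    var end = new Date(last + 'T00:00:00Z');
    while (day <= end) {
        var stamp = day.toISOString().slice(0, 10);
        if (names.indexOf(stamp + '.png') === -1) {
            missing.push(stamp);
        }
        day.setUTCDate(day.getUTCDate() + 1);
    }
    return missing;
}

console.log(sorted);
console.log(missingDays(files, '2012-01-23', '2012-01-26'));
```

Something like this could be handy later for spotting mornings when the scheduled job did not run.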
To schedule it, search for 'Task Scheduler' in the Start menu search bar. Things could get a little tricky here, as I don't know offhand how this thing has changed between OS versions. But, ideally, on the right you'll have a thing that says 'Create Basic Task.' Double click that. Name it, click 'Next.' Go through the options for when you want it to run (probably two screens worth), click 'Next.' Choose 'Start a Program,' 'Next.' In 'Program/script,' find where you stuck the phantomjs folder, and choose 'phantomjs.exe.' Under 'Arguments,' type the name of the script (forecast.js, or whatever you named it). Figure out where your images folder is - you can go to it in Windows Explorer and right click the path, and copy it from there. This should go in 'Start in.'
That's essentially what's involved, and someone who is relatively comfortable with computers should be able to help. No guarantees here, not responsible if your computer explodes, blah blah, but that's the gist of what needs to happen. I did test the code, worked for me, all the snow made me sad. I wish you luck!
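Since the goal was 4-5 forecast pages every morning, the script above could probably be stretched to loop over a list of sites. Treat this as a rough sketch under assumptions, not a tested solution: the second URL is a placeholder, the `name` labels are made up, and `fileNameFor` and `captureAll` are names invented here for illustration. Only `require('webpage')`, `page.open`, `page.render`, `page.close`, and `phantom.exit` come from PhantomJS itself.

```javascript
// Hypothetical list of forecast pages; only the first URL comes from the post above.
var sites = [
    { name: 'nws', url: 'http://nws.noaa.gov/20002' },
    { name: 'example', url: 'http://example.com/forecast' } // placeholder
];

// Build a sortable file name such as "2012-01-23_nws.png" (date is UTC).
function fileNameFor(siteName, date) {
    return date.toISOString().slice(0, 10) + '_' + siteName + '.png';
}

// Open each page in turn, render it, then exit once the list is exhausted.
function captureAll(list, index) {
    if (index >= list.length) {
        phantom.exit();
        return;
    }
    var page = require('webpage').create();
    page.open(list[index].url, function (status) {
        if (status === 'success') {
            page.render(fileNameFor(list[index].name, new Date()));
        } else {
            console.log('Could not load ' + list[index].url);
        }
        page.close();
        captureAll(list, index + 1);
    });
}

// Only start capturing when running inside PhantomJS.
if (typeof phantom !== 'undefined') {
    captureAll(sites, 0);
}
```

The pages are opened one after another rather than all at once, so `phantom.exit()` is only reached after the last one has been handled. The scheduled task would stay the same; it would just point at this file instead.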
@brhfl Thanks for the wonderful step by step! I was not at work all weekend so I couldn't try the steps. I'll try it when I finish running the payroll. Smile Thank you for the guide.
Well I am having trouble even getting it to install. I think the installer is not very screen reader friendly. I'll need to get someone here who knows how to get this working. I don't believe that the program once installed will be much more accessible for me. Not to say that it is not going to work but it may be for someone else to manage. A crash course for them hahahaha I have had enough trying to find the program that will do it. Now it looks like it is their turn.
Again thank you all for the help. Happy SDF!!!
@silverqueen It's unfortunate, but I'm not terribly surprised that accessibility isn't the greatest. A major component of my job is preparing PDFs for 508(c) compliance, and other general accessibility tasks… people don't get why it takes so much time, why structuring things properly makes such a big difference, why getting it right is (frankly) a big deal. I don't 'get it' either, in the sense that I don't have to use a computer that way. But I try, I spend time in the screen reader, I try to figure out what is working, what is not, what would be irritating if that were my means of interaction. It's unfortunate that more people do not. I will say that in this case, I don't think there's really an installer - just a zip file where you dump the program wherever you want it to live. Anyway, hopefully I helped out with a jumping-off point for someone else to attempt this. Happy SDF to you too!
@silverqueen We recently had a client ask us the exact same thing. After some looking for a Windows setup, I found this for her: http://www.nirsoft.net/utils/web_site_screenshot.html
It seems to do what you're looking for, and it can even be scheduled.