Posted 2014-06-19 by kossboss in Linux, Linux and Windows

Amazing way to extract links from raw html
########################################

This goes well with my other article: Extract Quotes

The best way I found to extract links from HTML is from this Stack Overflow thread: http://stackoverflow.com/questions/1881237/easiest-way-to-extract-the-urls-from-an-html-page-using-sed-or-awk-only

cat index.html | grep -o '<a href=['"'"'"][^"'"'"']*['"'"'"]' | sed -e 's/^<a href=["'"'"']//' -e 's/["'"'"']$//'
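To see what the one-liner actually does, here is a small sketch that wraps the same grep and sed pipeline in a shell function and runs it on a made-up snippet instead of a real index.html. The function name extract_links and the sample HTML are my own illustrative choices, not from the Stack Overflow thread. grep -o prints each <a href="..." or <a href='...' match on its own line, and the two sed expressions strip the leading <a href= plus opening quote, then the trailing quote.

```shell
#!/bin/sh
# extract_links: hypothetical wrapper around the grep | sed pipeline above.
# Reads HTML on stdin and prints one link per line.
# Step 1 (grep -o): print only the matched text, <a href="..." or <a href='...'
# Step 2 (sed): remove the leading <a href=" (or ') and the trailing quote.
extract_links() {
    grep -o '<a href=['"'"'"][^"'"'"']*['"'"'"]' |
        sed -e 's/^<a href=["'"'"']//' -e 's/["'"'"']$//'
}

# Made-up sample: one double-quoted and one single-quoted link on one line.
extract_links <<'EOF'
<p><a href="https://example.com/a">A</a> and <a href='/b.html'>B</a></p>
EOF
# prints:
# https://example.com/a
# /b.html
```

Both links come out even though they sit on the same line, because grep -o emits every match separately.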

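NOTE: index.html can be a complete page or just a snippet of HTML. The command above extracts links pretty well, but as the thread points out, it is still limited by what regular expressions can do. For example, it only matches tags written exactly as <a href= (lowercase, href as the first attribute, quoted value), so <A HREF="..."> or <a class="nav" href="..."> are skipped.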
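If the strict <a href= match skips too much, one option is a slightly looser variant. This is my own sketch, not from the Stack Overflow thread, and extract_links_loose is a hypothetical name. It matches case-insensitively with grep -i, lets other attributes come before href, and trims the 6-character href=" (or href=') prefix with cut. It is still just a regex, so the same basic caveats apply.

```shell
#!/bin/sh
# extract_links_loose: hypothetical, more tolerant take on the one-liner.
# Step 1: -i matches <a ...> or <A ...>; [^>]* allows attributes before href.
#         The [[:space:]] after <a keeps tags like <abbr> from matching.
# Step 2: keep only href="value or href='value (drop the closing quote).
# Step 3: cut off the first 6 characters: href=" or href='
extract_links_loose() {
    grep -oiE "<a[[:space:]][^>]*href=[\"'][^\"']*[\"']" |
        grep -oiE "href=[\"'][^\"']*" |
        cut -c7-
}

# Made-up sample with an uppercase tag and href as a second attribute.
extract_links_loose <<'EOF'
<A HREF="/up.html">Up</A> <a class="nav" href="/c.html">C</a>
EOF
# prints:
# /up.html
# /c.html
```

It still misses unquoted values such as href=/d.html and will pull links out of HTML comments too, since grep knows nothing about HTML structure. For messy pages, a real HTML parser is the safer tool.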

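NOTE: this also works on Windows with the grep.exe, sed.exe and cat.exe that come with a regular Cygwin install. Run it from the Cygwin bash shell, since the '"'"' quoting tricks need a POSIX-style shell and will not work as-is in cmd.exe.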
