This lesson describes in detail how to use the Links database analysis tool correctly in three different cases:
1. Filtering adult resources by hostname from a mixed database.
2. Extracting only phpBB forums.
3. Checking a database for “200 OK”, that is, extracting only working links.
These are just a few examples of what this tool can be used for. It can also, for example, extract only Russian resources (without relying on domain zones) or resources that use a specific engine, theme, etc.
Example № 1
Filtering adult resources by hostname from a mixed database
Assume that if a domain name contains an adult keyword, the site has adult content. By the author's estimate this holds for about 97% of such domains: almost all domains that contain an adult keyword do host adult content, and the remaining 3% will not change the result much.
1) Choose adult keywords. For this example I chose only 5 keywords:
sex intim porn xxx erotic
2) Enter your list of keywords in the “Search:” field, one keyword per line.
3) Press “Run”. The process takes only 2-3 seconds because the search is done by hostname, so the results appear almost immediately.
The result is saved in a newly created database, LinksList id22_mod.txt. Opening it shows 773 links. With more keywords, the result would be larger.
...
http://www.sexpacking.com/forum/read.php?2,358,page=6
http://www.asexuality.org/discussion/index.php
http://sex-work.org/forums/index.php
http://forum.literotica.com/sendmessage.php
http://www.telefonsex2002.de/telefonsex-forum/index.php
http://www.labanlieuesexprime.org/forum.php3?id_article=2
http://www.yusex.com/forum/index.php
http://www.sexy-tipp.ch/forum/messages/21867/1481.html?1098903062
http://www.pod-porn.com/cgi-bin/distribb/ultimatebb.cgi
http://bbs.porncity.net/index.php
http://www.asexstories.com/community/index.php
http://www.nofauxxx.com/boards/phpBB2/index.php
http://phebus.journalintime.com/forum/
http://www.pornstarkings.com/index.php
http://greatsexgames.com/forums/index.php
http://www.worldsexguide.com/forum/index.php
http://www.sexinfo.ro/forum/index.php
...
The task of collecting an adult database from a mixed database is completed in less than 5 minutes.
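The hostname filter from this example can be approximated in a few lines of Python. This is a hypothetical sketch, not the tool's own code: it assumes the links are available as plain URL strings (the LinksList file format is not described here), and the names `hostname_matches` and `filter_by_hostname` are made up for illustration. Because only the hostname string is inspected and no page is downloaded, a filter like this needs no network access at all.

```python
from typing import Iterable, List
from urllib.parse import urlsplit

# The five keywords chosen in step 1.
ADULT_KEYWORDS = ("sex", "intim", "porn", "xxx", "erotic")


def hostname_matches(url: str, keywords: Iterable[str]) -> bool:
    """True if any keyword is a substring of the URL's hostname (case-insensitive).

    Hypothetical helper: only the hostname is checked, never the path or query.
    """
    host = urlsplit(url.strip()).hostname or ""  # .hostname is already lower-case
    return any(kw.lower() in host for kw in keywords)


def filter_by_hostname(urls: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keep only the URLs whose hostname contains at least one keyword."""
    kws = tuple(keywords)  # materialize so the keywords can be reused for every URL
    return [u.strip() for u in urls if hostname_matches(u, kws)]
```

A plain substring match like this also picks up hostnames such as www.asexuality.org or phebus.journalintime.com from the sample above, which illustrates why the approach is not 100% accurate. A keyword that appears only in the path (for example `http://example.com/sex/index.php`) is not matched.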
Example № 2
Extracting only phpBB forums
1) This time we search by content rather than by hostname. The process is similar to the first example, except that the keywords are searched for in the content of each site instead of its hostname, so the search takes longer. We will use keywords such as:
phpBB viewforum.php viewtopic.php profile.php?mode=register
2) Enter these keywords in the “Search:” field.
3) Press “Run”. Within a few minutes, more than 3,000 URLs have been checked.
When the process finishes, the newly created database LinksList id2_mod.txt contains more than 11,000 phpBB forums (out of a database of 25,000 links):
...
http://AvtoSreda.RU/forum/index.php
http://www.stroykann.ru/forum/index.php
http://www.krada.org/forum/index.php
http://forum.neoclub.ru/index.php
http://forum.sch192.ru/index.php
http://www.arbinada.com/modules.php?name=Forums
http://forum.mashexport.com/index.php
http://forum.kayman-k.ru/index.php
http://fengshuiby.com/forum/index.php
http://autoshina.kz/frm///index.php
http://www.kachok.ru/forum/index.php
http://www.evrostroika.ru/forum/index.php
http://forum.spblove.ru/index.php
http://mirabeltour.com/mirabelforum/index.php
http://forum.americanfootball.ru/index.php
http://www.f1-game.ru/forum/index.php
http://cinema.kgd.info/forum/index.php
http://forum.zapavto.ru/index.php
http://forum.vinfo.ru/index.php
...
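A content search differs from the hostname filter in that every page has to be downloaded before it can be matched, which is why it takes much longer. The sketch below is hypothetical (not the tool's implementation) and uses only Python's standard library; the function names, the case-insensitive matching, the 512 KB download cap, the timeout, and the User-Agent string are all assumptions chosen for illustration.

```python
import http.client
import urllib.request
from typing import Iterable

# The keywords from step 1 of this example.
PHPBB_MARKERS = ("phpBB", "viewforum.php", "viewtopic.php", "profile.php?mode=register")


def content_matches(html: str, keywords: Iterable[str]) -> bool:
    """True if any keyword occurs in the page text (case-insensitive, an assumption)."""
    text = html.lower()
    return any(kw.lower() in text for kw in keywords)


def fetch_page(url: str, timeout: float = 10.0, max_bytes: int = 512_000) -> str:
    """Download up to max_bytes of the page body and decode it leniently."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read(max_bytes)
    # The markers are plain ASCII, so decoding e.g. a windows-1251 page as UTF-8
    # with replacement characters does not affect the match.
    return raw.decode("utf-8", errors="replace")


def is_phpbb_forum(url: str) -> bool:
    """Hypothetical check: download the page and look for any phpBB marker."""
    try:
        return content_matches(fetch_page(url), PHPBB_MARKERS)
    except (OSError, http.client.HTTPException, ValueError):
        # URLError, timeouts, broken connections, and malformed URLs count as no match.
        return False
```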
Example № 3
Checking a database for “200 OK”, that is, extracting only working links
1) This example is similar to the search by content. To check for “200 OK”, however, there is no need to download the full page; downloading only the HTTP response header is enough. The status line at the beginning of the header should contain “200 OK”. If the header contains “404 NOT FOUND” or “403 FORBIDDEN” instead, the link does not match our search.
2) So in the “Search:” field, enter only one line:
200 OK
3) Press “Run”. The process is faster than in the previous example (40 threads/sec instead of 12 threads/sec) because only the header of each response is searched, and “200 OK” appears only in the header.
In fact, almost all links are saved in the resulting database (LinksList id30_mod.txt), because most of them are working: 1,256 of the 1,357 links. All links that return “404 Not Found”, or whose host is banned, are filtered out.
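The “200 OK” check can be sketched the same way. In this hypothetical example (not the tool's code), Python's `http.client` sends a GET request, `getresponse()` reads only the status line and headers, and the connection is closed without reading the body, which mirrors the idea of downloading only the header. Redirects are not followed, so a 301 or 302 response counts as not working here; that strictness, the timeout, the worker count, and the function names are assumptions. A thread pool is used so that several links are checked at once.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Optional
from urllib.parse import urlsplit


def check_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code of url, or None if the request fails."""
    try:
        parts = urlsplit(url.strip())
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return None  # not an HTTP(S) link
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
        try:
            conn.request("GET", path, headers={"User-Agent": "Mozilla/5.0"})
            resp = conn.getresponse()  # reads the status line and headers only
            return resp.status
        finally:
            conn.close()  # the body is never read
    except (OSError, http.client.HTTPException, ValueError):
        return None  # DNS errors, refused connections, timeouts, malformed URLs


def filter_working(urls: Iterable[str], workers: int = 20) -> List[str]:
    """Keep only the links that answer with status 200 (worker count is arbitrary)."""
    cleaned = [u.strip() for u in urls if u.strip()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(check_status, cleaned))
    return [u for u, status in zip(cleaned, statuses) if status == 200]
```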
As you can see, this tool can be used in many different ways. Good luck with your experiments!