I'm opening this topic to discuss everything about bots. I read on FB a while back that apparently some bots need to be blocked? Also: how to detect all bots, how to herd them into a pen, which bots to let in and which to keep out? And if a bot gets so aggressive that it takes down the DB, how to block it... Over to you, folks.
For the past few days, bots have been crawling my server from morning till night, every single day =)) They crawl so hard that the box at home is howling :v
Here, if you want to block the dirty spiders, copy the block below into your robots.txt
============================
User-agent: linkdexbot
Disallow: /

User-agent: linkdexbot/2.0
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: YandexImages
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: spbot
Disallow: /

User-agent: LexxeBot/1.0
Disallow: /

User-agent: NextGenSearchBot
Disallow: /

User-agent: Sosospider
Disallow: /

User-agent: Sosospider+(+http://help.soso.com/webspider.htm)
Disallow: /

User-agent: SiteBot/0.1
Disallow: /

User-agent: SiteBot
Disallow: /

User-agent: crystalsemanticsbot
Disallow: /

User-agent: CrystalSemanticsBot
Disallow: /

User-agent: NetSeer crawler
Disallow: /

User-agent: trovitBot
Disallow: /

User-agent: LexxeBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: Ezooms
Disallow: /

User-agent: discobot
Disallow: /

User-agent: Jyxobot
Disallow: /

User-agent: sogou
Disallow: /

User-agent: sogou spider
Disallow: /

User-agent: sistrix
Disallow: /

User-agent: heritrix
Disallow: /

User-agent: GarlikCrawler/1.1 (http://garlik.com/, [email protected])
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: NerdByNature.Bot
Disallow: /

User-agent: psbot
Disallow: /

User-agent: WBSearchBot
Disallow: /

User-agent: AddThis.com robot [email protected]
Disallow: /

User-agent: AddThis.com
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: proximic
Disallow: /

User-agent: discoverybot
Disallow: /

User-agent: bl.uk_lddc_bot
Disallow: /

User-agent: IstellaBot
Disallow: /

User-agent: seokicks
Disallow: /

User-agent: SEOkicks-Robot
Disallow: /

User-agent: UnisterBot
Disallow: /

User-agent: Bender
Disallow: /

User-agent: wotbox
Disallow: /

User-agent: Yasni
Disallow: /

User-agent: JikeSpider
Disallow: /

User-agent: netEstate NE Crawler
Disallow: /

User-agent: Exabot
Disallow: /

User-agent: Pixray-Seeker
Disallow: /

User-agent: Linguee
Disallow: /

User-agent: integromedb
Disallow: /

User-agent: SearchmetricsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: BDCbot
Disallow: /

User-agent: grapeshot
Disallow: /

User-agent: GrapeshotCrawler
Disallow: /

User-agent: WeSEE:Search
Disallow: /

User-agent: TurnitinBot
Disallow: /

User-agent: admantx
Disallow: /

User-agent: spbot
Disallow: /

User-agent: BUbiNG
Disallow: /
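If you want to sanity-check a robots.txt like the one above before deploying it, here is a rough sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` excerpt and the helper name `is_allowed` are just for illustration; paste your full file in to test all of it. Note this only tells you what a well-behaved bot would do with the rules, it does not stop anything.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the list above (illustration only).
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: SemrushBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str = "/") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("AhrefsBot", "MJ12bot", "Googlebot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent) else "blocked"
        print(agent, verdict)
```

Bots that are not named in the file (Googlebot in this excerpt) stay allowed, because there is no `User-agent: *` group.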
Where do I go to see the bots? How can I tell on a VPS? I looked at the logs but I don't understand any of it. All I see is invalid user / password entries...
Aug 10 00:39:20 ip-172-31-37-5 sshd[21390]: Invalid user 1111 from 5.101.40.10
Aug 10 00:39:20 ip-172-31-37-5 sshd[21390]: input_userauth_request: invalid user 1111 [preauth]
Aug 10 00:39:20 ip-172-31-37-5 sshd[21390]: Connection closed by 5.101.40.10 [preauth]
Aug 10 00:39:21 ip-172-31-37-5 CRON[21384]: pam_unix(cron:session): session closed for user root
Aug 10 00:39:46 ip-172-31-37-5 sshd[21398]: Connection closed by 5.101.40.10 [preauth]
Aug 10 00:39:56 ip-172-31-37-5 sshd[21408]: Invalid user 1234 from 5.101.40.10
Aug 10 00:39:56 ip-172-31-37-5 sshd[21408]: input_userauth_request: invalid user 1234 [preauth]
Aug 10 00:39:57 ip-172-31-37-5 sshd[21408]: Connection closed by 5.101.40.10 [preauth]
Aug 10 00:40:20 ip-172-31-37-5 sshd[21410]: Invalid user admin from 5.101.40.10
Aug 10 00:40:20 ip-172-31-37-5 sshd[21410]: input_userauth_request: invalid user admin [preauth]
What does this mean, guys? Someone is scanning for usernames, right?
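Those lines are from the SSH auth log, not the web server log, so they show login guessing rather than crawler traffic. A minimal sketch for summarizing them, assuming the exact `Invalid user NAME from IP` format shown above (the function name `count_invalid_logins` is made up for this example):

```python
import re
from collections import Counter

# Matches only the "Invalid user NAME from IP" lines written by sshd.
INVALID_USER = re.compile(r"sshd\[\d+\]: Invalid user (\S+) from (\S+)")


def count_invalid_logins(lines):
    """Return (attempts per IP, usernames tried per IP) for sshd log lines."""
    per_ip = Counter()
    usernames = {}
    for line in lines:
        match = INVALID_USER.search(line)
        if match:
            user, ip = match.groups()
            per_ip[ip] += 1
            usernames.setdefault(ip, []).append(user)
    return per_ip, usernames


# Sample lines copied from the log excerpt above.
SAMPLE = [
    "Aug 10 00:39:20 ip-172-31-37-5 sshd[21390]: Invalid user 1111 from 5.101.40.10",
    "Aug 10 00:39:20 ip-172-31-37-5 sshd[21390]: input_userauth_request: invalid user 1111 [preauth]",
    "Aug 10 00:39:56 ip-172-31-37-5 sshd[21408]: Invalid user 1234 from 5.101.40.10",
    "Aug 10 00:40:20 ip-172-31-37-5 sshd[21410]: Invalid user admin from 5.101.40.10",
]

if __name__ == "__main__":
    per_ip, usernames = count_invalid_logins(SAMPLE)
    for ip, attempts in per_ip.most_common():
        print(ip, attempts, usernames[ip])
```

To run it on a real file, pass an open file object instead of `SAMPLE`. On Debian/Ubuntu the SSH log is usually `/var/log/auth.log`, on RHEL/CentOS `/var/log/secure`; web crawlers show up in the web server's access log instead.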
These days bots just ignore robots.txt. My robots.txt has:
User-agent: AhrefsBot
Disallow: /
User-agent: coccoc
Disallow: /
and in practice the bots still hammer the site anyway.
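Since robots.txt is only a request, a bot that ignores it has to be refused on the server side. A minimal sketch of the check itself, assuming you match on the User-Agent header; `BLOCKED_AGENTS` and `is_blocked` are hypothetical names, and the sample strings are only illustrative. How you hook this into your web server or app is up to your stack.

```python
# Substrings to look for in the User-Agent header (illustrative list).
BLOCKED_AGENTS = ("AhrefsBot", "coccoc", "MJ12bot", "SemrushBot")


def is_blocked(user_agent: str, blocked=BLOCKED_AGENTS) -> bool:
    """Case-insensitive substring match of the User-Agent against a blocklist."""
    ua = user_agent.lower()
    return any(name.lower() in ua for name in blocked)


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0",
    ]
    for ua in samples:
        print("403" if is_blocked(ua) else "200", ua)
```

Keep in mind the User-Agent header is trivial to fake, so this only stops bots that identify themselves honestly.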
Just searched this for you: http://hotpot.se/robots_txt_bots_bad.htm or http://help.ahrefs.com/about-ahrefs/what-is-the-list-of-your-current-ip-ranges. Have a look.
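Once you have a published IP range list like the Ahrefs one linked above, you can check whether an address from your logs falls inside it. A rough sketch with the standard `ipaddress` module; the ranges below are documentation placeholders, NOT real Ahrefs ranges, so replace them with the ones from the linked page. `ip_in_ranges` is a made-up helper name.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), not real bot ranges.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def ip_in_ranges(ip: str, ranges) -> bool:
    """Return True if ip belongs to any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, "in list" if ip_in_ranges(ip, PLACEHOLDER_RANGES) else "not in list")
```

Matching by IP range is harder to fake than the User-Agent string, but the list has to be kept up to date.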