Educational CyberPlayGround ®

How do I get my page out of Google? Robots Exclusion Protocol.

Learn how to get your page, URL, or information out of Google.

When the Googlebot finds a page, it reads all the links on that page and then fetches those pages and indexes them. This is the basic process by which Googlebot "crawls" the web.
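The crawl loop described above can be sketched in a few lines. This is only an illustration of the idea, not how Googlebot is actually built: the PAGES dictionary is a made-up, in-memory stand-in for a website, and the names LinkCollector and crawl are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical site: URL path -> HTML body (an assumption for this sketch).
PAGES = {
    "/": '<a href="/about">About</a> <a href="/logs/">Logs</a>',
    "/about": '<a href="/">Home</a>',
    "/logs/": "raw log listing",
}


def crawl(start):
    """Return every page reachable from `start`, in discovery order."""
    seen = [start]
    queue = deque([start])
    while queue:
        url = queue.popleft()
        collector = LinkCollector()
        collector.feed(PAGES.get(url, ""))  # "fetch" the page and read its links
        for link in collector.links:
            if link not in seen:
                seen.append(link)
                queue.append(link)
    return seen


print(crawl("/"))  # ['/', '/about', '/logs/']
```

Note that /logs/ is found only because the home page links to it. That is why the rest of this page is about telling robots which links and pages to leave alone.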

Steps to Shut Down A Website on Google.

If you use Google's Search Console (formerly known as Google Webmaster Tools), you can request removal of pages from Google's search results, including cached pages. Google is NOT required to honor every request.
Another option is to let Google know that the business has closed, especially if it displays in Google Maps search results. If it does, there's a link that says "Feedback" under the business result where you can report that it's closed.

Robots Exclusion Protocol (REP)

1) The Robots Exclusion Protocol (REP) starts with having a robots.txt file in your site's root directory.

example:

User-Agent: Googlebot
Disallow: /logs/
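If you want to check what a rule like this actually blocks before you upload it, Python's standard urllib.robotparser module can read the same text. This is a minimal sketch: www.example.com, SomeOtherBot, and the helper name is_allowed are placeholders, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt example from above, as a string.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /logs/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("Googlebot", "http://www.example.com/logs/access.log"))     # False
print(is_allowed("Googlebot", "http://www.example.com/index.html"))          # True
print(is_allowed("SomeOtherBot", "http://www.example.com/logs/access.log"))  # True
```

The last line shows that a rule written only for Googlebot says nothing to other robots. Use "User-Agent: *" if you want the rule to apply to every robot that honors robots.txt.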

Preventing access to a file

To keep an individual page out of the index, include a Robots META tag in the <HEAD> section of that page:

<html>
<head>
<meta name="googlebot" content="noindex">

You can also add NOFOLLOW to the meta tag. This tells Googlebot not to follow any links it finds on that page, so pages that are linked only from there are less likely to be discovered. It does not hide the page itself. Add this line to the <HEAD> section:

<meta name="robots" content="nofollow">

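BUT NOFOLLOW alone is not the best way to keep content out of Google. To make sure you control everything, use the NOINDEX tag on individual pages and robots.txt for whole directories you do not want crawled. Keep in mind that Googlebot has to be able to fetch a page to see its NOINDEX tag, so do not block that same page in robots.txt.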

<meta name="googlebot" content="noindex">

This method works with Google, but there are thousands of robots that are rude, crude, and lewd, and that are programmed to get your content no matter what. This problem is more complicated.

ALSO you can use it like this:

<meta name="googlebot" content="noindex, nofollow" />
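To confirm that a page really carries the directives you think it does, you can read its meta tags back with Python's standard html.parser module. This is a rough sketch, not a description of how Googlebot parses pages: the names RobotsMetaParser and robots_directives are made up for this example, and it only looks at meta tags named "robots" or "googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gathers directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if name in ("robots", "googlebot"):
            # Split "noindex, nofollow" into individual lower-case directives.
            values = {
                item.strip().lower()
                for item in attr.get("content", "").split(",")
                if item.strip()
            }
            self.directives.setdefault(name, set()).update(values)


def robots_directives(html):
    """Return {meta name: set of directives} for one HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="googlebot" content="noindex, nofollow" /></head></html>'
found = robots_directives(page)
print(sorted(found["googlebot"]))  # ['nofollow', 'noindex']
```

A page with no such tag returns an empty dictionary, which means robots will fall back to their default behavior of indexing the page and following its links.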

More tags you can use:

noarchive, nosnippet, NOODP and
<meta http-equiv="cache-control" content="private" />

 

unavailable_after Meta Tag

This only works with HTML pages.

Do you want to control how long pages stay in Google? Do you have a temporary page that will be removed at the end of the month, or pages that are free for a week before you move them into an archive that users pay to access?

Then you want the page to show in Google search results until it expires and to be removed afterwards. You don't want your content to be found in Google when people can no longer see it on your site.

To tell Googlebot that an HTML page should be removed from the search results after 8am Eastern Standard Time on 1st September 2010, simply add the following tag to the <HEAD> section of the page:

<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 01-Sep-2010 08:00:00 EST">

The date and time are specified in the RFC 850 format.

This information is treated as a removal request: it will take about a day after the removal date passes for the page to disappear from the search results.

BUT if the unavailable_after tag didn't work, then you need to do it yourself. You should use the existing URL removal tool.
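If you have many expiring pages, the unavailable_after tag can be generated rather than typed by hand, which avoids small mistakes in the date. The sketch below assumes the day-month-year layout of the example tag above (two-digit day, three-letter English month, four-digit year, then a time zone label that you supply yourself). The function name unavailable_after_tag is made up for this example, and the output has not been checked against Google's own date parser.

```python
from datetime import datetime

# English month abbreviations, spelled out so the result does not depend
# on the locale of the machine running the script.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def unavailable_after_tag(when, tz_label):
    """Build a googlebot meta tag that expires the page at `when`.

    `tz_label` (for example "EST" or "GMT") is written out as-is;
    no time zone conversion is done here.
    """
    stamp = "{:02d}-{}-{} {} {}".format(
        when.day, MONTHS[when.month - 1], when.year,
        when.strftime("%H:%M:%S"), tz_label,
    )
    return '<meta name="googlebot" content="unavailable_after: {}">'.format(stamp)


print(unavailable_after_tag(datetime(2010, 9, 1, 8, 0, 0), "EST"))
# <meta name="googlebot" content="unavailable_after: 01-Sep-2010 08:00:00 EST">
```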

Control all your content with the X-Robots-Tag directive in the HTTP Header

  • Don't display a cache link or snippet for this item in the Google search results:
    X-Robots-Tag: noarchive, nosnippet

  • Tell Google that a document will be unavailable after 7th July 2007, 4:30pm GMT:

    X-Robots-Tag: unavailable_after: 7 Jul 2007 16:30:00 GMT

You can combine multiple directives in the same document. For example:
  • Do not show a cached link for this document, and remove it from the index after 23rd July 2007, 3pm PST:

X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 23 Jul 2007 15:00:00 PST
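How you send these headers depends on your web server, so treat the following only as a sketch. It uses Python's standard wsgiref module to build a tiny application that attaches the two X-Robots-Tag headers from the last example to its response. The names app and serve, the port number, and the placeholder body are all assumptions made for this illustration. On a real site you would normally set the header in the server configuration instead.

```python
from wsgiref.simple_server import make_server

# The combined directives from the example above, one header line each.
X_ROBOTS_HEADERS = [
    ("X-Robots-Tag", "noarchive"),
    ("X-Robots-Tag", "unavailable_after: 23 Jul 2007 15:00:00 PST"),
]


def app(environ, start_response):
    """Minimal WSGI application that adds X-Robots-Tag to every response."""
    body = b"placeholder document\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ] + X_ROBOTS_HEADERS
    start_response("200 OK", headers)
    return [body]


def serve(port=8000):
    """Run the application locally; call this by hand to try it out."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Because the directives travel in the HTTP header and not in the page itself, this approach also covers files that have no <HEAD> section, such as PDFs and images.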


 

How to clean your site when it has been infected with malware

If your site has been hacked or infected with malware, you should act quickly to repair the damage. Google recommends reviewing the guidance provided by the organization antiphishing.org.

Whatever your platform or type of infection, Google recommends the following steps:

1: Quarantine your site

2: Assess the damage

3: Clean up your site

4: Ask Google to review your site