Archive

Posts Tagged ‘Webmaster Trends Analyst’

Work smarter, not harder, with site health

October 1, 2011 Comments off

We consistently hear from webmasters that they have to prioritize their time. Some manage dozens or hundreds of clients’ sites; others run their own business and may only have an hour to spend on website maintenance in between managing finances and inventory. To help you prioritize your efforts, Webmaster Tools is introducing the idea of “site health,” and we’ve redesigned the Webmaster Tools home page to highlight your sites with health problems. This should allow you to easily see what needs your attention the most, without having to click through all of the reports in Webmaster Tools for every site you manage.

Here’s what the new home page looks like:

You can see that sites with health problems are shown at the top of the list. (If you prefer, you can always switch back to listing your sites alphabetically.) To see the specific issues we detected on a site, click the site health icon or the “Check site health” link next to that site:

This new home page is currently only available if you have 100 or fewer sites in your Webmaster Tools account (either verified or unverified). We’re working on making it available to all accounts in the future. If you have more than 100 sites, you can see site health information at the top of the Dashboard for each of your sites.

Right now we include three issues in your site’s health check:

  1. Have we detected malware on the site?
  2. Have any important pages been removed via our URL removal tool?
  3. Are any of your important pages blocked from crawling in robots.txt?

You can click on any of these items to get more details about what we detected on your site. If the site health icon and the “Check site health” link don’t appear next to a site, it means that we didn’t detect any of these issues on that site (congratulations!).

A word about “important pages:” as you know, you can get a comprehensive list of all URLs that have been removed by going to Site configuration > Crawler access > Remove URL; and you can see all the URLs that we couldn’t crawl because of robots.txt by going to Diagnostics > Crawl errors > Restricted by robots.txt. But since webmasters often block or remove content on purpose, we only wanted to indicate a potential site health issue if we think you may have blocked or removed a page you didn’t mean to, which is why we’re focusing on “important pages.” Right now we’re looking at the number of clicks pages get (which you can see in Your site on the web > Search queries) to determine importance, and we may incorporate other factors in the future as our site health checks evolve.

Obviously these three issues—malware, removed URLs, and blocked URLs—aren’t the only things that can make a website “unhealthy;” in the future we’re hoping to expand the checks we use to determine a site’s health, and of course there’s no substitute for your own good judgment and knowledge of what’s going on with your site. But we hope that these changes make it easier for you to quickly spot major problems with your sites without having to dig down into all the data and reports.

After you’ve resolved any site health issues we’ve flagged, it will usually take several days for the warning to disappear from your Webmaster Tools account, since we have to recrawl the site, see the changes you’ve made, and then process that information through our Web Search and Webmaster Tools pipelines. If you continue to see a site health warning for that site after a week or so, the issue may not have been resolved. Feel free to ask for help tracking it down in our Webmaster Help Forum… and let us know what you think!

More guidance on building high-quality sites

September 6, 2011 Comments off

In recent months we’ve been especially focused on helping people find high-quality sites in Google’s search results. The “Panda” algorithm change has improved rankings for a large number of high-quality websites, so most of you reading have nothing to be concerned about. However, for the sites that may have been affected by Panda we wanted to provide additional guidance on how Google searches for high-quality sites.

Our advice for publishers continues to be to focus on delivering the best possible user experience on your websites and not to focus too much on what they think are Google’s current ranking algorithms or signals. Some publishers have fixated on our prior Panda algorithm change, but Panda was just one of roughly 500 search improvements we expect to roll out to search this year. In fact, since we launched Panda, we’ve rolled out over a dozen additional tweaks to our ranking algorithms, and some sites have incorrectly assumed that changes in their rankings were related to Panda. Search is a complicated and evolving art and science, so rather than focusing on specific algorithmic tweaks, we encourage you to focus on delivering the best possible experience for users.

What counts as a high-quality site?

Our site quality algorithms are aimed at helping people find “high-quality” sites by reducing the rankings of low-quality content. The recent “Panda” change tackles the difficult task of algorithmically assessing website quality. Taking a step back, we wanted to explain some of the ideas and research that drive the development of our algorithms.

Below are some questions that one could use to assess the “quality” of a page or an article. These are the kinds of questions we ask ourselves as we write algorithms that attempt to assess site quality. Think of it as our take at encoding what we think our users want.

Of course, we aren’t disclosing the actual ranking signals used in our algorithms because we don’t want folks to game our search results; but if you want to step into Google’s mindset, the questions below provide some guidance on how we’ve been looking at the issue:

  • Would you trust the information presented in this article?
  • Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
  • Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
  • Would you be comfortable giving your credit card information to this site?
  • Does this article have spelling, stylistic, or factual errors?
  • Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
  • Does the article provide original content or information, original reporting, original research, or original analysis?
  • Does the page provide substantial value when compared to other pages in search results?
  • How much quality control is done on content?
  • Does the article describe both sides of a story?
  • Is the site a recognized authority on its topic?
  • Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
  • Was the article edited well, or does it appear sloppy or hastily produced?
  • For a health related query, would you trust information from this site?
  • Would you recognize this site as an authoritative source when mentioned by name?
  • Does this article provide a complete or comprehensive description of the topic?
  • Does this article contain insightful analysis or interesting information that is beyond obvious?
  • Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
  • Does this article have an excessive amount of ads that distract from or interfere with the main content?
  • Would you expect to see this article in a printed magazine, encyclopedia or book?
  • Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
  • Are the pages produced with great care and attention to detail vs. less attention to detail?
  • Would users complain when they see pages from this site?

Writing an algorithm to assess page or site quality is a much harder task, but we hope the questions above give some insight into how we try to write algorithms that distinguish higher-quality sites from lower-quality sites.

What you can do

We’ve been hearing from many of you that you want more guidance on what you can do to improve your rankings on Google, particularly if you think you’ve been impacted by the Panda update. We encourage you to keep questions like the ones above in mind as you focus on developing high-quality content rather than trying to optimize for any particular Google algorithm.

One other specific piece of guidance we’ve offered is that low-quality content on some parts of a website can impact the whole site’s rankings, and thus removing low quality pages, merging or improving the content of individual shallow pages into more useful pages, or moving low quality pages to a different domain could eventually help the rankings of your higher-quality content.

We’re continuing to work on additional algorithmic iterations to help webmasters operating high-quality sites get more traffic from search. As you continue to improve your sites, rather than focusing on one particular algorithmic tweak, we encourage you to ask yourself the same sorts of questions we ask when looking at the big picture. This way your site will be more likely to rank well for the long-term. In the meantime, if you have feedback, please tell us through ourWebmaster Forum. We continue to monitor threads on the forum and pass site info on to the search quality team as we work on future iterations of our ranking algorithms.

Reorganizing internal vs. external backlinks

September 5, 2011 Comments off

Today we’re making a change to the way we categorize link data in Webmaster Tools. As you know, Webmaster Tools lists links pointing to your site in two separate categories: links coming from other sites, and links from within your site. Today’s update won’t change your total number of links, but will hopefully present your backlinks in a way that more closely aligns with your idea of which links are actually from your site vs. from other sites.

You can manage many different types of sites in Webmaster Tools: a plain domain name (example.com), a subdomain (www.example.com or cats.example.com), or a domain with a subfolder path (www.example.com/cats/ or www.example.com/users/catlover/). Previously, only links that started with your site’s exact URL would be categorized as internal links: so if you entered www.example.com/users/catlover/ as your site, links from www.example.com/users/catlover/profile.html would be categorized as internal, but links from www.example.com/users/ or www.example.com would be categorized as external links. This also meant that if you entered www.example.com as your site, links from example.com would be considered external because they don’t start with the same URL as your site (they don’t contain www).

Most people think of example.com and www.example.com as the same site these days, so we’re changing it such that now, if you add either example.com or www.example.com as a site, links from both the www and non-www versions of the domain will be categorized as internal links. We’ve also extended this idea to include other subdomains, since many people who own a domain also own its subdomains—so links from cats.example.com or pets.example.com will also be categorized as internal links for www.example.com.

Links for www.google.com External links Internal links
Previously categorized as… www.example.com/
www.example.org/stuff.html
scholar.google.com/
sketchup.google.com/
google.com/
www.google.com/
www.google.com/stuff.html
www.google.com/support/webmasters/
Now categorized as… www.example.com/
www.example.org/stuff.html
scholar.google.com/
sketchup.google.com/
google.com/
www.google.com/
www.google.com/stuff.html
www.google.com/support/webmasters/

If you own a site that’s on a subdomain (such as googlewebmastercentral.blogspot.com) or in a subfolder (www.google.com/support/webmasters/) and don’t own the root domain, you’ll still only see links from URLs starting with that subdomain or subfolder in your internal links, and all others will be categorized as external links. We’ve made a few backend changes so that these numbers should be even more accurate for you.

Note that, if you own a root domain like example.com or www.example.com, your number of external links may appear to go down with this change; this is because, as described above, some of the URLs we were previously classifying as external links will have moved into the internal links report. Your total number of links (internal + external) should not be affected by this change.

PDFs in Google search results

September 5, 2011 Comments off

Our mission is to organize the world’s information and make it universally accessible and useful. During this ambitious quest, we sometimes encounter non-HTML files such as PDFs, spreadsheets, and presentations. Our algorithms don’t let different filetypes slow them down; we work hard to extract the relevant content and to index it appropriately for our search results. But how do we actually index these filetypes, and—since they often differ so much from standard HTML—what guidelines apply to these files? What if a webmaster doesn’t want us to index them?

Google first started indexing PDF files in 2001 and currently has hundreds of millions of PDF files indexed. We’ve collected the most often-asked questions about PDF indexing; here are the answers:

Q: Can Google index any type of PDF file?
A: Generally we can index textual content (written in any language) from PDF files that use various kinds of character encodings, provided they’re not password protected or encrypted. If the text is embedded as images, we may process the images with OCR algorithms to extract the text. The general rule of the thumb is that if you can copy and paste the text from a PDF document into a standard text document, we should be able to index that text.

Q: What happens with the images in PDF files?
A: Currently the images are not indexed. In order for us to index your images, you should create HTML pages for them. To increase the likelihood of us returning your images in our search results, please read the tips in our Help Center.

Q: How are links treated in PDF documents?
A: Generally links in PDF files are treated similarly to links in HTML: they can pass PageRank and other indexing signals, and we may follow them after we have crawled the PDF file. It’s currently not possible to “nofollow” links within a PDF document.

Q: How can I prevent my PDF files from appearing in search results; or if they already do, how can I remove them?
A: The simplest way to prevent PDF documents from appearing in search results is to add an X-Robots-Tag: noindex in the HTTP header used to serve the file. If they’re already indexed, they’ll drop out over time if you use the X-Robot-Tag with the noindex directive. For faster removals, you can use the URL removal tool in Google Webmaster Tools.

Q: Can PDF files rank highly in the search results?
A: Sure! They’ll generally rank similarly to other webpages. For example, at the time of this post, [mortgage market review], [irs form 2011] or [paracetamol expert report] all return PDF documents that manage to rank highly in our search results, thanks to their content and the way they’re embedded and linked from other webpages.

Q: Is it considered duplicate content if I have a copy of my pages in both HTML and PDF?
A: Whenever possible, we recommend serving a single copy of your content. If this isn’t possible, make sure you indicate your preferred version by, for example, including the preferred URL in your Sitemap or by specifying the canonical version in the HTML or in the HTTP headers of the PDF resource. For more tips, read our Help Center article about canonicalization.

Q: How can I influence the title shown in search results for my PDF document?
A: We use two main elements to determine the title shown: the title metadata within the file, and the anchor text of links pointing to the PDF file. To give our algorithms a strong signal about the proper title to use, we recommend updating both.

If you want to learn more, watch Matt Cutt’s video about PDF files’ optimization for search, and visit our Help Center for information about the content types we’re able to index. If you have feedback or suggestions, please let us know in the Webmaster Help Forum.