Friday, 24 May 2013

Download Google’s SEO Report Card


I was browsing through my RSS feeds a couple of minutes ago and I came across an interesting post from the Google Webmaster Blog, titled Google’s SEO Report Card.
The report was created to help Google’s engineers and employees better optimize their product pages for search engines. So yeah, even Google practices SEO!
You’ll find that the report touches on many on-page and on-site SEO factors, including title tags, meta tags, heading tags, canonical URLs, image tags and link anchors.
The most useful part, however, is the examples included for each point. They make it easy to see what is right and what is wrong for each SEO factor.
Here is the direct download link for the PDF (hoping Google won’t mind…).

Google Now Lists the Top 1,000 Websites On The Web


Curious to know what the largest websites on the web are? While there are some lists around, none of them is regarded as particularly accurate, and Google apparently wants to fix this: it just released its own Top 1000 Websites list.
As you can see below, the list shows the category of the website, its number of monthly unique visitors, page views, and whether or not the website accepts advertising.
[Screenshot of the Top 1000 Websites list]
The announcement was made on the official AdWords blog. In fact, the list was developed to help advertisers target big websites that accept ads.
According to Google itself “the list excludes adult sites, ad networks, domains that don’t have publicly visible content or don’t load properly, and certain Google sites,” so keep this in mind.
Here are the blogs I spotted on the list:
  • #246 – The Huffington Post – 12 million uniques
  • #434 – Engadget – 8.1 million uniques
  • #540 – Gizmodo – 6.7 million uniques
  • #696 – Mashable – 5.6 million uniques
  • #850 – TechCrunch – 4.7 million uniques
If these numbers are accurate they reveal some very interesting data. For example, Mashable is as big as PCWorld, one of the oldest and most established tech publications. Similarly, Engadget is as big as the Washington Post (on the web only, obviously). And The Huffington Post is as big as Digg (counting uniques only, not page views).
I was hoping to see Daily Blog Tips somewhere near the bottom of the list, but no luck…!
Update: Here is where Google says the numbers are coming from:
Traffic statistics are estimated by combining sample user data from various Google products and services and opt-in direct-measured site-centric data. In addition, site owners may opt-in direct measured Google Analytics traffic statistics to provide a more accurate measure of their site traffic. Sites that have opted-in direct measured Google Analytics data are indicated through the footnote “Google Analytics data shared by publisher”.

Video: How To Get Thousands of Visitors From Google


It’s easier than ever to get Google to send your blog thousands of visitors a month. Forget complicated terms like “keyword density”; let’s keep it simple. Here’s how, with about five minutes of work, you can get your blog ranked highly in Google for valuable search terms.

Video Highlights:

[0:20] First step: Go to Google and type in “Google keyword tool”. The first result is the Google Keyword Tool.
[0:51] I recommend typing in a question word, like “how”, “how to”, or “what”, followed by a generic word that describes your blog, like “business”.
[1:20] Sort by Global Monthly Search Volume. This is how many people, on average, search for this keyword in a month.
[1:42] Look for keywords that have between 70 and 10,000 searches per month (a quick way to filter a list of candidate keywords by this range is sketched after these highlights).
[2:00] Important: You should only target one keyphrase with each of your blog posts.
[2:20] Your blog post title and permalink should be this exact keyphrase. Resist the urge to add extra words.
[2:25] Use your selected keyphrase several times in your blog post.
[2:31] Also important: Make sure that when people link to your blog post, they use that exact keyphrase. The best way to do this is to make it the title of your blog post.
[2:48] If you have a WordPress blog, download the free All In One SEO Pack to modify your page title, keywords, and meta description.
[2:57] Your meta description is the sentence or two that appears below your blog post in Google search results.
[3:17] Guest posting, and linking to your blog post in the “blurb” at the end of your guest post, will help your post rank higher in the search engines.
[3:48] Use this simple system to get thousands of visitors to your blog every month!
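For illustration only, here is a rough sketch in Python of the filtering step from [1:42]; the keywords and search volumes below are made up, and in practice you would use the numbers exported from the keyword tool:

# Made-up keyword ideas with their average monthly search volumes
candidates = [
    ("how to start a business", 12000),
    ("how to write a business plan", 4400),
    ("what is a business model", 880),
    ("how to name a business", 55),
]

# Keep only keyphrases in the 70-10,000 monthly searches range suggested in the video
targets = [(kw, vol) for kw, vol in candidates if 70 <= vol <= 10000]
for keyword, volume in sorted(targets, key=lambda t: t[1], reverse=True):
    print(f"{keyword}: {volume} searches/month")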
Have you had good results from using the Google Keyword Tool to rank your blog posts well in the search engines? Let us know in the comments.

Calculate Your Google Supplemental Index Ratio


Successful bloggers know the importance of learning SEO concepts. One method of measuring the SEO health of your website is to calculate the ratio of your pages in Google’s supplemental index.

What is the supplemental index?

In short, its nickname is ‘Google Hell’ and it is a place your website does not want to be. The supplemental index is a secondary index for lower-ranking pages. Pages found in the supplemental index tend to be crawled less often and will never be assigned PageRank. As a result, these pages tend to appear lower in organic search results. There are many reasons why pages lose rank and fall into the supplemental index. Here are the most common:
  • Low quality content (1 line posts)
  • Internal duplicate content noise or scraped posts
  • Lack of external links
  • Too many query string parameters in the URL (more than Google’s algorithm will handle)

Calculating your supplemental index ratio

There have been numerous posts in the SEO community on calculating Google supplemental index ratios. Unfortunately, most of the queries used to determine the number of pages in the supplemental index have been deprecated and no longer return the correct results. These queries include:
  • site:www.yoursite.com *** -sjpked
  • site:www.yoursite.com *** -sljktf
  • site:www.yoursite.com *** -view
  • site:www.yoursite.com *** -ndsfoiw
Since supplemental queries seem to have a limited lifetime, a more stable way is to find the number of pages in the main index (those that have a higher chance of appearing in search results) and subtract it from the total number of pages indexed.
Total Pages Indexed = site:www.yoursite.com
Pages in the Main Index = site:www.yoursite.com -inallurl:www.yoursite.com
Pages in the Supplemental Index = Total Pages Indexed – Pages in the Main Index
To calculate your supplemental index ratio you simply divide the number of supplemental pages by the total number of pages indexed (the lower this ratio, the better). Below you will find some examples:
Website – Pages in Supplemental Index – Total Pages Indexed – Supplemental Index Ratio
www.seobook.com – 90 – 2,260 – 3.9%
www.dailyblogtips.com – 60 – 521 – 11.5%
www.copyblogger.com – 116 – 574 – 20.2%
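If you prefer to let a script do the arithmetic, here is a minimal sketch in Python; the two counts are the numbers returned by the site: queries above, and the example values are derived from the seobook.com row (2,260 total pages, 90 of them supplemental, so roughly 2,170 in the main index):

def supplemental_ratio(total_indexed, main_index):
    # Pages in the supplemental index = total indexed - pages in the main index
    supplemental = total_indexed - main_index
    return supplemental / total_indexed * 100

# Example: 2,260 total pages indexed, 2,170 of them in the main index
print(round(supplemental_ratio(2260, 2170), 1))  # prints 4.0, i.e. roughly a 4% ratio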

How can I make my ratio better?

  1. Optimize Your Blog for Search Engines. Many tips can be found in the previous article Blog Setup: 40 Practical Tips.
  2. QC + QL = No Supplemental Index. The best way to pull your pages out of the supplemental index is by providing quality content (QC) that earns you quality links (QL). Search engines will start to view your blog as an authority and will place your pages in the main index. You might also get lucky: through internal linking or site association, other pages may be pulled out of the supplemental index as well.
  3. Be patient. New blogs tend to have ratios above 75% for a number of months. This is because of low traffic and a lack of quality links. Keep posting quality content and your ratio will improve.

Collection of Robots.txt Files


The implementation of a suitable robots.txt file is very important for search engine optimization. There is plenty of advice around the Internet for the creation of such files (if you are looking for an introduction to this topic, read “Create a robots.txt file“), but what if, instead of looking at what people say, we could look at what people do?
That is what I did, collecting the robots.txt files from a wide range of blogs and websites. Below you will find them.

Key Takeaways

  • Only 2 out of 30 websites that I checked were not using a robots.txt file
  • Even if you don’t have any specific requirements for the search bots, you should probably still use a simple robots.txt file
  • Most people stick to the “User-agent: *” attribute to cover all agents
  • The most commonly disallowed item is the RSS feed
  • Google itself is using a combination of closed folders (e.g., /searchhistory/) and open ones (e.g., /search), which probably means they are treated differently
  • A minority of the sites included the sitemap URL in the robots.txt file
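To make these takeaways concrete, here is a minimal robots.txt along the lines of what most of the checked sites use; the paths and the sitemap URL are placeholders, not values taken from any particular site:

User-agent: *
Disallow: /feed/
Disallow: /cgi-bin/
Sitemap: http://www.example.com/sitemap.xml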

Top 25 SEO Blogs


The Top 25 SEO Blogs list ranks the blogs according to their Google Pagerank, Alexa rank, number of Bloglines subscribers, and Technorati authority. Each factor has a score from 0 to 10, and the maximum score for each blog is 40. Details about the algorithm can be found below the table.
Rank – Blog – Pagerank – Alexa – Bloglines – Technorati – Total
#1 – Search Engine Land – 7 – 10 – 9 – 10 – 36
#2 – SEOBook – 6 – 10 – 10 – 10 – 36
#3 – SEO Moz – 5 – 10 – 10 – 10 – 35
#4 – Matt Cutts – 7 – 10 – 8 – 10 – 35
#5 – Search Engine Watch – 7 – 10 – 10 – 7 – 34
#6 – Search Engine Roundtable – 7 – 10 – 8 – 8 – 33
#7 – Search Engine Journal – 7 – 8 – 9 – 8 – 32
#8 – Online Marketing Blog – 6 – 7 – 7 – 10 – 30
#9 – Pronet Advertising – 7 – 7 – 5 – 10 – 29
#10 – Marketing Pilgrim – 7 – 8 – 6 – 8 – 29
#11 – SEO Chat – 6 – 10 – 4 – 6 – 26
#12 – Search Engine Guide – 7 – 8 – 4 – 6 – 25
#13 – SEO Blackhat – 6 – 8 – 6 – 5 – 25
#14 – Stuntdubl – 6 – 6 – 6 – 6 – 24
#15 – Graywolf’s SEO – 6 – 7 – 4 – 7 – 24
#16 – SEO by the SEA – 6 – 4 – 5 – 5 – 20
#17 – Link Building Blog – 5 – 5 – 5 – 4 – 19
#18 – Jim Boykin – 5 – 6 – 4 – 4 – 19
#19 – SEOpedia – 6 – 5 – 4 – 4 – 19
#20 – DaveN – 6 – 5 – 4 – 4 – 19
#21 – Bruce Clay – 5 – 7 – 3 – 3 – 18
#22 – Blue Hat SEO – 4 – 6 – 3 – 4 – 17
#23 – Tropical SEO – 5 – 5 – 3 – 4 – 17
#24 – SEO Refugee – 5 – 6 – 1 – 3 – 15
#25 – Small Business SEM – 5 – 4 – 3 – 3 – 15


Blogs considered: the list considers only blogs that have a high percentage of SEO-related content. Topics might range from SEO news coverage to general SEO discussion and link building.
Google Pagerank (0 to 10): the actual Pagerank was used in the algorithm.
Alexa Rank (0 to 10): Ranges were determined based on the Alexa Rank (e.g., 100k and up, 80k-100k, 60k-80k, 40k-60k) and each range was assigned a number (1 to 10).
Bloglines Subscribers (0 to 10): Subscriber ranges were determined (e.g., 1-50, 50-100, 100-150, 150-300) and each range was assigned a number (1 to 10).
Technorati Authority (0 to 10): Ranges were determined based on Technorati’s Authority rank (e.g., 1-125, 125-250, 500-750, 750-1000) and each range was assigned a number (1 to 10).
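As an illustration of how such a score could be computed, here is a small Python sketch. This is my own reconstruction: the bucket boundaries below are hypothetical, and only the general idea (map each metric to a 0-10 score and sum to a maximum of 40) comes from the description above.

def range_score(value, boundaries, lower_is_better=False):
    # Map a metric to a 0-10 score: one point for each of the ten boundaries the value clears
    cleared = sum((value <= b) if lower_is_better else (value >= b) for b in boundaries)
    return min(cleared, 10)

def blog_score(pagerank, alexa_rank, bloglines_subs, technorati_authority):
    # Hypothetical bucket boundaries; the article only lists a few of the real ranges
    alexa_buckets = [100_000, 90_000, 80_000, 70_000, 60_000, 50_000, 40_000, 30_000, 20_000, 10_000]
    bloglines_buckets = [1, 50, 100, 150, 200, 300, 400, 500, 750, 1000]
    technorati_buckets = [1, 125, 250, 375, 500, 625, 750, 875, 1000, 1500]
    return (pagerank  # Pagerank is already on a 0-10 scale
            + range_score(alexa_rank, alexa_buckets, lower_is_better=True)
            + range_score(bloglines_subs, bloglines_buckets)
            + range_score(technorati_authority, technorati_buckets))

# Example with made-up metrics
print(blog_score(pagerank=7, alexa_rank=1_500, bloglines_subs=900, technorati_authority=1_200))  # prints 35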

Top 7 SEO Blogs


  • Bruce Clay, Inc. Blog – Bruce is an old name in the industry. His blog is a great resource for SEO news and tips in general. What keeps me coming back, though, are the hilarious in-post comment wars he has with Susan. Think we could do something like that, Daniel?
  • Marketing Pilgrim – Andy Beal’s got some great material on a variety of different marketing-related topics, including a lot of articles on SEO and blogging.
  • Matt Cutts: Gadgets, Google, and SEO – There’s nothing quite like getting news straight from the source. Matt works for Google. He’s a great read to begin with, but you just can’t beat his blog for authority when it comes to the internet’s #1 search engine.
  • Search Engine Land – Although the blog itself has only been around for a few months, Danny Sullivan has been an SEO bigwig for much longer.
  • Search Engine Roundtable – If you want to stay abreast of conversations on the hottest SEO discussion boards (e.g. Cre8asite, WebmasterWorld, DigitalPoint, etc.) without, well, having to read them, this blog is your first stop.
  • Search Engine Watch – This blog is definitely one of the oldest and most respected sources for search engine news. It’s a bit drier than the others; good for information, bad for personality, but still useful if all you’re interested in are the facts.
  • SEOmoz Daily SEO Blog – Rand Fishkin and co. provide some great information. Be sure not to miss their Whiteboard Fridays.

Search Engine Ranking Factors


It is always a good idea to stay up to date on the factors that search engines use to determine search results and rank websites. SEOmoz released a very detailed document titled “Search Engine Ranking Factors V2”, which outlines the views of 34 SEO experts on how Google’s algorithm works. Below you will find the Top 5 positive and negative factors from the study:

Top 5 Positive Factors

  1. Keyword Use in Title Tag
  2. Global Link Popularity of Site
  3. Anchor Text of Inbound Link
  4. Link Popularity within the Site
  5. Age of Site

Top 5 Negative Factors

  1. Server is Often Inaccessible to Bots
  2. Content Very Similar or Duplicate of Existing Content on the Index
  3. External Links to Low Quality/Spam sites
  4. Participation in Link Schemes or Actively Selling Links
  5. Duplicate Title/Meta Tags on Many Pages

How Google Ranks Blogs


Google Blog Search is a new tool that has been gaining popularity on the Internet lately. Blog Search can also be a good source of visitors if your blog ranks in the first positions for specific keywords, but what factors does Google take into account to determine the search results?
The “SEO by the Sea” blog has an interesting article analyzing a new Google patent that gives some indications about the positive and negative factors affecting blog ranking. Check it out:
Positive Factors:
  • Popularity of the blog (RSS subscriptions)
  • Implied popularity (how many clicks search results get)
  • Inclusion in blogrolls
  • Inclusion in “high quality” blogrolls
  • Tagging of posts (also from users)
  • References to the blog by sources other than blogs
  • Pagerank
Negative Factors:
  • Predictable frequency of posts (short bursts of posts might indicate spam)
  • Content of the blog does not match content of the feed
  • Content includes spam keywords
  • Duplicated content
  • Posts that all have the same size
  • Link distribution of the blog
  • Posts primarily link to one page or site

Create a robots.txt file


The robots.txt file is used to instruct search engine robots about which pages on your website should be crawled and consequently indexed. Most websites have files and folders that are not relevant for search engines (like images or admin files), so creating a robots.txt file can actually improve your website’s indexation.
A robots.txt is a simple text file that can be created with Notepad. If you are using WordPress, a sample robots.txt file would be:
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
“User-agent: *” means that all search bots (from Google, Yahoo, MSN and so on) should use those instructions to crawl your website. Unless your website is complex, you will not need to set different instructions for different spiders.
“Disallow: /wp-” will make sure that the search engines do not crawl the WordPress system files. This line excludes all files and folders starting with “wp-” from indexing, helping you avoid duplicate content and keeping admin files out of the index.
If you are not using WordPress just substitute the Disallow lines with files or folders on your website that should not be crawled, for instance:
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /any other folder to be excluded/
After you have created the robots.txt file, just upload it to your root directory and you are done!
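If you want to double-check how a well-behaved crawler will interpret your file, Python’s standard library ships with a robots.txt parser. Here is a quick sketch, with example.com standing in for your own domain:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # point this at your own robots.txt
rp.read()  # downloads and parses the file

# Assuming your file matches the WordPress example above, anything under /wp- should be blocked
print(rp.can_fetch("*", "http://www.example.com/wp-admin/post.php"))  # expected: False
print(rp.can_fetch("*", "http://www.example.com/a-regular-post/"))    # expected: True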