Google indexes Flash

As you probably already know, Google has modified its crawling algorithms to understand (or try to understand :-) the almighty Flash that causes so many headaches.

I'm attaching an article by Vanessa Fox with some recommendations. In my view… it's required reading!

A couple of weeks ago, Adobe announced that it was working with Google and Yahoo! on making Flash content easier to index in search engines. Google said it was using the search-engine specific Flash player that Adobe had made available (Yahoo!’s integration is still in the works). While I think it’s great and absolutely vital that search engines continue to evolve beyond strictly text (to ensure they are providing the best possible experience for their users), I don’t think this announcement means that all the Flash content on the web will now suddenly start ranking in search results and I don’t think that Flash developers can stop thinking about search engine optimization.

How search engines work
It all goes back to how search engines work. At least for now (even with all of the advancements in the last year around universal search), the foundations of the major search engines are based on text. The web began with primarily text-only pages and the search engine algorithms were built on that idea. When people started searching for information, they searched with words. We’re used to asking for things in words, after all, and since words were what the web was made up of, the questions and answers matched up quite well. Search engines are a bit of a middleman (middlemachine?) between a searcher’s textual questions and a web site’s textual answers.

Searching continues to be text based
Sure, you might imagine other types of exchanges. I might want to upload a picture of a person and ask for all the other pictures on the web of that person. Or I might want to search through the audio of a song for a particular lyric. All of those types of searches and more are coming (and some have been tried, with varying degrees of success), but at least for now, those applications are not how the three major search engines work and not how most people search.

Over time, search engines have experimented with different elements on pages beyond simply the text itself to better understand what those pages are about. Because these experiments are built on a text-based foundation, however, they have still mostly focused on text. For instance, search engines found that the text in the title may be a strong indicator of the focus of the page. The textual caption under an image is likely describing that image.

How Flash fits in with text-based search engines
Now, consider Flash. Most Flash pages contain little text. Those that do could often just as easily display that text outside of the Flash components (which would make it easier for those on screen readers and mobile phones, for instance, to view the content).

With this latest innovation in crawling Flash, Google can more easily access the text in Flash, but it still can’t process that text quite as well as it can HTML text because it isn’t extracting any meta data about it. As I mentioned earlier, search engines now store all kinds of meta data based on the structure of the text in HTML, such as whether it’s in a title tag or an H1. So Flash-based text has that disadvantage.
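To make the disadvantage concrete, here is a minimal sketch of the kind of structural meta data an HTML parser can attach to text, compared with the flat list of strings a Flash file yields. The sample page, the sample strings, and the names `StructuredTextExtractor` and `extract_structured` are all made up for illustration; this is not how any search engine actually parses pages.

```python
from html.parser import HTMLParser


class StructuredTextExtractor(HTMLParser):
    """Collect text together with the tag that encloses it."""

    TRACKED = {"title", "h1", "p"}

    def __init__(self):
        super().__init__()
        self._stack = []   # currently open tracked tags
        self.fields = []   # list of (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            self.fields.append((self._stack[-1], text))


def extract_structured(html):
    """Return (tag, text) pairs for the tracked tags in an HTML string."""
    parser = StructuredTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical HTML page: each string comes with a structural role.
HTML_PAGE = (
    "<html><head><title>Tour de France coverage</title></head>"
    "<body><h1>Stage results</h1><p>Daily results and video.</p></body></html>"
)

# Hypothetical text pulled out of a Flash file: same words, no roles.
FLASH_TEXT = ["Tour de France coverage", "Stage results", "Daily results and video."]

html_fields = extract_structured(HTML_PAGE)
flash_fields = [("unknown", text) for text in FLASH_TEXT]

if __name__ == "__main__":
    print(html_fields)
    print(flash_fields)
```

With the HTML page, a ranking system can weight the title differently from a paragraph; with the Flash strings, every piece of text looks equally important.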

Provide a separate URL for each piece of Flash content
Another consideration is how the Flash application itself is constructed. This new Flash player that Adobe is making available to Google and Yahoo! helps the search engines in that it enables them to access content they never could before. The crawlers can interact with the Flash application as a user would and crawl deeper into the application to get to text that may be four or five levels deep. At first glance, this may seem similar to search engine crawlers following links within HTML sites, but it can actually be quite different.

HTML pages (generally) have unique URLs for each page. Flash applications can be constructed that way, but can also be constructed so that as you go deeper into the application, the URL doesn’t change. This can be problematic for lots of usability reasons that have nothing to do with search. For instance, the back button in the browser doesn’t work. Users can’t easily email, Digg, or otherwise share a particular section of the Flash application. Bookmarking only works for the beginning of the Flash app.

As you might imagine, it also causes problems in search. Sure, the search engine crawlers may now be able to get to some of that content several levels in, but they have to index all of the text under a single URL. (Also note that they likely won’t index all of the application in this case; they will execute only a certain number of interactions.)

Say information about your latest product line is available once you choose “products” from the home page, then “new” from the products page, then “coming soon” from the new page. If the URL of the application doesn’t change for each interaction, then search engines will have to index the content from the home page, products page, new page, and coming soon page all under a single URL. When a searcher looks for your latest product line, that URL may appear in the results. But once the searcher clicks over, they aren’t brought to your coming soon page, they see your home page, and may have no idea where to go from there. If you ensure your Flash app uses a different URL for each page, then the searcher can be brought directly to the page that has the right content, which should greatly improve conversion rates and lower bounce rates.
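The product-line scenario above can be sketched with a toy inverted index. Everything here is hypothetical: the `example.com` URLs, the screen names and text, and the `build_index` and `search` helpers. Real search engines are far more elaborate; the sketch only shows which URL a searcher would land on in each case.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages:
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# Hypothetical screens of a Flash application: (screen name, visible text).
SCREENS = [
    ("home", "welcome to our store"),
    ("products", "browse all products"),
    ("new", "new arrivals this season"),
    ("coming-soon", "latest product line coming soon"),
]

# Case 1: the URL never changes, so all text is indexed under one URL.
single_url_index = build_index(
    ("http://example.com/", text) for _, text in SCREENS
)

# Case 2: each screen has its own URL.
multi_url_index = build_index(
    (f"http://example.com/{name}", text) for name, text in SCREENS
)

if __name__ == "__main__":
    print(search(single_url_index, "coming soon"))
    print(search(multi_url_index, "coming soon"))
```

In the single-URL case the query matches, but the only URL available to return is the home page; in the multi-URL case the same query leads straight to the coming soon screen.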

But if you take the announcement that Google can now index Flash at face value, without looking deeper, you may not realize this, and think that your single-URL Flash application is now perfectly positioned for search.

Taking back the tour
Want an example of how the statement “Google can now index Flash” isn’t the whole story?

I’ve been watching the Tour de France. It’s playing on the Versus network for the first time this year. I’d never heard of the Versus network before (since it seems to mostly show ultimate fighting cage matches, this may be because I’m not its target audience; not to mention that I wasn’t the target audience for the network under its previous name, OLN, as I think it mostly played shows about people fishing then), and the network is looking to capitalize on this potential new audience.

Versus is spending a lot of money on its Tour de France campaign “Take Back the Tour”. It has put together flashy commercials and an equally flashy website.

[Screenshot: first page of the Take Back the Tour site]

Versus probably would like to be found when people search for [tour de france]. The Tour de France page on the main versus.com domain shows up in the search results, but the Take Back The Tour site that they spent so much money on? Nowhere to be found.

Well, they’re spending all the money on commercials and print ads, so maybe people have been searching for [take back the tour] as well. The site does rank #1 for that query on both Google and Live (although it’s down at #8 on Yahoo!). For all three engines, even those who do the search because they saw an ad might not be sure if the takebackthetour.com listing is really the official site based on how the listing looks in the search results.

[Screenshot: the takebackthetour.com listing in the search results]

You can see that at this point, Google doesn’t see any content on the site and in fact, notes on the cached page that [take back the tour] appears only in links pointing to the page. Since it can’t extract any text, it has no way of knowing that the site is about the Tour de France.

Google still doesn’t index Flash executed via JavaScript
So. What’s the problem? Google crawls Flash now and all should be well. I see at least two problems. The first is fundamental. The Flash executes via JavaScript. Google noted in their blog post that:

“Googlebot does not execute some types of JavaScript. So if your web page loads a Flash file via JavaScript, Google may not be aware of that Flash file, in which case it will not be indexed.”

They did update the post later to say that:

“For our July 1st launch, we didn’t enable Flash indexing for Flash files embedded via SWFObject. We’re now rolling out an update that enables support for common JavaScript techniques for embedding Flash, including SWFObject and SWFObject2.”

Will this update help the Take Back the Tour site? Maybe not.
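The distinction Google describes, Flash referenced directly in the markup versus Flash that only appears once a script runs, can be sketched as a rough check on a page's HTML. This is a simple heuristic under stated assumptions, not a description of how Googlebot works: the two sample pages, the file name `tour.swf`, and the names `FlashEmbedFinder` and `classify_flash_embed` are invented for the example.

```python
from html.parser import HTMLParser


class FlashEmbedFinder(HTMLParser):
    """Record .swf files referenced in markup and .swf mentions in scripts."""

    def __init__(self):
        super().__init__()
        self.static_swfs = []
        self.script_mentions_swf = False
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script":
            self._in_script = True
            return
        # Attributes that commonly carry the movie URL in static markup.
        candidates = []
        if tag == "embed":
            candidates.append(attrs.get("src"))
        elif tag == "object":
            candidates.append(attrs.get("data"))
        elif tag == "param" and attrs.get("name") == "movie":
            candidates.append(attrs.get("value"))
        for url in candidates:
            if url and url.lower().endswith(".swf"):
                self.static_swfs.append(url)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script and ".swf" in data.lower():
            self.script_mentions_swf = True


def classify_flash_embed(html):
    """Return 'static', 'script-only', or 'none' for an HTML string."""
    finder = FlashEmbedFinder()
    finder.feed(html)
    finder.close()
    if finder.static_swfs:
        return "static"
    if finder.script_mentions_swf:
        return "script-only"
    return "none"


# Hypothetical page with the movie referenced directly in the markup.
STATIC_PAGE = (
    '<html><body><object data="tour.swf" '
    'type="application/x-shockwave-flash"></object></body></html>'
)

# Hypothetical page where the movie exists only inside a script call.
SCRIPT_PAGE = (
    '<html><body><div id="flash"></div><script>'
    'swfobject.embedSWF("tour.swf", "flash", "800", "600", "9.0.0");'
    "</script></body></html>"
)

if __name__ == "__main__":
    print(classify_flash_embed(STATIC_PAGE))
    print(classify_flash_embed(SCRIPT_PAGE))
```

A crawler that does not execute the script never sees the second page's movie at all, which is the situation the first quote from Google describes.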

Can Google find any words to index?
Another big obstacle to the crawl of this site is that even if Google could get to the Flash, it would find few words to index. Nearly all of the text on the site is contained in images. The first thing you see when you go to the site is lots of words, but the only ones that seem to be text, rather than part of the image, are in the link “join the movement”.

So, once Google can access the Flash, it will be able to crawl and index those words. This design is a theme throughout the site. Links like “back” are text. Nearly everything else is in images.

Let’s pretend for a moment that they changed the Flash file so that the text wasn’t contained in images (and that the JavaScript problem didn’t exist). Would this help indexing? Yes and no.

No separate URLs can lead to a poor experience for searchers
Each time you click a link in the Flash file, you are taken to another page, but the URL doesn’t change. It stays at takebackthetour.com no matter how you navigate. That means that any text Google does pick up will be indexed under that one URL.

By clicking about three levels deep, I can find TV spots about the tour. If the site designers added some text about those TV spots, using the language of their customers, then searchers looking for [tour de france video] or something similar might see the takebackthetour.com site come up in their search results. But when they clicked through to the site, they wouldn’t see the TV spots. They would see the Flash splash page. And they would have to figure out how to navigate through the site to find the video section. Chances are that many searchers would scan the initial page that came up, not see what they were looking for and go back to the search results to find another site.

Little chance for viral success
This makes for a poor user experience from search, but consider also that the creators of this campaign obviously are hoping it goes viral. If you want a site to go viral, you have to make it easily shareable. Sure, people may love the rant section or the video section or the contest, but no URL of any of these sections exists for those people to email, Digg, Twitter, Stumble, or otherwise share. A viral campaign that requires every person who shares the content to say, “go to this URL, then click ‘join the movement’, then click ‘how will you take back the tour’” is over before it even begins.

And what about accessibility? And those on the go? I watched the first night of the tour at a friend’s house. What if I had seen the commercial, wanted to check it out, and pulled up the site on my Windows Mobile Smartphone? I would have had this awesome experience:

[Screenshot: the error message shown on the mobile browser]

It’s not even an accurate error message, since the first problem is that I don’t have JavaScript support.

Be smart about Flash
Clearly, a few problems still exist with Flash websites. My view is this:

  • It’s important for web technology providers to think about things like accessibility and search engine optimization or those who implement those technologies will turn to other solutions. To this end, Adobe should be commended for continuing to evolve their offerings to better serve the needs of their users.
  • Search engines have to continue to evolve beyond HTML as their primary goal is to provide the best possible results for searchers. They can’t rely on site owners across the web understanding what technologies are better for search. Google is clearly working on “organizing all the world’s information”, not just all the information well optimized for search engines, and this latest Flash development is an important part of that evolution.
  • If you operate a business online, search is an important acquisition channel. Don’t leave such an important avenue for gaining new customers in the hands of others. Ensure that you are making it as easy as possible for search engines to find your content.
  • Flash may very well be a great technology for your site, but implement it wisely.