How do I get the web page contents from a WebView?


Question

On Android, I have a WebView that is displaying a page.

How do I get the page source without requesting the page again?

It seems WebView should have some kind of getPageSource() method that returns a string, but alas it does not.

If I enable JavaScript, what is the appropriate JavaScript to put in this call to get the contents?

webview.loadUrl("javascript:(function() { " +  
    "document.getElementsByTagName('body')[0].style.color = 'red'; " +  
    "})()");  
1
80
2/27/2014 10:57:46 AM

Accepted Answer

I know this is a late answer, but I found this question because I had the same problem. I think I found the answer in this post on lexandera.com. The code below is basically a cut-and-paste from the site. It seems to do the trick.

final Context myApp = this;

/* An instance of this class will be registered as a JavaScript interface */
class MyJavaScriptInterface
{
    @JavascriptInterface
    @SuppressWarnings("unused")
    public void processHTML(String html)
    {
        // process the html as needed by the app
    }
}

final WebView browser = (WebView)findViewById(R.id.browser);
/* JavaScript must be enabled if you want it to work, obviously */
browser.getSettings().setJavaScriptEnabled(true);

/* Register a new JavaScript interface called HTMLOUT */
browser.addJavascriptInterface(new MyJavaScriptInterface(), "HTMLOUT");

/* WebViewClient must be set BEFORE calling loadUrl! */
browser.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url)
    {
        /* This call inject JavaScript into the page which just finished loading. */
        browser.loadUrl("javascript:window.HTMLOUT.processHTML('<head>'+document.getElementsByTagName('html')[0].innerHTML+'</head>');");
    }
});

/* load a web page */
browser.loadUrl("http://lexandera.com/files/jsexamples/gethtml.html");
148
6/16/2014 3:27:10 PM

Per issue 12987, Blundell's answer crashes (at least on my 2.3 VM). Instead, I intercept a call to console.log with a special prefix:

// intercept calls to console.log
web.setWebChromeClient(new WebChromeClient() {
    public boolean onConsoleMessage(ConsoleMessage cmsg)
    {
        // check secret prefix
        if (cmsg.message().startsWith("MAGIC"))
        {
            String msg = cmsg.message().substring(5); // strip off prefix

            /* process HTML */

            return true;
        }

        return false;
    }
});

// inject the JavaScript on page load
web.setWebViewClient(new WebViewClient() {
    public void onPageFinished(WebView view, String address)
    {
        // have the page spill its guts, with a secret prefix
        view.loadUrl("javascript:console.log('MAGIC'+document.getElementsByTagName('html')[0].innerHTML);");
    }
});

web.loadUrl("http://www.google.com");

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon