C++ with Tidy on non-UTF-8 HTML source

Here I only cover a few settings for Tidy:

	TidyDoc tdoc = tidyCreate();                     // Initialize "document"

	ok = tidyOptSetBool( tdoc, TidyXhtmlOut, no );  // Keep HTML output (XHTML conversion off)
	if ( ok )
		rc = tidySetErrorBuffer( tdoc, &errbuf );      // Capture diagnostics
	if (rc>=0) {
		// tidyOptSetInt returns Bool, not a status code; 0 = RAW
		rc = ( tidyOptSetInt(tdoc, TidyOutCharEncoding, 0) ? rc : -1 );
	}
	if ( rc >= 0 )
		rc = tidyParseString( tdoc, ainput );           // Parse the input
	if ( rc >= 0 )
		rc = tidyCleanAndRepair( tdoc );               // Tidy it up!
	if ( rc >= 0 )
		rc = tidyRunDiagnostics( tdoc );               // Kvetch
	if ( rc > 1 )                                    // If error, force output.
		rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );
	if ( rc >= 0 )
		rc = tidySaveBuffer( tdoc, &output );          // Pretty Print

	if ( rc >= 0 )
		result=output.bp;
	else
		result=input;

	tidyBufFree( &output );
	tidyBufFree( &errbuf );
	tidyRelease( tdoc );

The important setting is TidyOutCharEncoding.

Tidy defines the following encoding values for it:

#define RAW         0
#define ASCII       1
#define LATIN0      2
#define LATIN1      3
#define UTF8        4
#define ISO2022     5
#define MACROMAN    6
#define WIN1252     7
#define IBM858      8

If the HTML you are processing is not UTF-8 and the output comes out garbled, try setting TidyOutCharEncoding to RAW, or alternatively to UTF8.
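
One way to decide between the two values is to check whether the input bytes are well-formed UTF-8 before calling Tidy. The sketch below is a heuristic under that assumption, not something Tidy provides: LooksLikeUtf8 and ChooseOutCharEncoding are hypothetical helpers, and a short GB2312 string can occasionally pass as valid UTF-8 by coincidence.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `bytes` is well-formed UTF-8. Rejects
// overlong forms, surrogates and code points above U+10FFFF.
bool LooksLikeUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // decoded code point
        if (c < 0x80)                { extra = 0; cp = c; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return false;       // stray continuation or invalid lead byte

        if (extra > n - i - 1) return false;   // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (extra == 1 && cp < 0x80)    return false;  // overlong
        if (extra == 2 && cp < 0x800)   return false;  // overlong
        if (extra == 3 && cp < 0x10000) return false;  // overlong
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // surrogate
        if (cp > 0x10FFFF) return false;
        i += extra + 1;
    }
    return true;
}

// Picks a value from the list above: 4 (UTF8) when the input validates
// as UTF-8, otherwise 0 (RAW) so the bytes pass through untouched.
int ChooseOutCharEncoding(const std::string& html)
{
    return LooksLikeUtf8(html) ? 4 : 0;
}
```

The returned value would then be passed to tidyOptSetInt in place of the hard-coded 0. Pure ASCII input validates as UTF-8, which is harmless because both settings leave ASCII bytes unchanged.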
Finally, here are two functions that wrap Tidy (the HTML I process is GB2312, so I set TidyOutCharEncoding to RAW).
a. CString to CString in UNICODE:

CString CleanHtml(CString input)
{
	CString result;
	CStringA ainput(input);
	TidyBuffer output = {0};
	TidyBuffer errbuf = {0};
	int rc = -1;
	Bool ok;

	TidyDoc tdoc = tidyCreate();                     // Initialize "document"

	ok = tidyOptSetBool( tdoc, TidyXhtmlOut, no );  // Keep HTML output (XHTML conversion off)
	if ( ok )
		rc = tidySetErrorBuffer( tdoc, &errbuf );      // Capture diagnostics
	if (rc>=0) {
		// tidyOptSetInt returns Bool, not a status code; 0 = RAW
		rc = ( tidyOptSetInt(tdoc, TidyOutCharEncoding, 0) ? rc : -1 );
	}
	if ( rc >= 0 )
		rc = tidyParseString( tdoc, ainput );           // Parse the input
	if ( rc >= 0 )
		rc = tidyCleanAndRepair( tdoc );               // Tidy it up!
	if ( rc >= 0 )
		rc = tidyRunDiagnostics( tdoc );               // Kvetch
	if ( rc > 1 )                                    // If error, force output.
		rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );
	if ( rc >= 0 )
		rc = tidySaveBuffer( tdoc, &output );          // Pretty Print

	if ( rc >= 0 )
		result=output.bp;
	else
		result=input;

	tidyBufFree( &output );
	tidyBufFree( &errbuf );
	tidyRelease( tdoc );
	return result;
}

b. const char* to CString in UNICODE:

CString CleanHtml(const char* input)
{
	CString result;
	TidyBuffer output = {0};
	TidyBuffer errbuf = {0};
	int rc = -1;
	Bool ok;

	TidyDoc tdoc = tidyCreate();                     // Initialize "document"

	ok = tidyOptSetBool( tdoc, TidyXhtmlOut, no );  // Keep HTML output (XHTML conversion off)
	if ( ok )
		rc = tidySetErrorBuffer( tdoc, &errbuf );      // Capture diagnostics
	/*if (rc>=0) {
		rc = tidyOptSetInt(tdoc, TidyOutCharEncoding, 0);
	}*/
	if ( rc >= 0 )
		rc = tidyParseString( tdoc, input );           // Parse the input
	if ( rc >= 0 )
		rc = tidyCleanAndRepair( tdoc );               // Tidy it up!
	if ( rc >= 0 )
		rc = tidyRunDiagnostics( tdoc );               // Kvetch
	if ( rc > 1 )                                    // If error, force output.
		rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );
	if ( rc >= 0 )
		rc = tidySaveBuffer( tdoc, &output );          // Pretty Print

	if ( rc >= 0 )
		result=output.bp;
	else {
		CString orign(input);
		result=orign;
	}

	tidyBufFree( &output );
	tidyBufFree( &errbuf );
	tidyRelease( tdoc );
	return result;
}

Copyright © 2010 - C++ Technology. All Rights Reserved.
