Sunday, 8 September 2013

HTML parsing using JSOUP: combining all data into one without repeats. Is that possible?

Below is my program, which works:
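import java.util.StringTokenizer;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class BoldParser { // class name is arbitrary; the original listing showed only main()
    public static void main(String[] args) {
        String html = "<p>I am making the letter <span style=\"font-weight:bold\">BOLD</span></p>";
        Document document = Jsoup.parse(html);
        Elements textNodes = document.select("p");
        for (Element element : textNodes) {
            // text() returns all text inside <p>, including the text of the nested <span>
            System.out.println("Data in P : " + element.text());
            for (Element span : element.select("span")) {
                System.out.println("Data In Span : " + span.text());
                String att = span.attr("style");
                // Split "font-weight:bold" on ':' and print every second token (the value)
                int a = 1;
                StringTokenizer st2 = new StringTokenizer(att, ":");
                while (st2.hasMoreTokens()) {
                    String att2 = st2.nextToken();
                    if (a == 1) {
                        a = 2; // property name such as "font-weight": skip it
                        continue;
                    } else {
                        System.out.println("Attribute : " + att2);
                        a = 1;
                    }
                }
            }
        }
    }
}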
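The tokenizer loop above assumes the style contains exactly one `name:value` pair. For a style such as `color:red; font-weight:bold` the tokens are `color`, `red; font-weight` and `bold`, so it prints `red; font-weight` and never reports `bold`. A minimal sketch of a sturdier approach, assuming simple inline styles (no `;` or `:` inside quoted values or `url(...)`), splits on `;` first and then on the first `:`. The class and method names `StyleParser`, `parseStyle` and `isBold` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StyleParser {
    // Parses an inline CSS style such as "color:red; font-weight:bold" into
    // property -> value pairs, preserving declaration order.
    // Simplified: does not handle ';' or ':' inside quoted values or url(...).
    public static Map<String, String> parseStyle(String style) {
        Map<String, String> props = new LinkedHashMap<>();
        if (style == null) {
            return props;
        }
        for (String decl : style.split(";")) {
            int colon = decl.indexOf(':');
            if (colon < 0) {
                continue; // skip empty or malformed declarations
            }
            String name = decl.substring(0, colon).trim().toLowerCase();
            String value = decl.substring(colon + 1).trim();
            if (!name.isEmpty()) {
                props.put(name, value);
            }
        }
        return props;
    }

    // Treats "bold" and its numeric equivalent "700" as bold (assumption: other
    // weights such as "bolder" or "600" are not considered here).
    public static boolean isBold(String style) {
        String weight = parseStyle(style).get("font-weight");
        return weight != null && (weight.equalsIgnoreCase("bold") || weight.equals("700"));
    }

    public static void main(String[] args) {
        System.out.println(parseStyle("color:red; font-weight:bold")); // {color=red, font-weight=bold}
        System.out.println(isBold("font-weight:bold"));                // true
    }
}
```

In the loop over spans, `StyleParser.isBold(span.attr("style"))` could then replace the tokenizer when deciding whether a word should be printed in bold.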
The output is:
Data in P : I am making the letter BOLD
Data In Span : BOLD
Attribute : bold
The actual HTML page looks like this:
I am making the letter BOLD
When I run the program, I need to reproduce the same HTML-like output.
I need the output to be "I am making the letter BOLD", where "BOLD" in that sentence is printed from "Data In Span",
so that I can check whether the span's style attribute is bold and, if it is, print that word in bold.
So my output should be (with "BOLD" in bold):
I am making the letter BOLD
Can you help me with this?
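The repeat comes from printing `element.text()` for the whole `<p>` (which already contains "BOLD") and then printing the span again. One way to get each piece exactly once is to walk the paragraph's child nodes in document order: print text nodes as-is and wrap the text of a bold `<span>` in `<b>...</b>`. The sketch below is an assumption-laden stand-in, not a Jsoup solution: it uses the JDK's built-in DOM parser because this particular snippet happens to be well-formed XML, and it detects bold with a simple string check on the style. With Jsoup the same walk would loop over `element.childNodes()`, checking `instanceof TextNode` and `instanceof Element` (and `Element.ownText()` gives a paragraph's text without its children's text). The names `RebuildSentence` and `rebuild` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RebuildSentence {
    // Walks the children of the root <p> in document order so each piece of text
    // is emitted exactly once; text of a <span> styled font-weight:bold is wrapped
    // in <b>...</b>. Assumes the input is well-formed XML with <p> as the root.
    static String rebuild(String xhtml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        Element p = doc.getDocumentElement();
        StringBuilder out = new StringBuilder();
        NodeList children = p.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.append(child.getTextContent()); // plain text, printed as-is
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element el = (Element) child;
                String text = el.getTextContent();
                // Simplified bold check: ignores spaces and case in the style string
                String style = el.getAttribute("style").replace(" ", "").toLowerCase();
                boolean bold = "span".equals(el.getTagName()) && style.contains("font-weight:bold");
                out.append(bold ? "<b>" + text + "</b>" : text);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>I am making the letter <span style=\"font-weight:bold\">BOLD</span></p>";
        System.out.println(rebuild(html)); // I am making the letter <b>BOLD</b>
    }
}
```

Emitting `<b>` tags is one reading of "HTML-like output"; a plain console cannot show bold by itself, so a different marker (for example a terminal escape sequence) could be substituted at the `out.append(bold ? ...)` line.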