First, if you're using Java 5 (1.5) or later, you can use for-each
loops for more readable code:
for (final XWPFComment comment : adoc.getComments()) {
    final String id = comment.getId();
    final String author = comment.getAuthor();
    final String text = comment.getText();
}
I don't see anything in POI right now that makes it easy to extract
the annotated text that a track-changes comment refers to.
Here's the current implementation of XWPFComment:
https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/xwpf/usermodel/XWPFComment.java?view=markup
Taking a look at the OOXML 2006 schemas wml.xsd (download from
http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%201st%20edition%20Part%204%20(PDF).zip,
extract OfficeOpenXML-Part4a.zip, extract OfficeOpenXML-XMLSchema.zip,
open wml.xsd), I see that the comment (*.docx/word/comments.xml)
doesn't refer to the document text.
<xsd:complexType name="CT_Comment">
  <xsd:complexContent>
    <xsd:extension base="CT_TrackChange">
      <xsd:sequence>
        <xsd:group ref="EG_BlockLevelElts" minOccurs="0"
                   maxOccurs="unbounded"></xsd:group>
      </xsd:sequence>
      <xsd:attribute name="initials" type="ST_String" use="optional">
        <xsd:annotation>
          <xsd:documentation>Initials of Comment Author</xsd:documentation>
        </xsd:annotation>
      </xsd:attribute>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="CT_TrackChange">
  <xsd:complexContent>
    <xsd:extension base="CT_Markup">
      <xsd:attribute name="author" type="ST_String" use="required">
        <xsd:annotation>
          <xsd:documentation>Annotation Author</xsd:documentation>
        </xsd:annotation>
      </xsd:attribute>
      <xsd:attribute name="date" type="ST_DateTime" use="optional">
        <xsd:annotation>
          <xsd:documentation>Annotation Date</xsd:documentation>
        </xsd:annotation>
      </xsd:attribute>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="CT_Markup">
  <xsd:attribute name="id" type="ST_DecimalNumber" use="required">
    <xsd:annotation>
      <xsd:documentation>Annotation Identifier</xsd:documentation>
    </xsd:annotation>
  </xsd:attribute>
</xsd:complexType>
Examining the zipped xml contents of a simple comment example docx
file that I created, I see that the relationship is the other way
around: the document refers to the comments (this ordering makes more
sense anyways).
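If you'd rather repeat that inspection from code than unzip the file by
hand, here is a minimal sketch using only java.util.zip (no POI). The
class name DocxPartReader and the method readPart are made up for
illustration, and it assumes the part is UTF-8 encoded and that you are
on Java 9 or later (for InputStream.readAllBytes).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of POI: a .docx is a zip archive, so a
// part such as word/document.xml can be read with the JDK alone.
final class DocxPartReader {

    /** Returns the named part as text. Assumption: the part is UTF-8. */
    static String readPart(Path docx, String partName) {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry(partName);
            if (entry == null) {
                throw new IllegalArgumentException("No such part: " + partName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling DocxPartReader.readPart(path, "word/document.xml") returns the
main document markup, and "word/comments.xml" returns the comment table.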
For a simple file that I created with the text "My name is John." and
annotated the word John with a comment with the message "Noun", here's
what I got in CommentExample.docx/word/document.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns....>
  <w:body>
    <!-- text paragraph: "My name is [[John]]." -->
    <w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000"
         w:rsidRDefault="00000000" w:rsidRPr="00000000">
      <w:pPr>
        <w:pBdr/>
        <w:contextualSpacing w:val="0"/>
        <w:rPr/>
      </w:pPr>
      <!-- text run "My name is " -->
      <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
        <w:rPr><w:rtl w:val="0"/></w:rPr>
        <w:t xml:space="preserve">My name is </w:t>
      </w:r>
      <!-- comment range, text run "John" -->
      <w:commentRangeStart w:id="0"/>
      <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
        <w:rPr><w:rtl w:val="0"/></w:rPr>
        <w:t xml:space="preserve">John</w:t>
      </w:r>
      <w:commentRangeEnd w:id="0"/>
      <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
        <w:commentReference w:id="0"/>
      </w:r>
      <!-- text run "." -->
      <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
        <w:rPr><w:rtl w:val="0"/></w:rPr>
        <w:t xml:space="preserve">.</w:t>
      </w:r>
    </w:p>
    <w:sectPr>
      <w:pgSz w:h="15840" w:w="12240"/>
      <w:pgMar w:bottom="1440" w:top="1440" w:left="1440"
               w:right="1440" w:header="0"/>
      <w:pgNumType w:start="1"/>
    </w:sectPr>
  </w:body>
</w:document>
So to solve your problem, you could either:
1. scan document.xml for every commentRangeStart/commentRangeEnd pair,
   join all the text contained between those markers, and use the ID
   on the markers to look up the comment's author and text, or
2. for each comment in the comment table, find the commentRangeStart
   and commentRangeEnd tags in document.xml with the matching ID and
   get the text that was annotated (in this example, John).
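As a rough illustration of the first option, here is a sketch that
works on the raw document.xml markup with the JDK's DOM parser rather
than through POI. The class name CommentRangeExtractor and the method
annotatedText are made up for this example, and the namespace URI
assumes a transitional (non-strict) OOXML file. It walks the tree in
document order, tracks which comment ranges are currently open, and
appends each w:t text run to every open range, so overlapping comments
and ranges that span paragraphs should also be handled.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of POI.
final class CommentRangeExtractor {

    // Assumption: transitional WordprocessingML namespace.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Maps each comment id to the text between its range markers. */
    static Map<String, String> annotatedText(String documentXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                         .parse(new InputSource(new StringReader(documentXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document.xml", e);
        }
        Map<String, StringBuilder> collected = new LinkedHashMap<>();
        walk(doc.getDocumentElement(), new LinkedHashSet<>(), collected);
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : collected.entrySet()) {
            result.put(e.getKey(), e.getValue().toString());
        }
        return result;
    }

    // Depth-first walk in document order; 'open' holds the ids of the
    // comment ranges that have started but not yet ended.
    private static void walk(Node node, Set<String> open,
                             Map<String, StringBuilder> collected) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(node.getNamespaceURI())) {
            Element el = (Element) node;
            String name = el.getLocalName();
            String id = el.getAttributeNS(W_NS, "id");
            if ("commentRangeStart".equals(name)) {
                open.add(id);
                collected.putIfAbsent(id, new StringBuilder());
            } else if ("commentRangeEnd".equals(name)) {
                open.remove(id);
            } else if ("t".equals(name)) {
                // A text run: credit it to every currently open range.
                for (String openId : open) {
                    collected.get(openId).append(el.getTextContent());
                }
                return;
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, open, collected);
        }
    }
}
```

The keys of the returned map can then be matched against
XWPFComment.getId() to pair each comment's author and text with the
text it annotates.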
If you don't already have a development environment set up, I
encourage you to do so. Patches are greatly appreciated.
Post by Ramani Routray
I have a Microsoft word (.docx) file and trying to retrieve the comments and it's associated highlighted text. Can you pls help.
Attaching picture of the sample word document and the java code for extracting the comments. [ A file with a line "My name is John". The word "John" is highlighted with a comment "Noun" ]
I am able to extract the comments (Noun, Adjective). I would like to extract the text associated with the comment "Noun" (Noun = John, Adjective = great)
FileInputStream fis = new FileInputStream(new File(msWordFilePath));
XWPFDocument adoc = new XWPFDocument(fis);
XWPFWordExtractor xwe = new XWPFWordExtractor(adoc);
XWPFComment[] comments = adoc.getComments();
for(int idx=0; idx < comments.length; idx++)
{
    MSWordAnnotation annot = new MSWordAnnotation();
    annot.setAnnotationName(comments[idx].getId());
    annot.setAnnotationValue(comments[idx].getText());
    aList.add(annot);
}