Dr Michael Daly wrote on 3/15/12 8:00 AM:
> for _docs/test3, I only see the two .doc files
>
> I deleted some .html files in the _docs dir, and now I get a different output
> (it goes on & on, attempting to index .xls files in /_doc):
>
> swish-e -S prog -c /share/MD0_DATA/swish-e-files/swish-e-conf/web_1.conf
> Indexing Data Source: "External-Program"
> Indexing "spider.pl"
> External Program found: /opt/lib/swish-e/spider.pl
> Missing argument in sprintf at /opt/lib/swish-e/spider.pl line 38.
> Missing argument in sprintf at /opt/lib/swish-e/spider.pl line 38.
> /opt/lib/swish-e/spider.pl: Reading parameters from 'default'
>
> Summary for: http://localhost:104/_docs/test3/Reception-duties.doc
> Connection: Close: 1 (1.0/sec)
> Total Bytes: 1,217 (1217.0/sec)
> Total Docs: 1 (1.0/sec)
> Unique URLs: 1 (1.0/sec)
> application/msword->text/plain: 1 (1.0/sec)
> Warning: document 'http://localhost:104/_docs/test3/' could not be encoded
> to charset 'ISO-8859-1'
> Warning: document 'http://localhost:104/_docs/' could not be encoded to
> charset 'ISO-8859-1'
> Warning: document 'http://localhost:104/' could not be encoded to charset
> 'ISO-8859-1'
> http://localhost:104/_docs/2008%20CASH%20FLOW%20ESTIMATES.xls:317: error:
> Unexpected end tag : table
> </table>
> ^
> http://localhost:104/_docs/2008%20CASH%20FLOW%20ESTIMATES.xls:318: error:
> Unexpected end tag : table
> </table>
> ^
> Warning: document 'http://localhost:104/_docs/21st_aug/' could not be
> encoded to charset 'ISO-8859-1'
> http://localhost:104/_docs/%20sims%20st.xls:396: error: Unexpected end tag
> : table
> </table>
> ^
> http://localhost:104/_docs/%20thomas%20st.xls:191: error: Unexpected end
> tag : table
> </table>
> ^
> http://localhost:104/_docs/%20thomas%20st.xls:192: error: Unexpected end
> tag : table
> </table>
> ^
> Syntax Error: Couldn't read xref table
> Syntax Warning: PDF file is damaged - attempting to reconstruct xref table...
> http://localhost:104/_docs/Book1.xls:14648: error: Unexpected end tag : table
> </table>
>
>
> What is wrong?

Why is your .xls being indexed as .pdf?

What are the contents of
/share/MD0_DATA/swish-e-files/swish-e-conf/web_1.conf?

Again, break this down to a single URL to isolate your problem.

Try turning on the spider debug options too:

http://swish-e.org/docs/spider.html#debug

-- 
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users(at)not-real.lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Fri Mar 16 2012 - 01:16:19 GMT
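
A rough sketch of the two suggestions above (isolate a single URL, and turn on spider debugging), not a verified recipe. It assumes the spider.pl path shown in the log, that spider.pl is run with its 'default' parameters followed by a URL, and that debug output goes to stderr. The URL and the /tmp/spider-debug.log path are only examples chosen here. SPIDER_DEBUG and its comma-separated option names are the ones described on the spider debug page linked above.

```shell
# Example URL taken from the log above; substitute whichever file you suspect.
URL='http://localhost:104/_docs/Book1.xls'

# Run the spider by itself on that one URL, with debugging enabled.
# The spidered document content (stdout) is discarded; the debug
# messages (assumed to be on stderr) are saved for inspection.
SPIDER_DEBUG=url,skipped,failed,headers \
    /opt/lib/swish-e/spider.pl default "$URL" \
    > /dev/null 2> /tmp/spider-debug.log

# Look at what was fetched and which content type the server reported.
less /tmp/spider-debug.log
```

If the headers show the .xls file being served with a PDF content type, or a filter in web_1.conf maps it to a PDF converter, that would explain the xref errors in the log.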