Re: Indexing Links

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Thu Apr 27 2006 - 13:14:55 GMT
Assuming you are using the spider.pl script with swish-e -S prog, the
href link should be followed.

However, there is no support for omitting the <a> text from the description.
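
One possible workaround, sketched here purely as an illustration and not as a swish-e or spider.pl feature, is to preprocess the HTML yourself so that text inside <a> tags never reaches the description while the href values are still collected for following. The names DescriptionExtractor, describe and max_chars below are hypothetical; the 200-character limit only mirrors the StoreDescription length in the quoted config. It uses nothing beyond Python's standard html.parser module.

```python
from html.parser import HTMLParser


class DescriptionExtractor(HTMLParser):
    """Collect link targets and body text, skipping text inside <a>...</a>."""

    def __init__(self):
        super().__init__()
        self.links = []        # href values, kept so the links can still be followed
        self.text_parts = []   # text found outside of anchor tags
        self._anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._anchor_depth += 1
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_depth > 0:
            self._anchor_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside an anchor.
        if self._anchor_depth == 0:
            self.text_parts.append(data)


def describe(html, max_chars=200):
    """Return (links, description) for an HTML fragment (hypothetical helper)."""
    parser = DescriptionExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace, then truncate to the description length.
    text = " ".join(" ".join(parser.text_parts).split())
    return parser.links, text[:max_chars]


links, description = describe(
    '<a href="page.html">testing</a> this is the body of the text.'
)
print(links)
print(description)
```

Note that text removed this way is also not indexed for the current page, so this only fits if, as in the message below, indexing the anchor text does not matter.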

intervolved none scribbled on 4/27/06 7:43 AM:
> Thank you for the response.
>    
>   Actually, I do not care whether the text within the <a> tag is indexed. I just want the links followed and indexed. I do not want the text in the <a> tags stored in my description property.
>    
>   example:
>   -----------------
>   ...
>   <a href="page.html">testing</a>   <- the page page.html should be indexed but the text not included in the description of the current page
>   this is the body of the text.  <- this should be the description of the page
>   ...
> -------------------------------
> Peter Karman <peter@peknet.com> wrote: 
>   If I am understanding you correctly, you want the text within the <a>
> tagset to be indexed but not stored in the description property. I don't
> believe there is a config option to allow that. The properties simply 
> suck up all the characters they find, optionally converting entities, 
> and ignoring tags.
> 
> intervolved none scribbled on 4/26/06 11:29 AM:
>> I have noticed on a lot of my pages that get indexed that the
>> description displayed is from the href tags and not from the actual
>> body of the content. Is there any way to fix this? I want the links
>> to be indexed but I do not want the text to be included in the
>> description of the page.
>>
>>
>>
>>
>> Config:
>>
>> MaxDepth 0
>> Delay 0
>> Metanames keywords
>> MetaNamesRank 10 keywords
>> IndexContents HTML2 .htm .html .shtml .jsp
>> IndexContents TXT .pdf .doc
>> DefaultContents HTML2
>> StoreDescription HTML2 200
>> StoreDescription TXT 200
>> PropertyNameAlias swishdescription description
>> obeyRobotsNoIndex yes
>> HTMLLinksMetaName links
>> IndexDir http://testserver/testpage.html
>>
>>
>>
>>
>> d:>\swish-e.exe -f "d:\testing\indexes\temp.index" -wdirectives -p swishdescription -d ::
>> # SWISH format: 2.4.2
>> # Search words: directives
>> # Removed stopwords:
>> # Number of hits: 1
>> # Search time: 0.000 seconds
>> # Run time: 0.015 seconds
>> 1000::http://testserver/testpage.html::My Title::932::one two three one two three one two three. four five six. seven eight nine ten, uno dos tres quatro Advance Directives and Organ Donation Page body text example
>>
>> The description is: one two three one two three one two three. four
>> five six. seven eight nine ten, uno dos tres quatro Advance
>> Directives and Organ Donation Page body text example
>>
>> rather than: Advance Directives and Organ Donation Page body
>> text example
>>
>> Html Page that is indexed:
>>
>>       > valign="top">> src="/images/nav/navStd.gif" class="vimg"
>> border="0"> 
>>   > target="">one two three one two three one two three. four five six.
>> seven eight nine ten, uno dos tres quatro
>>
>> Advance Directives and Organ Donation   Page body text example
>> test page line 1
>> test page line 2
>> body test line 2 more info...
>>
> 

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Thu Apr 27 06:14:56 2006