/* * Copyright © 2002 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, * California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has * intellectual property rights relating to technology embodied in the product * that is described in this document. In particular, and without limitation, * these intellectual property rights may include one or more of the U.S. * patents listed at http://www.sun.com/patents and one or more additional * patents or pending patent applications in the U.S. and in other countries. * U.S. Government Rights - Commercial software. Government users are subject * to the Sun Microsystems, Inc. standard license agreement and applicable * provisions of the FAR and its supplements. Use is subject to license terms. * Sun, Sun Microsystems, the Sun logo and Java are trademarks or registered * trademarks of Sun Microsystems, Inc. in the U.S. and other countries. This * product is covered and controlled by U.S. Export Control laws and may be * subject to the export or import laws in other countries. Nuclear, missile, * chemical biological weapons or nuclear maritime end uses or end users, * whether direct or indirect, are strictly prohibited. Export or reexport * to countries subject to U.S. embargo or to entities identified on U.S. * export exclusion lists, including, but not limited to, the denied persons * and specially designated nationals lists is strictly prohibited. */ BABYL OPTIONS: Version: 5 Labels: Note: This is the header of an rmail file. Note: If you are seeing it in rmail, Note: it means the file has no messages in it.  1, filed,, Summary-line: 11-Jan sreeni@cs.albany.edu #A note on using RE's matching the emty string Return-Path: Received: from Eng.Sun.COM by schizophrenia.Eng.Sun.COM (SMI-8.6/SMI-SVR4) id HAA21134; Sat, 11 Jan 1997 07:47:28 -0800 Received: from sunmail1.Sun.COM by Eng.Sun.COM (SMI-8.6/SMI-5.3) id HAA02652; Sat, 11 Jan 1997 07:44:26 -0800 Received: from Eng.Sun.COM by sunmail1.Sun.COM (SMI-8.6/SMI-4.1) id HAA06974; Sat, 11 Jan 1997 07:44:26 -0800 Received: from suntest.Eng.Sun.COM by Eng.Sun.COM (SMI-8.6/SMI-5.3) id HAA02640; Sat, 11 Jan 1997 07:44:24 -0800 Received: from asap.Eng.Sun.COM by suntest.Eng.Sun.COM (SMI-8.6/SMI-SVR4) id HAA21295; Sat, 11 Jan 1997 07:44:18 -0800 Received: from Eng.Sun.COM by asap.Eng.Sun.COM (SMI-8.6/SMI-SVR4) id HAA24684; Sat, 11 Jan 1997 07:44:19 -0800 Received: from mercury.Sun.COM by Eng.Sun.COM (SMI-8.6/SMI-5.3) id HAA02630; Sat, 11 Jan 1997 07:44:17 -0800 Received: from cs.albany.edu (cs.albany.edu [169.226.2.22]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id HAA01828 for ; Sat, 11 Jan 1997 07:44:18 -0800 Received: from bhaskara.cs.albany.edu (sreeni@bhaskara.cs.albany.edu [169.226.2.60]) by cs.albany.edu (8.7.4/HUB03) with ESMTP id KAA01464; Sat, 11 Jan 1997 10:44:06 -0500 (EST) Received: (from sreeni@localhost) by bhaskara.cs.albany.edu (8.7.4/CLI2) id KAA09608; Sat, 11 Jan 1997 10:43:58 -0500 (EST) From: Sreenivasa Rao Viswanadha Date: Sat, 11 Jan 1997 10:43:58 -0500 (EST) Message-Id: <199701111543.KAA09608@bhaskara.cs.albany.edu> To: jack-interest@asap.Eng.Sun.COM Subject: A note on using RE's matching the emty string Cc: sreeni@cs.albany.edu X-Sun-Charset: US-ASCII Content-Type: text Content-Length: 1639 X-Lines: 32 Status: RO *** EOOH *** Return-Path: From: Sreenivasa Rao Viswanadha Date: Sat, 11 Jan 1997 10:43:58 -0500 (EST) To: jack-interest@asap.Eng.Sun.COM Subject: A note on using RE's matching the emty string Cc: sreeni@cs.albany.edu X-Sun-Charset: US-ASCII Content-Type: text Content-Length: 1639 X-Lines: 32 In the last couple of days, we had seen a couple of users facing problems with regular expressions that match "". There is a minor bug in the way it is implemented in 0.6.-9. We will fix it. But the purpose of this mail is to suggest you should be careful when you use RE's that match the "" string. Consider the following example of string literals where two consecutive "" are interpreted as the literal " (equivalent to \" in Java). < STRING_LITERAL: ( "\"" (~["\""])* "\"" )* > This will work in general. But, if this a part of a lot of other lexical rules, then if there a lexical error, say a char is given that cannot be the first one of any token, then, the lexer decides to use the empty string "" and match it as STRING_LITERAL without actually giving the lexical error. And since this is the empty string, no character will be consumed and you will start getting the same STRING_LITERAL token (with "" as the image) infinte number of times. In fact, if this was the only lexical rule, then if you give a input that starts with any char other than the ", you will get into an infinite loop. So a better alternative is to use the + operator which will not match the empty string. As a matter of fact, I don't know any practical grammar where matching "" is useful. In version 0.5, the lexer generated implicitly treated it as + (which is not totally right). But in 0.6.-9, it does it right and so there is a chance that your grammar that used to work with 0.5 will not work with 0.6.-9. So if you have any top-level lexical rule with ? or *, please change those rules so that they don't match the empty string "". Sreeni.  1,, Summary-line: 11-Jan kimbo@highway1.com #Re: Looking for HTML.jack Return-Path: Received: from Eng.Sun.COM by schizophrenia.Eng.Sun.COM (SMI-8.6/SMI-SVR4) id QAA21339; Sat, 11 Jan 1997 16:41:53 -0800 Received: from sunmail1.Sun.COM by Eng.Sun.COM (SMI-8.6/SMI-5.3) id QAA18458; Sat, 11 Jan 1997 16:38:44 -0800 Received: from Eng.Sun.COM by sunmail1.Sun.COM (SMI-8.6/SMI-4.1) id QAA16642; Sat, 11 Jan 1997 16:38:51 -0800 Received: from suntest.Eng.Sun.COM by Eng.Sun.COM (SMI-8.6/SMI-5.3) id QAA18449; Sat, 11 Jan 1997 16:38:42 -0800 Received: from asap.Eng.Sun.COM by suntest.Eng.Sun.COM (SMI-8.6/SMI-SVR4) id QAA23127; Sat, 11 Jan 1997 16:38:42 -0800 Received: from Eng.Sun.COM by asap.Eng.Sun.COM (SMI-8.6/SMI-SVR4) id QAA24956; Sat, 11 Jan 1997 16:38:41 -0800 Received: from mercury.Sun.COM by Eng.Sun.COM (SMI-8.6/SMI-5.3) id QAA18438; Sat, 11 Jan 1997 16:38:36 -0800 Received: from chmls01.highway1.com (ne.highway1.com [24.128.1.82]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id QAA22125 for ; Sat, 11 Jan 1997 16:38:42 -0800 Received: from papa ([24.128.36.164]) by chmls01.highway1.com (Netscape Mail Server v2.0) with SMTP id AAA17669; Sat, 11 Jan 1997 19:38:31 -0400 Message-ID: <32D83288.5A8@highway1.com> Date: Sat, 11 Jan 1997 19:38:32 -0500 From: kimbo@highway1.com (Kimbo Mundy) X-Mailer: Mozilla 3.0Gold (WinNT; U) MIME-Version: 1.0 To: Rupert Nagler CC: jack-interest@asap.Eng.Sun.COM Subject: Re: Looking for HTML.jack References: <1.5.4.32.19970111214054.0069e40c@mail.austria.eu.net> Content-Transfer-Encoding: 7bit X-Lines: 13 Status: RO Content-Type: text/plain; charset="us-ascii" Content-Length: 447 *** EOOH *** Return-Path: Date: Sat, 11 Jan 1997 19:38:32 -0500 From: kimbo@highway1.com (Kimbo Mundy) X-Mailer: Mozilla 3.0Gold (WinNT; U) MIME-Version: 1.0 To: Rupert Nagler CC: jack-interest@asap.Eng.Sun.COM Subject: Re: Looking for HTML.jack References: <1.5.4.32.19970111214054.0069e40c@mail.austria.eu.net> Content-Transfer-Encoding: 7bit X-Lines: 13 Content-Type: text/plain; charset="us-ascii" Content-Length: 447 Rupert Nagler wrote: > > I am very impressed by the Jack-Concept and I am looking for a "HTML.jack". > Is there anybody out there who has an example of a Jack-Definition file for > HTML 3.2? I previously sent a message entitled "A first cut at an HTML grammar". Did people not get it? If not, see: http://www.tiac.net/users/kimbo/jack/HTML.jack > Is there a way to construct a *.jack file out of a *.sgml file? Sorry, I can't help with this.  1,, Summary-line: 13-Jan kimbo@highway1.com #Re: HTML? Return-Path: Received: from suntest.Eng.Sun.COM by schizophrenia.Eng.Sun.COM (SMI-8.6/SMI-SVR4) id XAA21746; Sun, 12 Jan 1997 23:06:34 -0800 Received: from Eng.Sun.COM by suntest.Eng.Sun.COM (SMI-8.6/SMI-SVR4) id XAA29422; Sun, 12 Jan 1997 23:03:31 -0800 Received: from mercury.Sun.COM by Eng.Sun.COM (SMI-8.6/SMI-5.3) id XAA07269; Sun, 12 Jan 1997 23:03:30 -0800 Received: from chmls01.highway1.com (ne.highway1.com [24.128.1.82]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id XAA18021 for ; Sun, 12 Jan 1997 23:03:30 -0800 Received: from papa ([24.128.36.164]) by chmls01.highway1.com (Netscape Mail Server v2.0) with SMTP id AAA20062 for ; Mon, 13 Jan 1997 02:03:25 -0400 Message-ID: <32D9DE40.3604@highway1.com> Date: Mon, 13 Jan 1997 02:03:28 -0500 From: kimbo@highway1.com (Kimbo Mundy) X-Mailer: Mozilla 3.0Gold (WinNT; U) MIME-Version: 1.0 To: Sriram Sankar Subject: Re: HTML? References: <199612162037.MAA29649@schizophrenia.Eng.Sun.COM> Content-Transfer-Encoding: 7bit X-Lines: 34 Status: RO Content-Type: text/plain; charset="us-ascii" Content-Length: 1420 *** EOOH *** Return-Path: Date: Mon, 13 Jan 1997 02:03:28 -0500 From: kimbo@highway1.com (Kimbo Mundy) X-Mailer: Mozilla 3.0Gold (WinNT; U) MIME-Version: 1.0 To: Sriram Sankar Subject: Re: HTML? References: <199612162037.MAA29649@schizophrenia.Eng.Sun.COM> Content-Transfer-Encoding: 7bit X-Lines: 34 Content-Type: text/plain; charset="us-ascii" Content-Length: 1420 Well, I finally got an HTML grammar out there (at http://www.tiac.net/users/kimbo/jack/HTML.jack). I hope you saw it, I got some mailer errors, that seemed like the kind that you could ignore, but at least one person didn't receive my first posting. I'd be interested to know if this is the sort of thing people are looking for, or do they want the full set of tags enumerated in the grammar as well? Also, if there you have any desire to bundle this with Jack (possibly after upgrades and/or integration with other people's work), please feel free. I must say Jack is an amazing tool. It was really easy to learn. I love how readable the grammars are, and I love being able to pass info up and down the productions as the parser runs. I never want to have to settle for LALR(1) again! Thanks for writing it!