No source; created in m-r form. 18 Nov 91 : CMSMcQ : Made file during meeting from skeleton
Minutes of the Steering Committee Meeting, Bergen
C. M. Sperberg-McQueen
18 November 1991

Apologies for Absence

Present were:

Absent were: NI, RA.

Minutes of the Last Meeting

Accepted.

Review of Myrdal Meeting

Outstanding Problems

The outstanding question is what to do about feature structures. The suggestions from Gary Simons (GS) were quite clear, but very substantial for something first presented on the last day. TL noted that he had long been rethinking the feature structure notation but had not presented his suggestions in writing. GS had persuaded TL that some of the details of his thinking were wrong, so the changes now on the table are in fact not very radical.

Should the atomic content become an attribute value? If so, should it be on an ATOMIC tag or on the parent FEATURE tag? This is motivated in part by a comment from Mark Vilain, who objected to atomics' having data content.
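
The three placements under discussion can be set side by side. The following is only an illustrative sketch, written in Python with the standard xml.etree.ElementTree module; it is not notation taken from the Guidelines drafts. The tag names FEATURE and ATOMIC come from the discussion above, while the attribute names "name" and "value" and the sample feature (number = singular) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Attribute names "name" and "value" are hypothetical, chosen for illustration.

def data_content(name, value):
    """Current form: the atomic value is data content of ATOMIC."""
    feature = ET.Element("FEATURE", {"name": name})
    ET.SubElement(feature, "ATOMIC").text = value
    return feature

def attribute_on_atomic(name, value):
    """Alternative 1: the value becomes an attribute of ATOMIC."""
    feature = ET.Element("FEATURE", {"name": name})
    ET.SubElement(feature, "ATOMIC", {"value": value})
    return feature

def attribute_on_feature(name, value):
    """Alternative 2: the value becomes an attribute of the parent FEATURE."""
    return ET.Element("FEATURE", {"name": name, "value": value})

# Print the three encodings of the same feature for comparison.
for build in (data_content, attribute_on_atomic, attribute_on_feature):
    print(ET.tostring(build("number", "singular"), encoding="unicode"))
```

In the second alternative the ATOMIC element disappears altogether, which is the main practical difference between the two attribute-based options.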

The argument for renaming F.STRUCT as RECORD and FEATURE as FIELD is to make the names less intimidating to non-linguists, without affecting linguists' ability to use the structures. LB observed that not everyone liked the new names; MSM suggested F and FG. The group discussed the possible addition of a feature structure type declaration. TL presented an example of the cross-reference mechanism for feature structures: noun singular noun ...

SH reviewed the subcommittees which had worked in Vatnahalsen.

Character set proposals were mostly uncontroversial.

Text chunk committee proposals achieved near consensus; MSM and LB were not sure that it would actually handle Shakespeare's casual shifts from verse to prose. The committee should be left to prepare their proposal, and asked for some examples.

Inheritance and grouping committee. PR did not want to recommend a particular mechanism because of the lack of software; this suggests a problem of presentation: should the application of general tools to specific areas be described under the general tools or under the specific areas? MSM suggested that description under the specific areas would be better. MSM expressed serious skepticism about the idea of borrowing the GRP tag to handle text-critical problems; he did not think any content model could be written to behave as required.

Group 5, situational parameters, achieved consensus and its work is in a good state. SJ pointed out that place, time, and channel may all depend upon individual participants: telephone calls, for example, do not take place in a single place. The situational parameters also do not, in speech, characterize the whole text, but parts of it. (LB observed that this could also be true in written texts.)

SJ recalled the original distinction between quotation and direct speech; this became quotation in P1. SJ would prefer quotation to be used only where there is real quotation. LB noted that the same question had arisen in the literary work group. This is not to be resolved immediately, but it would be nice to solve it. NB this is still outstanding.

The uncertainty group did not impress everyone as having solved the problem, but MSM argued that SJ suggested the name INDISTINCT for illegible/inaudible passages, and PRECISION rather than certainty for some current applications and in general for measurements.

Group 7 on pointers and alignment maps did not completely finish; TL wants to sit down with MSM and SJ.

Plan for Resolving them

Some things that need change are simply things that need cleaning up; some are directions for long-term further work.

What to do with P2: we had expected no serious changes between P2 and P3, but with the volume of new material now in the works for P2, we will definitely need to perform a technical review of P2. LB urged that it would be dangerous to drift into an unintended policy of making serious technical changes between P2 and P3; MSM agreed, but suggested that the technical review would be necessary anyway, to identify problems and distinguish those which can be fixed with minor changes, those which represent gaps to fill with further work, and those which represent serious flaws (and would result in removing material between P2 and P3). SH proposed, funding permitting, to hold a review meeting in the spring, with working papers evaluating sections of P2. The meeting would produce lists of corrections to make in P3, and lists of areas for further long-term work.

Possible methods of review:

- review meeting as per SH
- do nothing: mail P2 out and hope
- select specific reviewers and ask for reports (how motivate?)
- hold meeting/conference to plan future (Poughkeepsie II)
- inform public to comment before the specific meeting
- do something with affiliated projects (SJ)
- mail to specified reviewers with a request to look at very specific points, e.g. Wilhelm Ott with a request to look at character sets -- people who can be expected to do something and may feel some obligation to do something. Then invite people to a meeting depending on how people have responded.
- get advisory board more deeply involved in soliciting technical review by their people.

If such a meeting is to be held, it would be 23-24 March 1992. Probably Chicago (possibly New Jersey as backup). Undecided whom to invite, what sort of people.

Can we do anything about the work groups which have not produced work? Mss. still plans to draft. Paul Ellison claims to be writing sections on mathematics and on tables.
- Character sets, done if they produce text
- Text criticism has a draft
- Hypermedia
- Formulas
- Corpora -- will appear mostly in header chapter
- Printed books -- will not appear
- Literature -- will do joint draft
- Linguistics -- will include morphological starter set as FSD
- Spoken texts -- will have draft
- History -- will do names, dates
- Dictionary -- Nicoletta will do it
- Lexicon -- Ingria says he will draft
- Terminology -- will have a draft

Tutorials and Casebook

SH asked SJ, TL what topics should be covered by specific tutorials:

- spoken texts
- morphological analysis
- trees (from Penn Tree Bank)
- historians
- literary texts (needs more thought)
- terminology
- dictionary / lexicographers
- text critics
- metrical analysis? (David Chisolm)

For the case book, editors should list material we might get from affiliated projects and circulate it.

Plan of work

SH noted the plan for the TEI to continue as an umbrella organization in the standardization area.

Matters arising from previous meeting not dealt with elsewhere

TR9 American representation

MSM spoke to Elizabeth Brown of AHA, who has a number of suggestions for American participation. This area will need to be reconsidered, along with printed books; no action to be taken now.

AI2 document to D. Gibbon

LB sent a list of addresses to Wendy Plotkin asking that AI2 W1 be sent to them; MSM believes that WP has sent the paper to all of them.

MacWhinney

LB reported that his most recent contact with MacWhinney has been quite cordial; no action is needed.

AI3 summary and survey of contents

SH has recently been asked what the status is of Rosanne Potter's survey of literary scholars; RP seems to be collecting responses, but tabulating them is not at the top of the list.

The MLA does not seem very organized with regard to TEI participation; it's not clear what is to be done. SH actioned to contact relevant people in MLA and sort out the problems, to wit: how to channel information within the MLA. MSM noted that the MLA Newsletter has run pieces on the TEI two or three times, so that it is not an issue of informing the membership at large; the problem appears to be that relevant committees and projects within the MLA have not kept themselves informed and we have not kept them informed. SH will write and send a reply.

The summary of the AI3 comments on P1 has not gone further. SH will continue to ask for an agreement that the summary is a fair paraphrase of the comments; MSM will ask WP to cross-index the summary with the original. SH will draft reply to the critique; it is important that we be seen to have answered it point by point.

Review of Budget and remaining expenditure

A review of the accounts showed the following amounts remaining disposable (not spent and not committed) in the following accounts:

- $136,000 in NEH cycle 2
- $60,000 in Mellon funds
- $106,000 (85,000 ECU) in EEC funds for 91-92 (when the contract is signed)
- $302,000 total

If we assume the Myrdal meeting will cost $50,000, we have about $250,000 left. $40,000 for the advisory board meeting leaves about $210,000.

Were the EEC contract to fail, we would have $196,000 on hand. Our commitments then would include:

- ca. $20,000 already lent to the EEC account from Mellon
- ca. $25,000 for Oxford salary (committed in writing)
- ca. $20,000 for Oslo
- ca. $50,000 for the Myrdal meeting
- ca. $35,000 for the Advisory Board meeting

In this case, we would have about $45,000 disposable. That would be enough to fund another review meeting for P2.

It was agreed to fund TR9's manuscript meeting in February if the EEC contract is funded by then. SH will so inform JH. The point of the meeting is to review JH's and CH's draft(s), refine the suggestions, and identify areas needing yet further work. It was explicitly understood that this meeting's results may not -- probably will not -- be reflected in P3.

A review meeting (23-24 March) would cost ca. $17,000 with 15 people. MSM suggested some further meeting with the affiliated projects might be feasible.

LB noted that Gunnel Källgren and Rich Giordano have asked for funding for sending Rich to Stockholm for a couple of days to work on the TEI encoding of the Stockholm/Umeå corpus. Agreed to fund up to 300 pounds.

MSM reported a suggestion from Dan Greenstein that a meeting of four or five people in Glasgow could readily make serious progress on the issues of historical encoding. Agreed to fund this if the EEC funding comes through. Try to hold this in the first six weeks of 1992; produce a case study and some tutorial material?

General rule agreed that if the EEC contract comes through, the editors jointly can authorize ad hoc travel up to $500.
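
The budget figures reported in this section can be checked with a short calculation. The following is a minimal sketch in Python, not part of the accounts themselves: the amounts are those minuted above, while the variable names and the grouping into two scenarios are assumptions made for illustration.

```python
# Disposable balances in US dollars, as minuted in the review of accounts.
balances = {"NEH cycle 2": 136_000, "Mellon": 60_000, "EEC 91-92": 106_000}

# Scenario 1: the EEC contract is signed.
myrdal_meeting = 50_000
advisory_board_meeting = 40_000  # figure used in the first scenario
total = sum(balances.values())
after_myrdal = total - myrdal_meeting
after_advisory_board = after_myrdal - advisory_board_meeting

# Scenario 2: the EEC contract fails, so the EEC funds never arrive.
on_hand_without_eec = total - balances["EEC 91-92"]
commitments = {
    "lent to the EEC account from Mellon": 20_000,
    "Oxford salary": 25_000,
    "Oslo": 20_000,
    "Myrdal meeting": 50_000,
    "Advisory Board meeting": 35_000,  # the minutes give 35,000 here
}
disposable_without_eec = on_hand_without_eec - sum(commitments.values())

review_meeting = 17_000  # 23-24 March, 15 people

print("Total disposable:", total)
print("After Myrdal meeting:", after_myrdal)
print("After advisory board meeting:", after_advisory_board)
print("On hand without EEC:", on_hand_without_eec)
print("Disposable without EEC:", disposable_without_eec)
print("Covers a review meeting:", disposable_without_eec >= review_meeting)
```

The exact results differ slightly from the rounded figures in the minutes: $252,000 and $212,000 against "about $250,000" and "about $210,000", and $46,000 against "about $45,000".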
Editors' report on other activities

LB reported on the Bibliothèque de France's work group meeting on the Projet de Lecteurs Assistés par Ordinateur. This was a work group / seminar for the exchange of ideas, much of which was very interesting. About 60,000 works are to be digitized either into bit-mapped images or (further) into encoded texts. François Chahuneau was asked and recommended they invite LB.

SGML 91 in Providence was very successful; a substantial trip report is in progress. SH asked whether we are operating on approximately the same plane as others working with SGML; LB said that on the contrary, Yuri Rubinsky had included the TEI on his list of the most important events of 1991. The total funding for the TEI and all its affiliated projects, estimated roughly, came to some tens of millions of dollars of funds on projects committed to TEI and SGML.

CARG at AAR/SBL will be at the end of this week. MSM will give a proselytizing speech to CARG. The editors will also confer, and are assigned to ask Bob Kraft urgently to see that SBL files a formal request to join the advisory board as soon as possible, to ensure that SBL can be represented at the May Advisory Board meeting.

CATH 91 is to have a two-hour session on the TEI to be done by LB and Elaine Brennan; LB reports that EB has fallen silent and he is growing concerned. He has asked Rich Giordano to come as well, but has not specifically invited RG to help with the workshop.

MLA will have a TEI session: Elaine Brennan, Malcolm Brown, and MSM will speak.

The Pisa workshop may be followed by a low-key informal meeting of the editors; DW was unhappy that the list of attendees was not firm, and felt that if the workshop was not well attended it would damage the TEI's reputation. LB and MSM resisted the characterization of the meeting as a 'workshop'.

SH asked about the editors' work with Chadwyck-Healey.
At a presentation of the English Poetry project at Princeton, the C-H presenter laid great stress on the participation of the editors on the advisory committee and as consultants. This can have an impact on the TEI's public image, particularly when the presentation is poor. LB pointed out that any poorly presented project which advertises its relation with the TEI would have the same effect. After discussion, MSM suggested that this situation could be attacked in several ways:

- the editors could ask Chadwyck-Healey not to mention the TEI in their advertising
- the editors could sever their relations with C-H
- the editors could endeavor to ensure that C-H presenters gain a better grasp of the technical aspect of the encoding

Future of the TEI

Funding Enquiries

AZ has made progress with the EEC, and says prospects are encouraging, but nothing has been signed yet. NI has not spoken to NEH; SH is uncertain whether we should count on NEH funds, since they have already been so generous. DW has spoken with NN about getting money from the Linguistic Data Consortium, and to DARPA -- but is rapidly approaching the point at which a firm formal proposal must be made.

DW spoke with Syun Tutiya in Myrdal; he will know in January whether a proposal for $100,000 from the Ministry of Education will be available in April, mainly to cover travel and conferences. It was not clear whether this was for Japanese travel to meetings elsewhere or for internal Japanese meetings.

In February, a proposal is due with the Education Ministry for 500 M yen ($3,000,000) over three years under a program for international joint research. This program is in response to the NSF/DARPA stimulation. Like DARPA and ESPRIT this will cover many things, not just TEI; it represents a framework within which TEI support may be gained. In addition, Toshio Yokoi (head of Electronic Dictionary Research project) is preparing a MITI proposal on large-scale knowledge-bases using text as the storage format (as opposed to CYC, which they had found disappointing). Yokoi's funds have supported all the cooperative travel so far. Members of the TEI Japan Committee include Nagao, Yokoi, Syun Tutiya and _. Kameda, as well as about 25 others. The Japanese seem very serious, judging by the amount of work they have done so far. Yokoi has explicitly envisaged the possibility of contributing financially to the funding of TEI core activities.

The EEC call for tenders for the Linguistic Research and Engineering Framework program specifically mentions the TEI; the TEI does not intend, however, to seek funds specifically under this line. SH and LB observed that proposals under this line must come from partners in different European countries, ideally with an industrial partner. MSM asked whether Pisa and Oxford could not apply under this line as partners to coordinate the European participation in the TEI. This was felt not to be possible.

SH felt very strongly that the TEI must formulate a coherent plan of work and decide where to seek money for different activities. Proposals to individual funding agencies should be made in the larger context, and not ad hoc, as our previous proposals have been.

MSM summarized the discussion thus: we will need long term funding for ongoing work, and believe we need a coherent central plan for that. A coherent central plan will not be possible before mid-1992, leading to funding in 1993.
The consequence is that right now we are looking for short-term bridge funding to get us through 1992-93. If the EEC funds come through and we can extend the NEH funding, we can possibly make it through that period even if we don't get further U.S. funding. SH and DW assigned to find out about deadlines and form of proposal for a U.S. contribution to the interim funding (DARPA, NSF). SH will also continue looking through the foundation indices for leads.

To clear up MSM's confusion about the possible U.S. funding sources, DW reviewed the players:

- Consortium for Lexical Research, NMSU, a funding sink not source
- Linguistic Data Consortium, part of a program initiated by Congress and funded by DARPA, to support development of industrially important technologies; LDC is one of several consortia thus created, with the primary purpose of encouraging creation and processing of massive bodies of data in speech, text, and lexicon; grammar collections somewhat less critical. LDC will not support research, though the data collected will support the aims of the DARPA research program. Mark Liberman heads committee to set up the structure.
- National Science Foundation

If Terry Langendoen is willing to continue, we should put in for his salary during this bridge period. DW felt that dissemination, education, and evaluation would be fundable activities; also coding oversight (data validation).

Program of Work

SH listed five specific areas needing further work: linguistics, speech, literature, historians, and physical description of copy text.

MSM suggested a number of possible activities for the TEI to be involved in, in the long run. As revised by the committee, these included:

1. review of proposals and regular publication of updates
2. formation of work groups and production of proposals
3. dissemination of copies and publication program, including workshops
4. work with affiliated projects / exemplary applications; this is a continuation of AP work
5. validation of TEI conformance of data (possibly eventually as a publicly available service)
6. software specification and evaluation
7. software development

LB thought that software evaluation should be very high on the list: information about what is available, what it does and what can be done with it, and how to obtain it. This would need to be tied to work in software specification, along the lines of the test bed / test suite described in the Oxford meeting of ML. LB is also becoming less hostile to software development, especially in the areas where the TEI is most innovative: manipulation of f.structs, extraction of f.struct data for loading in databases, graphic display of overlap from timelines, etc. It was agreed that software-related activities were not ipso facto inappropriate for the TEI, though we will need to be careful about relationship with industrial vendors.

DW noted that other kinds of ideas for long term activities might arise from the functional analysis needed for the program of work.

For the interim period (1992-93), we could see these activities taking the following specific forms:

The new work (item 2) could include the four activities already noted. LB noted that in all of these areas, JH's model of proposals presented to gradually wider audiences could be used, if it works in MSS.

Publication program: user training, workshops; tutorials, case books, collections of working papers.
Work with individual projects: DW argued that this should be linked to the development of data validation methods and measures; LB objected to continuing the current notion of the affiliated projects. It was agreed that the current affiliated projects should be discharged at the end of this cycle; any work with projects in the future could be done on a different basis. The long-term basis of such collaborative work must be carefully worked out; we need to monitor who is using the Guidelines and check whether they are running into problems. Intractable problems found by the projects need to be submitted to the review group. SH suggested that for the interim period, the editors or other consultants should simply be allowed to request funds for specific consultation work.

Review group: SH will produce a plan in a paper for the next SC meeting.

Software will not be a priority for the interim period.

SH asked whether we want to operate with the same structure and personnel, or change. The SC will continue; should the editors continue, with their support? Should support to TL and SJ continue? Should support be added for the other areas of new work? MSM expressed his willingness and desire to continue as editor in chief at least during the interim period, pending approval from his superiors. LB asked for a half-time person for TEI-related clerical work. If this is possible, and a better compensation can be arranged for OUCS, he is willing to continue to serve during the interim period. He would also like to discuss his position vis a vis the steering committee and the constitution of the steering committee; this was postponed so it could be discussed when the entire steering committee was present.

Publication of P3

This topic was postponed from earlier meetings. DW urged that the question facing us was not merely a choice of commercial publisher, but the decision whether to use a commercial publisher at all.

- commercial publication -- over their imprint
- commercial publication -- published by X on behalf of
- commercial publication -- over our imprint
- we handle distribution ourselves (as with P1) -- for free or for charge
- we provide electronic distribution from a fileserver

SH provided the following overview of the advantages and disadvantages.

Commercial publication: provides a seal of approval. (DW noted that P3 will already have been reviewed extensively and a commercial publisher would not provide an imprimatur.) They will also handle production, design, distribution, and reviews. Drawback is that it will cost more and may be delayed in appearance. Publishers may also wish to constrain our other activities with the text.

Distribution on our own part will make ongoing work for us, require handling of money, and will not necessarily penetrate to unexpected areas.

Electronic distribution will require special negotiation with any print publisher and may make it harder to do print publication. LB suggested that the crucial parts for electronic distribution were the DTDs and the examples.

On the basis of 8 years publishing Computational Linguistics as a contractor working with subcontractors, DW argued that self-publication was a more viable option. A chief advantage is in the price flexibility it gives us. There is very little delay in publication; recognizing that we have already had a very extensive review process. MSM observed that commercial publication would have the advantage of getting notice of the publication to the library community. DW argued that through the advisory board we will have better access to individuals, and that the inferior access to the institutional market does not matter.
The topic cannot be decided at this meeting; what is essential is that a clear summary of the arguments be provided to all members of the steering committee. DW pointed out also that loose-leaf publication and revisions will be harder for a commercial publisher. LB pointed out that one partial solution to this would be to divide the object.

Date of Next Meeting

The date for this was fixed at the previous meeting: January 30-31; in Pisa or alternatively in New Jersey.

Any Other Business

SH suggested that in addition to summarizing the contents of P3, the TEI session at ALLC should also give a presentation from David Robey, Tom Corns, and Elli Mylonas, to bring the TEI to the literary community. LB objected to focusing too much on any one part of the Guidelines.

DW asked whether anyone other than the editors needs to be brought in to be responsible for the case book or the tutorials.

MSM reported that he had received a request to perform consulting work on an academic project; the committee had no objection and left the decision to MSM.

LB reported that he is acting as a technical consultant for Oxford Electronic Publishing's library of electronic text. This is done on his own time.

LB reminded the committee that the historians asked us to consider publishing their book.