Answers to Questions from SI Challenge Teams

Dear Solvers,

Below please find answers to the questions asked by SI Chemistry Challenge Teams during our meetings in July. The questions are organized by type. If you have any additional questions, please do not hesitate to contact us.

Scalability
- What happens if the solution has a different reaction type compared to what was asked?
  - Each team can define its own measure of scalability. The scalability-by-reaction-type list provided by the organizers is now meant as a guide; it is not a complete list.
- Do we want teams to provide a scalability index for the entire pathway, or just the scalability index for each step?
  - We would like teams to provide a scalability index for each step in the pathway.
- Can the organizers provide additional scoring for different reaction classes?
  - Any team can send us additional reaction types and we will get back to them. We will share that information with all teams. One team has already sent a long list and we expect to distribute our analysis next week.
- Some reactions never appear in the list of reactions but we still get new and novel reactions. Is it positive or negative in our opinion?
  - For us the question is “does it work and is it scalable?” We don’t mind if the Teams submit reactions that don’t appear on the list.
- What if a reaction matches two or more different reaction types on the list?
  - One of the reasons that we’ve moved away from requiring the scalability by reaction list method is that it has imperfections like this. This list is provided as a guideline. One method that you could use is to average the scalability scores if the reaction falls in two classes.
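As a rough illustration of the averaging suggestion above, the sketch below computes a scalability index for one step whose reaction falls in more than one class. The class names and scores are made-up placeholders, not values from the organizers' list, and `step_scalability` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical scores; real values would come from the organizers' list
# or from a team's own scalability measure.
CLASS_SCORES = {
    "amide_coupling": 8.0,
    "reduction": 6.0,
    "protection": 4.0,
}


def step_scalability(reaction_classes, class_scores=CLASS_SCORES):
    """Average the scores of every listed class the step falls in.

    Returns None when no class is on the list, so the team can fall
    back to its own measure instead of silently scoring the step as 0.
    """
    known = [class_scores[c] for c in reaction_classes if c in class_scores]
    return mean(known) if known else None


print(step_scalability(["amide_coupling", "reduction"]))  # 7.0
print(step_scalability(["photoredox"]))  # None
```

Returning `None` for unlisted reactions is one possible design choice; it keeps novel reactions from being penalized merely for being absent from the list.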
Starting Materials
- How should teams estimate the cost of input materials for their pathways?
  - Teams should focus on relative cost, using the best data available from public domain sources such as Sigma Aldrich or web-based vendors, and using the prices for the largest scale available.
- Are there any recommendations as to materials costs etc.?
  - Cost and availability are used to determine the termination points of the tree search for backward reactions. “True” costs vary widely and there is no single source of truth, so at this stage semifinalists can use whatever source they want. We suggest erring on the side of caution and going 1-3 steps further back if you are uncertain of the viability or cost of a starting material. This will not be penalized in the judging process. The organizers will likely have asymmetric information about cost and will pick the step at which the synthesis starts. An expanded JSON file will be distributed with the reactant metatag (Role: “StartingMaterial”) to identify the starting material you believe is viable.
- Question about the database: do you have any restrictions in terms of the database we use for feedstock?
  - We will not prescribe the database that you use for starting point availability. The general rule is in section 4.1.6; use the biggest quantity that’s available. We will end up asking what database was used.
- What are acceptable commercial sources of starting materials?
  - There is no single source of truth when it comes to pricing, especially given that cost is often a function of purity or reagent grade (ACS, USP, etc). Please use any source or grade, with the exception of "technical" or <95% purity. We may ask for your source over time.
- Can we provide a list of green solvents?
  - We choose not to provide one. We don’t want to exclude reactions that are performed in non-green solvents, since a skilled organic chemist can often adapt the solvent system.
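The pricing guidance in this section (use the largest quantity available, skip "technical" grade and anything below 95% purity) can be sketched as a small helper. The `Offer` record, its field names, and the sample prices are all hypothetical; real numbers would come from whichever vendor catalog a team chooses.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    vendor: str          # hypothetical vendor label
    grade: str           # e.g. "ACS", "USP", "technical"
    purity_pct: float
    pack_size_g: float
    price_usd: float


def relative_cost_per_g(offers):
    """Price per gram at the largest acceptable pack size.

    Skips "technical" grade and anything below 95% purity, then uses
    the biggest quantity on offer. Returns None if nothing qualifies.
    """
    ok = [o for o in offers
          if o.grade.lower() != "technical" and o.purity_pct >= 95.0]
    if not ok:
        return None
    largest = max(ok, key=lambda o: o.pack_size_g)
    return largest.price_usd / largest.pack_size_g


# Placeholder offers, not real catalog data.
offers = [
    Offer("vendor_a", "ACS", 99.0, 25.0, 50.0),
    Offer("vendor_a", "ACS", 99.0, 500.0, 400.0),
    Offer("vendor_b", "technical", 90.0, 1000.0, 300.0),
]
print(relative_cost_per_g(offers))  # 0.8
```

A `None` result could serve as a signal to go one or more steps further back in the tree search, as suggested above.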
Platform Runtime
- Is runtime a critical factor?
  - This is a relatively low throughput exercise, so accuracy and rank are more important than time, but we want to make sure that a run completes in minutes to hours rather than days. Runtime depends on the number of compute cycles and the processing power available to the participant, so a direct comparison of time between the semifinalists will not be meaningful.
- Question about the time judging.
  - Allowed runtime could be a parameter that your platform can control (e.g. “give me your best answer within 30 min”).
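One way to expose allowed runtime as a parameter is an "anytime" loop that keeps the best result seen so far and stops at a deadline. The sketch below is only an assumption about how a platform might do this; `best_within_budget`, its arguments, and the toy scoring function are hypothetical.

```python
import time


def best_within_budget(candidates, score, budget_s):
    """Return the best-scoring candidate seen before the deadline.

    `candidates` is any iterable (for example a lazy pathway generator)
    and `score` maps a candidate to a number, higher being better.
    """
    deadline = time.monotonic() + budget_s
    best, best_score = None, float("-inf")
    for cand in candidates:
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
        if time.monotonic() >= deadline:
            break  # budget used up: return the best answer so far
    return best


# Toy usage: integers stand in for pathways; 30 min would be 30 * 60.
print(best_within_budget(range(1000), score=lambda x: -abs(x - 42), budget_s=5.0))  # 42
```

Because the deadline is checked after each candidate is scored, at least one candidate is always evaluated, even with a zero budget.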
Confidence Estimate
- Does the confidence estimate measure the likelihood of success for the pathway based on observed results or based on literature support?
  - Each semifinalist can choose their own method of estimating the confidence for each reaction step. This is intended to be a measure of how likely the reaction is to work (e.g. will the reaction produce the desired products as the major products of the reaction step?).
Real-Time Demo
- Will the real-time demo in Aug. be scored?
  - Yes, a portion of the scoring will come from the real-time demo.
- Will there be some time to present some interim choices?
  - The Aug. live call is the opportunity to discuss that.
- For the real-time demo: is an hour the plan?
  - Yes
Final Stage Questions
- In the final stage, will the PoC be done in a lab?
  - Our plan is to verify a few key steps in a laboratory environment.
- Some of the advanced features which I’ve submitted will not be in the prototype, but only in the second phase.
  - Yes, no problem. This is aligned with our expectations.
Reaction Metrics Questions
- How much should yield be taken into account in the ranking?
  - Yield is very nice to have but is not a required parameter, since it is very difficult to estimate.
- What should be included in the detailed description of the reaction conditions?
  - As with yields, reaction conditions are preferred but not required since they are not included in all training sets. They may help the judges estimate the likelihood of success and scalability of a reaction. Reaction conditions could include temperature, pressure, solvents.
JSON Questions
- JSON - What can we add to the meta-data fields?
  - The semifinalists are free to add different kinds of meta-data for the reactants, products, reagents and the reaction step. Some useful attributes for molecules include name, boiling point, CAS#, unit cost. Some useful attributes for the reaction step include yield, temperature, pressure, pH. None of these are required to be included, but you can understand how they would help the judges estimate the likelihood of success of a reaction step.
- What kind of units do we expect to see (e.g. g, mg, etc.)?
  - We are seeking SI units. Sometimes there is a choice between types (e.g. mol vs g). Either approach is acceptable. In an update of the JSON file we will include a method to transmit information about the custom metadata and what units are used.
- JSON template vs real live presentation: in the template is there an opportunity to provide additional text?
  - In the template there’s a metadata tag, which isn’t mandatory but can be useful if you’d like to provide additional information for a given pathway.
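To make the metadata discussion concrete, here is a minimal sketch of what one reaction step with optional metadata and explicit units might look like. Every key except the `Role: "StartingMaterial"` tag mentioned earlier is a hypothetical placeholder; the official JSON template defines the real field names, and the numeric values are illustrative only.

```python
import json

# Every key except "Role": "StartingMaterial" is a hypothetical
# placeholder; the official template defines the real field names.
# Numeric values are illustrative, not measured or quoted data.
step = {
    "reactants": [
        {
            "smiles": "CCO",
            "metadata": {
                "Role": "StartingMaterial",
                "name": "ethanol",
                "CAS": "64-17-5",
                "unit_cost": {"value": 10.0, "unit": "USD/kg"},
            },
        }
    ],
    "products": [{"smiles": "CC=O", "metadata": {"name": "acetaldehyde"}}],
    "metadata": {
        "temperature": {"value": 298.15, "unit": "K"},
        "pressure": {"value": 101325, "unit": "Pa"},
        "yield": {"value": 0.8, "unit": "fraction"},
    },
}

print(json.dumps(step, indent=2))
```

Pairing each value with its unit is one way to handle the choice between, say, mol and g until the updated JSON file specifies how units should be transmitted.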
Other Questions
- Do you consider the set of rules as part of the system? The system will contain a model which would allow extracting templates, rules, etc. So should the set of rules be fixed, or do we want the system to produce rules?
  - The system can be a mixture of AI-based and rule-based approaches or a hybrid approach where the AI derives templates for the rules.
- Have you used other systems, some of which are free (several names were mentioned)?
  - The organizers have tried many available approaches. In general, our observation is that they stay very close to their training data, and pay very little attention to scalability considerations in their prioritization of reaction pathways.
Legal Questions
- What are the restrictions in terms of publishing or making the platform open for the public? What kinds of things can we publish?
  - Anything in the White Paper is safe to publish.
  - Our goal is for the Winner of the entire challenge to have an ongoing relationship with SI, and some legal terms for the Winner could be jeopardized if the solution is fully published. The Winner of the prize is required “to not publicly disclose the Winning Solution’s software code, in particular in open source software form, for five (5) years”. Because the Challenge Winner is required to not publicly disclose the winning software code, there is an implied obligation that, while in the competition, Participants cannot publicly disclose or publish the code.
  - In the semifinalist phase, we suggest that the teams not publish their source code until the Finalists are selected (Oct 2024).
  - For a team that advances to the next phase (finals), we suggest they further wait until a Winner is chosen (Feb 2025) before publishing their source code.
  - We also limit all Participants via the NDA to not publish (i.e., keep confidential) SI Confidential Information, including the retrosynthesis pathways generated for the challenge.
- Question about background vs. foreground IP
  - We advise you to consult your own legal counsel.
- Our platform uses LLMs and tools built on them. We can utilize either commercial LLMs (e.g. ChatGPT) or locally hosted LLMs. We will test both to assess them. When using ChatGPT we are likely to share context over the internet. Is that problematic?
  - In the NDA we are protecting the fact that Grace and SI are working with molecules, even if the molecules are public information. Using commercial LLMs like ChatGPT does not disclose such information. We may label specific Test Molecules for which commercial LLMs should not be used, but most will be able to use them.
Parameterization Controls and Prioritization Questions
- Are there any particular controls SI is interested in as part of the solution?
  - We would like to be able to prioritize or de-prioritize things like:
    - Reaction types: “do not use oxidation reactions”
    - Reaction conditions: “only include reaction temperatures greater than -40 °C and less than 150 °C”
    - Reagent costs: “do not use starting materials that cost more than $10/kg”
    - Reagent types: “do not use precious metal catalysts”
    - Confidence level: “include only reactions which have a >0.8 confidence metric”
    - Scalability level: “include only reaction steps which have a scalability greater than 10”
  - We realize that these controls may not be ready in time for the real-time demonstration, but in that case the semifinalists and the judges could have a conversation about how or if such controls could be added.
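The example controls listed above could be expressed as a small parameter object applied to each reaction step. The sketch below is an assumption about one possible shape, not a required interface: `Step`, `Controls`, `passes`, and all field names are hypothetical, and the defaults simply mirror the quoted examples.

```python
from dataclasses import dataclass


@dataclass
class Step:
    reaction_type: str
    temperature_c: float
    max_material_cost_usd_per_kg: float
    uses_precious_metal: bool
    confidence: float
    scalability: float


@dataclass
class Controls:
    # Defaults mirror the example controls quoted in the list above.
    banned_types: frozenset = frozenset({"oxidation"})
    min_temp_c: float = -40.0
    max_temp_c: float = 150.0
    max_cost_usd_per_kg: float = 10.0
    allow_precious_metals: bool = False
    min_confidence: float = 0.8
    min_scalability: float = 10.0


def passes(step, c):
    """True if the step satisfies every control in `c`."""
    return (
        step.reaction_type not in c.banned_types
        and c.min_temp_c < step.temperature_c < c.max_temp_c
        and step.max_material_cost_usd_per_kg <= c.max_cost_usd_per_kg
        and (c.allow_precious_metals or not step.uses_precious_metal)
        and step.confidence > c.min_confidence
        and step.scalability > c.min_scalability
    )


steps = [
    Step("amide_coupling", 25.0, 4.0, False, 0.9, 12.0),
    Step("oxidation", 25.0, 4.0, False, 0.95, 15.0),
]
kept = [s.reaction_type for s in steps if passes(s, Controls())]
print(kept)  # ['amide_coupling']
```

This version treats each control as a hard filter. Since the answer speaks of prioritizing or de-prioritizing, a platform could instead turn each check into a penalty term in its ranking score.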
