CS385, Problem Set 7 Solutions

CS385, Fall 2010
Theory of Computation
Problem Set 7 Solutions

The objective of this problem is to design a PDA that recognizes the language L of all strings over the alphabet {a,b} that contain twice as many a's as b's.
a) Provide a careful and complete argument that will convince a skeptical but rational jury that every nonempty string of the target language L must contain at least one of the strings aab, aba, baa as a substring. Note that the latter strings are the shortest nonempty strings in L. Hint: Given a string w of L, chop w up into consecutive segments of length three. Each of these segments is either one of the desired strings aab, aba, baa, or else it is one of the other five possible strings of length four over the alphabet {a, b}. In the former case, you're done. In the latter, argue that at least one of the segments must be aaa (why?). Take it from there.
Solution
Argue as suggested in the hint. Suppose that w is a string in the given language L. Then the length of w is a multiple of 3 since w contains two a's for every b. The string w can therefore be expressed as the concatenation of disjoint and consecutive substrings of length 3. If one of these substrings is one of the strings aab, aba, baa, then our work is done. If not, then each of the substrings has to be one of the other five possible length 3 strings over the alphabet {a, b}. Notice that aaa is the only one of the latter set of strings that contains more a's than b's. Therefore, in order to make the number of a's in w equal to twice the number of b's in w, at least one of the length 3 substrings into which we chopped up w has to be aaa; otherwise, w would contain at least as many b's as a's, which contradicts the criterion for membership in L.
Zoom in on one of the occurrences of aaa in w. The neighboring length 3 segments of w are either both also aaa, or else one of them contains the symbol b. In the former case, "grow" the aaa segment by adding the neighboring aaa segments to it. Iterate this process until the boundary of the "all a's" region is reached. Notice that an occurrence of b will eventually be found either to the right or to the left of an aaa segment, since otherwise the string would contain only a's and therefore would not belong to L. If an occurrence of b is found to the left, then the string baa will occur at the boundary. If an occurrence of b is found to the right, then the string aab will occur at the boundary (the other boundary may be one of the ends of the string in either of these cases, which is ok). In either case, one of the target strings aab, aba, baa will occur as a substring of w as claimed.
b) Use the fact stated in part a) to design a PDA P that recognizes L. Your PDA design should be directly based on part a). If you wish, provide a state transition diagram in extended PDA (ePDA) notation first, and then show the standard PDA state transition diagram associated with it. Explain.
Solution
I'll give a verbal description that you can easily translate into a state transition diagram for an ePDA. Use three states start, compute, and accept, with the usual stack-bottom transitions between the compute state and the other two (epsilon, epsilon--> $ from start to compute, and epsilon, $--> epsilon from compute to accept). At the compute state, add a loop corresponding to pushing each of the symbols of the alphabet as they are read (e.g. a, epsilon-->a), plus a loop corresponding to popping each of the special length three substrings described above in part a) without reading any input (e.g. epsilon, aab-->epsilon).
Consider the following CFG G:

S -> ASB | 10 | SS
A -> 1A | ε
B -> B0 | ε
a) Design a PDA P that recognizes the language generated by G. Follow the CFG-to-PDA conversion procedure discussed in class. The machine P should have 3 states. Explain.
Solution
The ePDA associated with this CFG by the conversion procedure discussed in class will have three states: the start state, a compute or loop state, and an accepting state. Its stack alphabet will consist of all terminals and nonterminals of the given grammar plus the symbol $. The transition from start to compute (respectively, from compute to accept) just pushes S$ onto (respectively pops $ from) the stack. Refer to Figure 2.11 on p. 109 of the textbook for the overall architecture of the PDA in a generic situation.
The key transitions are those from the compute state to itself. There is one transition of the form ε, A -> w for each rule of the grammar of the form A -> w, where A is a variable and w is a string of terminals and nonterminals. For the grammar given above, these transitions are the following:
```
     	ε, S -> ASB | 10 | SS  
	ε, A -> 1A | ε  
	ε, B -> B0 | ε
```
The use of the symbol | is shorthand; there will really be a separate transition for each possible consequent of each rule. This corresponds to nondeterminism of the PDA.
We must also add a transition of the form a, a -> ε for each terminal symbol a. In this grammar there are precisely two terminal symbols x, y, so two rules are added in this stage:
```
	0, 0 -> ε
	1, 1 -> ε
```
This completes the construction of the PDA corresponding to the grammar given above.
b) Give a step by step description of an accepting computation of your PDA P from part a) on the input string 11000. Indicate the full sequence of state transitions of P. Also, clearly show each step in the computation by drawing the following ingredients (draw separate copies of each for every step):
- the current state
- the current stack contents
- the unread portion of the input string
For example, at the start of the computation, you should show the start state, an empty stack, and the full input string 11000.
Solution
Here's a sketch of an accepting computation on input 11000. Each ordered pair (s,t) consists of the unread portion s of the input string and the stack contents t at that point. Stack contents are displayed with the stack top on the left.
(11000, ε)
(11000, S$)
(11000, ASB$)
(11000, 1ASB$)
(1000, ASB$)
(1000, SB$)
(1000, 10B$)
(000, 0B$)
(00, B$)
(00, B0$)
(00, B00$)
(00, 00$)
(0, 0$)
(ε, $)
(ε, ε)

c) Give, in detail, the derivation of the string 11000 in the grammar G that corresponds to the particular accepting computation that you described in part b).
Solution

S => ASB => 1ASB => 1SB => 110B => 110B0 => 110B00 => 11000
Consider a new machine model: the "Post-It Automaton", or PIA, which consists of a FA with an additional "notes memory" that can be used to store a symbol of some finite "notes alphabet". Formally, a PIA P can be described by the sequence of its ingredients (Q, Σ, Ν δ, q₀, F), where:
- Q is a finite state space
- Σ is a finite input alphabet
- Ν is a finite notes alphabet that includes a special blank symbol
- δ:Q x Σ x Ν → Q x Ν is the transition function
- q₀ is the start state
- F, a subset of Q, is the set of accepting states
The machine P starts in the start state, with a blank notes memory. If in state q, with n in the notes memory, an input symbol a is read, the machine then transitions to state q' and writes n' in the notes memory, where q' and n' are provided by the transition function:
δ(q,n,a) = (q',n').
The input string is accepted by P if and only if the computation of P on input w ends in an accepting state.
a) Suppose that L is a regular language. Must there exist a PIA P that recognizes L? Explain your answer carefully.
Solution
Yes, of course: every DFA is basically a PIA that ignores its notes memory. Given a regular language L, just construct a DFA M that recognizes L. Then "promote" M to a PIA P by giving it a notes memory consisting of only the required blank symbol, and a transition function that leaves the notes memory unchanged:
δ_P(q,blank,a) = (q',blank), where q' = δ_M(q,a), for all states q
Here, the states q are the same as for the DFA M, as is the input alphabet. Since P so defined operates exactly like the original DFA M, it recognizes the same language L.
b) Can you construct a PIA that recognizes the language L = {0ⁿ 1ⁿ | n ≥ 0}?
Solution
Hmmmm... This problem requires thinking about PIA computation more closely. Suppose that a PIA is in state q, with n in its notes memory. Other than the input, what else do you need to know in order to know everything about the current "configuration" of the machine? Think about this, and work on part c) at the same time. Feel free to stop by during office hours to discuss.
c) (Optional extra credit) Can you give a simple characterization of the class of languages that can be recognized by PIA? Any solution submitted for this portion must include a carefully reasoned and complete explanation.
Solution
See comments above in part b).

CS385, Fall 2010 Theory of Computation Problem Set 7 Solutions

Solution

Solution

Solution

Solution

Solution

Solution

Solution

Solution

CS385, Fall 2010
Theory of Computation
Problem Set 7 Solutions